The following are important terms every candidate should know before appearing for Spark interviews.
- RDD
- DataFrame / Dataset
- Spark config (SparkConf)
- SparkContext
- SparkSession
- Parallelized collections
- Shared Variables
- Accumulators
- Broadcast Variables
- Number of Partitions
- Transformations
- Actions
- Cache
- Persist
- Task
- Driver and Executors
- Shuffle
- Repartition
- Coalesce
- reduceByKey, aggregateByKey, and groupByKey
- cogroup and join
- Scalar functions
- Aggregate functions
- Temp view and Global Temp view
- Bucketing, Sorting, and Partitioning
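Transformations, actions, and caching are easier to discuss with a concrete picture in mind. The sketch below is a toy plain-Python model, not Spark's API. The class `LazyDataset` and its methods are hypothetical names chosen for illustration. Transformations (`map`, `filter`) only record steps. An action (`collect`, `count`) runs the whole recorded lineage from the source, and each action recomputes it from scratch, which is the situation cache and persist are meant to avoid.

```python
class LazyDataset:
    """Hypothetical toy model of a lineage: transformations record steps, actions execute them."""

    def __init__(self, source, steps=()):
        self._source = source
        self._steps = tuple(steps)

    # Transformations: return a new dataset with one more recorded step; nothing runs yet.
    def map(self, f):
        return LazyDataset(self._source, self._steps + (("map", f),))

    def filter(self, pred):
        return LazyDataset(self._source, self._steps + (("filter", pred),))

    # Actions: replay every recorded step from the original source.
    def collect(self):
        data = list(self._source)
        for kind, fn in self._steps:
            if kind == "map":
                data = [fn(x) for x in data]
            else:
                data = [x for x in data if fn(x)]
        return data

    def count(self):
        return len(self.collect())


calls = []  # records every time the map function actually runs


def square(x):
    calls.append(x)
    return x * x


ds = LazyDataset(range(5)).map(square).filter(lambda x: x % 2 == 0)
print(len(calls))    # 0  (no action yet, so nothing has run)
print(ds.collect())  # [0, 4, 16]
print(ds.count())    # 3
print(len(calls))    # 10 (the second action recomputed the lineage)
```

In real Spark, calling `cache()` or `persist()` on the dataset before the first action lets later actions reuse the computed partitions instead of replaying the lineage.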
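A common interview question is why reduceByKey is usually preferred over groupByKey. The sketch below is a simplified plain-Python model, not Spark code. Each inner list stands for one partition of hypothetical (key, value) pairs, and "shuffled" counts the records that would cross the network. groupByKey sends every pair through the shuffle. reduceByKey first combines values inside each partition (a map-side combine) and then shuffles only the partial results.

```python
from collections import defaultdict

# Hypothetical input: each inner list models one partition of (key, value) pairs.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]


def group_by_key(parts):
    """Model of groupByKey: every record is shuffled, then values are grouped per key."""
    shuffled = [pair for part in parts for pair in part]
    groups = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)
    return dict(groups), len(shuffled)


def reduce_by_key(parts, func):
    """Model of reduceByKey: combine within each partition first,
    shuffle only the per-partition partial results, then merge them per key."""
    shuffled = []
    for part in parts:
        local = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result = {}
    for key, value in shuffled:
        result[key] = func(result[key], value) if key in result else value
    return result, len(shuffled)


grouped, grouped_shuffled = group_by_key(partitions)
summed = {k: sum(v) for k, v in grouped.items()}
reduced, reduced_shuffled = reduce_by_key(partitions, lambda x, y: x + y)

print(summed, grouped_shuffled)   # {'a': 3, 'b': 3} 6
print(reduced, reduced_shuffled)  # {'a': 3, 'b': 3} 4
```

Both paths give the same totals, but the reduceByKey model shuffles 4 records instead of 6. The gap grows with the number of duplicate keys per partition.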
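Repartition and coalesce both change the number of partitions, but they do it differently. The sketch below is a deliberately simplified plain-Python model, not Spark's implementation. The function names and the contiguous-merge and round-robin rules are assumptions made for illustration. Spark's actual coalesce also considers data locality. The model shows two points: coalesce only merges existing partitions and cannot increase their count without a shuffle, while repartition performs a full shuffle that can grow or shrink the count and rebalance sizes.

```python
def coalesce(parts, n):
    """Simplified model of coalesce(n): merge whole existing partitions into n groups.
    Records are not redistributed individually; the count can only go down."""
    if n >= len(parts):
        return [list(p) for p in parts]  # without a shuffle, the count cannot increase
    merged = [[] for _ in range(n)]
    for i, part in enumerate(parts):
        merged[i * n // len(parts)].extend(part)  # contiguous partitions share a target
    return merged


def repartition(parts, n):
    """Simplified model of repartition(n): a full shuffle that reassigns every
    record round-robin, so partition sizes end up roughly equal."""
    records = [r for p in parts for r in p]
    out = [[] for _ in range(n)]
    for i, r in enumerate(records):
        out[i % n].append(r)
    return out


# Hypothetical, uneven starting partitions.
parts = [[1, 2], [3], [4, 5, 6], [7]]

print(coalesce(parts, 2))                                # [[1, 2, 3], [4, 5, 6, 7]]
print(repartition(parts, 2))                             # [[1, 3, 5, 7], [2, 4, 6]]
print(len(coalesce(parts, 6)), len(repartition(parts, 6)))  # 4 6
```

This is why coalesce is the usual choice for cheaply reducing the partition count (for example, before writing output), while repartition is used when the count must go up or the data needs rebalancing.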