The following are important Spark terms that everyone should know before appearing for an interview.

  • RDD
  • DataFrame / Dataset
  • SparkConf (Spark configuration)
  • SparkContext
  • SparkSession
  • Parallelized collections
  • Shared Variables
  • Accumulators
  • Broadcast Variables
  • Number of Partitions
  • Transformations
  • Actions
  • Cache
  • Persist
  • Task
  • Driver and Executors
  • Shuffle
  • Repartition
  • Coalesce
  • reduceByKey, aggregateByKey, and groupByKey
  • Cogroup and join
  • Scalar functions
  • Aggregate functions
  • Temp view and global temp view
  • Bucketing, Sorting, and Partitioning
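A frequent interview question from this list is the difference between reduceByKey and groupByKey. Both produce per-key results, but reduceByKey combines values for the same key inside each partition before the shuffle (a map-side combine), so fewer records cross the network. The sketch below is a plain-Python simulation of that idea, not Spark code: partitions are modelled as ordinary lists, the input data is made up, and the helpers `shuffle`, `combine`, `group_by_key`, and `reduce_by_key` are hypothetical names chosen for illustration.

```python
from collections import defaultdict

# Hypothetical input: each inner list stands in for one partition of a pair RDD.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1)],
]


def shuffle(parts, num_partitions):
    """Send every (key, value) record to a target partition chosen by hashing its key."""
    out = [[] for _ in range(num_partitions)]
    for part in parts:
        for key, value in part:
            out[hash(key) % num_partitions].append((key, value))
    return out


def combine(records, func):
    """Fold together the values that share a key, using func."""
    acc = {}
    for key, value in records:
        acc[key] = func(acc[key], value) if key in acc else value
    return acc


def group_by_key(parts, num_partitions=2):
    """groupByKey-style: every record crosses the shuffle; values are grouped afterwards."""
    shuffled = shuffle(parts, num_partitions)
    grouped = defaultdict(list)
    for part in shuffled:
        for key, value in part:
            grouped[key].append(value)
    records_moved = sum(len(p) for p in shuffled)
    return dict(grouped), records_moved


def reduce_by_key(parts, func, num_partitions=2):
    """reduceByKey-style: combine per key inside each partition first (map-side combine),
    then shuffle only the partial results and combine them again."""
    partials = [list(combine(part, func).items()) for part in parts]
    shuffled = shuffle(partials, num_partitions)
    result = {}
    for part in shuffled:
        # Hash partitioning puts each key in exactly one target partition,
        # so per-partition results never overlap and can simply be merged.
        result.update(combine(part, func))
    records_moved = sum(len(p) for p in shuffled)
    return result, records_moved


grouped, moved_by_group = group_by_key(partitions)
group_totals = {key: sum(values) for key, values in grouped.items()}
reduced, moved_by_reduce = reduce_by_key(partitions, lambda x, y: x + y)

# Same per-key totals, but the reduce-style path ships fewer records through the shuffle.
print(sorted(group_totals.items()), moved_by_group)   # [('a', 3), ('b', 3)] 6
print(sorted(reduced.items()), moved_by_reduce)       # [('a', 3), ('b', 3)] 4
```

With many repeated keys per partition the gap grows, which is why reduceByKey or aggregateByKey is generally preferred over groupByKey followed by a separate reduction when the goal is an aggregate.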