Q. Which part of the code runs on the driver and which part runs on the executors?

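A: The driver runs the main program. It creates the SparkSession/SparkContext, builds the lineage and the DAG from the transformations, splits the job into stages and tasks, schedules those tasks, and receives the results of actions such as collect() or count(). The executors run the tasks. The functions passed to transformations (for example the lambda inside map() or filter()) execute on the executors, once per partition. Code outside those functions, such as a plain loop or a print in the main program, runs only on the driver.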

Q. How do you choose the number of executors and the number of cores per executor?

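A: There is no fixed formula. A commonly cited starting point for YARN clusters is:

·         Reserve 1 core and about 1 GB of memory on each node for the OS and Hadoop daemons.

·         Use about 5 cores per executor. More than that tends to hurt HDFS I/O throughput.

·         Executors per node = usable cores per node / cores per executor. Total executors = executors per node × number of nodes, minus 1 for the YARN ApplicationMaster.

·         Split each node's usable memory among its executors, leaving part of each share for off-heap overhead (spark.executor.memoryOverhead).

Tune from there based on how the job actually behaves.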
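The rule of thumb can be turned into a quick calculation. The sketch below uses a hypothetical helper, plan_executors, which is not a Spark API. It assumes YARN and reserves 1 core and 1 GB per node. It also assumes 5 cores per executor, gives up one executor for the ApplicationMaster, and treats memory overhead as 10% of executor memory (ignoring Spark's 384 MB minimum).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ExecutorPlan:
    num_executors: int
    cores_per_executor: int
    executor_memory_gb: int


def plan_executors(nodes: int, cores_per_node: int, mem_per_node_gb: int,
                   cores_per_executor: int = 5,
                   overhead_fraction: float = 0.10) -> ExecutorPlan:
    """Hypothetical rule-of-thumb sizing helper (assumptions in the lead-in)."""
    usable_cores = cores_per_node - 1      # leave 1 core per node for OS / Hadoop daemons
    usable_mem_gb = mem_per_node_gb - 1    # leave ~1 GB per node for OS / Hadoop daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    # One executor slot is given up for the YARN ApplicationMaster.
    num_executors = executors_per_node * nodes - 1
    # Heap plus overhead (assumed 10% of heap) must fit in each executor's share.
    mem_share_gb = usable_mem_gb / executors_per_node
    executor_memory_gb = math.floor(mem_share_gb / (1 + overhead_fraction))
    return ExecutorPlan(num_executors, cores_per_executor, executor_memory_gb)


plan = plan_executors(nodes=6, cores_per_node=16, mem_per_node_gb=64)
print(plan)  # ExecutorPlan(num_executors=17, cores_per_executor=5, executor_memory_gb=19)
```

Consider 6 nodes with 16 cores and 64 GB each. That leaves 15 usable cores and 63 GB per node, which gives 3 executors per node. The result is 17 executors with 5 cores and 19 GB each. This maps to spark-submit --num-executors 17 --executor-cores 5 --executor-memory 19g.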

Q. How do you find the number of partitions in a DataFrame?

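A: Call getNumPartitions() on the DataFrame's underlying RDD: df.rdd.getNumPartitions() in PySpark, or df.rdd.getNumPartitions in Scala. To see how many rows sit in each partition, use df.rdd.glom().map(len).collect() in PySpark.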

Q. What is the difference between map and flatMap?

A: map and flatMap are both transformations in Spark. Both are lazy and return a new RDD.

map:

·         map applies a function to each element of an RDD and returns a new RDD of the results.

·         It is a one-to-one transformation: each input element produces exactly one output element.

·         An input RDD with N elements gives an output RDD with N elements.

Ex: rdd.map(lambda s: s.upper())

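flatMap:

·         It applies a function that returns a sequence of 0, 1, or more elements for each input element. It then flattens those sequences into a single RDD.

·         An input RDD with N elements can give an output RDD with fewer than, exactly, or more than N elements.

·         It is a one-to-many transformation.

Ex: rdd.flatMap(lambda line: line.split(','))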
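To make the difference concrete without a cluster, the sketch below models an RDD as a plain Python list of partitions, where each partition is a list. rdd_map and rdd_flat_map are hypothetical helpers, not Spark APIs. In PySpark the real calls are rdd.map(f) and rdd.flatMap(f).

```python
from itertools import chain
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def rdd_map(parts: List[List[T]], f: Callable[[T], U]) -> List[List[U]]:
    """Model of RDD.map: exactly one output element per input element."""
    return [[f(x) for x in part] for part in parts]


def rdd_flat_map(parts: List[List[T]], f: Callable[[T], Iterable[U]]) -> List[List[U]]:
    """Model of RDD.flatMap: f returns 0..n elements, which are flattened."""
    return [list(chain.from_iterable(f(x) for x in part)) for part in parts]


def split_words(line: str) -> List[str]:
    # Drop empty strings so that an empty line yields zero elements.
    return [w for w in line.split(",") if w]


lines = [["a,b", "c"], ["", "d,e,f"]]  # 2 partitions, 4 elements

mapped = rdd_map(lines, split_words)    # 4 elements, each one a list
flat = rdd_flat_map(lines, split_words)  # 6 elements, each one a word
print(mapped)  # [[['a', 'b'], ['c']], [[], ['d', 'e', 'f']]]
print(flat)    # [['a', 'b', 'c'], ['d', 'e', 'f']]
```

The empty line still produces one (empty) element under map, but contributes nothing under flatMap.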

Q. What is the difference between wide and narrow transformations?

A: In a narrow transformation, each output partition is computed from only one input partition.

Consider a table divided into multiple partitions that we want to filter on some condition. Spark applies the filter to each partition independently. Every output partition is built only from its matching input partition, and no rows move between partitions. Because of this, narrow transformations do not cause a shuffle.

Ex: filter(), map(), flatMap()

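In a wide transformation, an output partition needs data from many input partitions. Spark therefore has to shuffle data across partitions (usually over the network), which also marks a stage boundary.

Ex: groupByKey(), reduceByKey(), aggregateByKey()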
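The sketch below contrasts the two cases using a plain Python list of partitions. narrow_filter and hash_shuffle are hypothetical helpers, not Spark APIs. The shuffle uses key % num_out on integer keys as a simple stand-in for Spark's hash partitioning (HashPartitioner).

```python
from typing import Callable, List, Tuple

Pair = Tuple[int, str]


def narrow_filter(parts: List[List[Pair]], pred: Callable[[Pair], bool]) -> List[List[Pair]]:
    """Narrow: output partition i is built only from input partition i (no shuffle)."""
    return [[kv for kv in part if pred(kv)] for part in parts]


def hash_shuffle(parts: List[List[Pair]], num_out: int) -> List[List[Pair]]:
    """Wide: records are redistributed by key, so one output partition can
    receive records from every input partition. This is the shuffle."""
    out: List[List[Pair]] = [[] for _ in range(num_out)]
    for part in parts:
        for key, value in part:
            # Integer keys keep the example deterministic.
            out[key % num_out].append((key, value))
    return out


parts = [[(1, "a"), (2, "b")], [(1, "c"), (3, "d")]]

filtered = narrow_filter(parts, lambda kv: kv[0] != 2)
shuffled = hash_shuffle(parts, 2)
print(filtered)  # [[(1, 'a')], [(1, 'c'), (3, 'd')]]
print(shuffled)  # [[(2, 'b')], [(1, 'a'), (1, 'c'), (3, 'd')]]
```

After the shuffle, the two records with key 1 come from different input partitions but now sit in the same output partition. That co-location is what groupByKey() needs.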

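Q. What is the difference between groupByKey and reduceByKey?

A: groupByKey does not use a combiner. It shuffles every (key, value) pair and groups the values only after the shuffle. reduceByKey uses a map-side combiner. It first reduces the values for each key within every partition, shuffles only those partial results, and then merges them after the shuffle.

reduceByKey is therefore usually faster and more memory-efficient than groupByKey for aggregations. Far less data is shuffled, and all of a key's values never have to be held in memory at once.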
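The effect of the combiner can be shown by counting shuffled records. The sketch below is a plain Python model, not Spark code. group_by_key_then_sum and reduce_by_key are hypothetical helpers, and the shuffle is simplified to sending everything to a single reducer.

```python
import operator
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, int]


def group_by_key_then_sum(parts: List[List[Pair]]) -> Tuple[Dict[str, int], int]:
    """Model of groupByKey + sum: every (key, value) pair is shuffled as-is."""
    shuffled = [kv for part in parts for kv in part]
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in shuffled:
        groups[key].append(value)  # all values for a key are held together
    return {k: sum(vs) for k, vs in groups.items()}, len(shuffled)


def reduce_by_key(parts: List[List[Pair]],
                  func: Callable[[int, int], int]) -> Tuple[Dict[str, int], int]:
    """Model of reduceByKey: combine inside each partition first (map-side
    combine), then shuffle one partial result per key per partition."""
    shuffled: List[Pair] = []
    for part in parts:
        local: Dict[str, int] = {}
        for key, value in part:
            local[key] = func(local[key], value) if key in local else value
        shuffled.extend(local.items())
    result: Dict[str, int] = {}
    for key, value in shuffled:
        result[key] = func(result[key], value) if key in result else value
    return result, len(shuffled)


parts = [[("a", 1), ("a", 1), ("b", 1)], [("a", 1), ("b", 1), ("b", 1)]]

grouped, grouped_shuffled = group_by_key_then_sum(parts)
reduced, reduced_shuffled = reduce_by_key(parts, operator.add)
print(grouped, grouped_shuffled)  # {'a': 3, 'b': 3} 6
print(reduced, reduced_shuffled)  # {'a': 3, 'b': 3} 4
```

Both approaches give the same totals, but reduceByKey moves 4 records instead of 6. The gap grows with the number of repeated keys per partition.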

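Q. What is the difference between lineage and DAG?

A: Lineage is the logical record of how an RDD is derived from its parent RDDs, that is, the chain of transformations applied to it. Spark uses it to recompute lost partitions instead of replicating data.

The DAG (directed acyclic graph) is the execution (physical) plan that Spark builds when an action is called. The DAG scheduler splits it into stages at shuffle boundaries, so the DAG carries more information, such as how the stages depend on each other.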
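A rough way to see what the DAG adds on top of lineage is to cut a lineage into stages. The sketch below treats the lineage as a hypothetical flat list of (transformation, is_wide) pairs and starts a new stage at each wide transformation. This is a simplification. Spark's DAGScheduler walks the RDD dependency graph rather than a list, and the map-side half of a shuffle (the shuffle write) actually runs at the end of the previous stage.

```python
from typing import List, Tuple

Op = Tuple[str, bool]  # (transformation name, is_wide): hypothetical representation


def split_into_stages(lineage: List[Op]) -> List[List[str]]:
    """Cut a linear chain of transformations into stages at wide (shuffle) ops."""
    stages: List[List[str]] = [[]]
    for name, is_wide in lineage:
        if is_wide and stages[-1]:
            stages.append([])  # a shuffle boundary starts a new stage
        stages[-1].append(name)
    return stages


lineage = [("textFile", False), ("flatMap", False), ("map", False),
           ("reduceByKey", True), ("filter", False)]
print(split_into_stages(lineage))
# [['textFile', 'flatMap', 'map'], ['reduceByKey', 'filter']]
```

The narrow operations before the shuffle are pipelined into one stage. Everything after the shuffle forms the next stage, which depends on the first.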

Q. Partition vs Bucketing in Spark

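A: Partitioning (df.write.partitionBy(...)) is done on one or more columns. The data is written to a separate directory for each distinct value of the partition column(s).

        number of partitions = number of distinct values in that column (or of distinct value combinations, for several columns)

Bucketing (df.write.bucketBy(...)) is also done on a column, but each row is assigned to a bucket by hashing the column value. We must specify the number of buckets up front, and all rows with the same value land in the same bucket.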

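In Spark, bucketBy() does not require partitionBy(). When both are used, partitioning is applied first (one directory per value), and the data inside each partition directory is then bucketed into its own files. Bucketed output must be saved as a table with saveAsTable(). In Hive, bucketed tables can be created directly with CLUSTERED BY (col) INTO n BUCKETS.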
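The difference in layout can be modelled in plain Python. partition_by and bucket_by are hypothetical helpers, not Spark APIs. The rows are (country, amount) tuples invented for the example. zlib.crc32 stands in for Spark's Murmur3-based hash, so the bucket numbers will not match what Spark would produce.

```python
import zlib
from collections import defaultdict
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # hypothetical schema: (country, amount)


def partition_by(rows: List[Row], col: int) -> Dict[str, List[Row]]:
    """Model of partitionBy: one 'directory' per distinct column value."""
    dirs: Dict[str, List[Row]] = defaultdict(list)
    for row in rows:
        dirs[str(row[col])].append(row)
    return dict(dirs)


def bucket_by(rows: List[Row], col: int, num_buckets: int) -> Dict[int, List[Row]]:
    """Model of bucketBy: a fixed number of buckets chosen by hash(value) % n."""
    buckets: Dict[int, List[Row]] = defaultdict(list)
    for row in rows:
        # crc32 is a deterministic stand-in for Spark's Murmur3 hash.
        bucket_id = zlib.crc32(str(row[col]).encode("utf-8")) % num_buckets
        buckets[bucket_id].append(row)
    return dict(buckets)


rows = [("IN", 10), ("US", 20), ("IN", 5), ("UK", 7), ("FR", 1), ("DE", 3)]

dirs = partition_by(rows, 0)       # 5 distinct countries -> 5 directories
buckets = bucket_by(rows, 0, 2)    # at most 2 buckets, regardless of distinct values
print(sorted(dirs))
print({b: len(rs) for b, rs in buckets.items()})
```

The number of partitions grows with the number of distinct values, while the number of buckets stays fixed. This is why bucketing suits high-cardinality columns such as IDs.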