1. Suppose you have a dashboard that runs every day at XX:XX and normally produces its output by XX:XX. One day it takes much longer than usual. What are the possible reasons it takes so long to run in production?

2. What is Kafka partitioning?
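To illustrate the concept, the sketch below assigns keyed records to a fixed set of partitions by hashing the key modulo the partition count. It is a pure-Python model, not the Kafka client API: `choose_partition`, `produce`, and `NUM_PARTITIONS` are hypothetical names, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to the serialized key.

```python
import zlib
from collections import defaultdict

# Hypothetical topic size used for illustration.
NUM_PARTITIONS = 3


def choose_partition(key: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a record key to a partition: hash(key) modulo the partition count.

    zlib.crc32 is a deterministic stand-in; Kafka's default partitioner
    uses murmur2 on the serialized key instead.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def produce(records):
    """Append each (key, value) record to the log of the partition its key maps to."""
    topic = defaultdict(list)  # partition id -> records in arrival order
    for key, value in records:
        topic[choose_partition(key)].append((key, value))
    return topic


events = [("user-1", "login"), ("user-2", "login"),
          ("user-1", "click"), ("user-1", "logout")]
for partition, log in sorted(produce(events).items()):
    print(partition, log)
```

Because the same key always hashes to the same partition, all `user-1` events stay together and in send order; Kafka guarantees ordering within a partition, not across the whole topic.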

3. What is lazy evaluation in Spark?
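A toy model of lazy evaluation, using only the Python standard library. The `LazyDataset` class is hypothetical and only mimics the idea, not Spark's API: transformations such as `map` and `filter` merely append a step to a recorded plan, and the work runs only when an action such as `collect` is called.

```python
class LazyDataset:
    """Toy model of lazy evaluation (not the Spark API)."""

    def __init__(self, source, plan=()):
        self._source = source
        self._plan = tuple(plan)  # recorded transformations, in order

    def map(self, fn):
        # Transformation: record the step, do no work.
        return LazyDataset(self._source, self._plan + (("map", fn),))

    def filter(self, pred):
        # Transformation: record the step, do no work.
        return LazyDataset(self._source, self._plan + (("filter", pred),))

    def collect(self):
        """Action: run the whole recorded plan in a single pass over the data."""
        out = []
        for item in self._source:
            keep = True
            for kind, fn in self._plan:
                if kind == "map":
                    item = fn(item)
                elif kind == "filter" and not fn(item):
                    keep = False
                    break
            if keep:
                out.append(item)
        return out


calls = []  # records every element the map function actually touches


def double(x):
    calls.append(x)
    return x * 2


ds = LazyDataset(range(5)).map(double).filter(lambda x: x > 4)
print(len(calls))    # 0: nothing has run yet
print(ds.collect())  # [6, 8]
print(len(calls))    # 5: the map ran only when collect() was called
```

In Spark the recorded plan is the lineage (DAG) of transformations; deferring execution lets Spark see the whole plan before running it and pipeline narrow steps together within a stage.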

4. Suppose you have a 50 GB file and you load the data into a Hive table. What will the size of the table be? Hive stores only the table's metadata, while the actual file data lives in HDFS, so what will the actual size be?
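One way to reason about the numbers, as a sketch: the Hive metastore holds only the table definition (schema, location, partitions), so the table's logical size is roughly the size of the files under its HDFS location, while the raw disk used is that size multiplied by the HDFS replication factor (`dfs.replication`, 3 by default). `hdfs_footprint_gb` is a hypothetical helper, and `compression_ratio` is a hypothetical knob for formats such as ORC or Parquet that may store the same data in less space.

```python
def hdfs_footprint_gb(logical_gb: float, replication: int = 3,
                      compression_ratio: float = 1.0) -> float:
    """Estimate raw HDFS disk usage for data loaded into a Hive table.

    logical_gb: size of the data as loaded (e.g. the 50 GB file).
    replication: HDFS replication factor (dfs.replication, default 3).
    compression_ratio: assumed stored size / input size; 1.0 means the
        files are stored as-is (e.g. a plain text table).
    """
    stored = logical_gb * compression_ratio  # size of one copy of the table files
    return stored * replication              # every HDFS block is kept `replication` times


print(hdfs_footprint_gb(50))  # 150.0
```

With the defaults, a 50 GB load gives about 50 GB of table data and about 150 GB of raw cluster storage; the metadata in the metastore is negligible by comparison.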

5. How would you debug a Spark application if it gets stuck somewhere? What approaches have you used to debug it?
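A common cause of a stage that appears stuck is data skew: most tasks finish quickly while a few tasks holding a very large key keep running, which shows up in the Spark UI as a handful of long-running tasks in one stage. The sketch below flags such keys from per-key record counts, for example counts gathered with `df.groupBy(...).count()`. The function name `skewed_keys` and the 10× threshold are illustrative assumptions, not Spark settings.

```python
from statistics import median


def skewed_keys(key_counts: dict, threshold: float = 10.0) -> list:
    """Return keys whose record count exceeds `threshold` times the median count.

    The threshold is an arbitrary illustrative choice, not a Spark setting.
    """
    typical = median(key_counts.values())
    return sorted(k for k, n in key_counts.items() if n > threshold * typical)


# Hypothetical per-key counts: one key dominates the data.
counts = {"IN": 1_000_000, "US": 1200, "UK": 900, "DE": 1100, "FR": 1000}
print(skewed_keys(counts))  # ['IN']
```

Once a hot key is identified, typical remedies include salting the key or handling it separately; which one fits depends on the job.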

6. Suppose you have a file in HDFS that contains 10 records. I want to read the 5th record, edit its value, and write it back. How would you do that?
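HDFS does not support in-place edits: a file is written once and can be appended to, but an existing record cannot be overwritten. A minimal sketch of the usual workaround is therefore to read, modify, write a new file, and swap it in. It uses the local filesystem as a stand-in for HDFS, and `replace_record` is a hypothetical helper; on a cluster the same steps would typically be a Spark job that writes to a new path followed by `hdfs dfs -mv`.

```python
import os
import tempfile


def replace_record(path: str, record_no: int, new_value: str) -> None:
    """Rewrite `path` with its `record_no`-th line (1-based) replaced.

    Local-filesystem stand-in for the HDFS pattern: read everything,
    modify one record, write a new file, then swap it in for the old one.
    """
    with open(path, encoding="utf-8") as fh:
        records = fh.read().splitlines()
    if not 1 <= record_no <= len(records):
        raise IndexError(f"file has {len(records)} records, no record {record_no}")
    records[record_no - 1] = new_value

    # Write the full new version to a temporary file in the same directory,
    # then atomically replace the original with it.
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w", encoding="utf-8") as tmp:
        tmp.write("\n".join(records) + "\n")
    os.replace(tmp_path, path)
```

The cost of this pattern is a full rewrite of the file for a single-record change, which is why frequently updated data is usually kept in formats or systems designed for updates rather than in plain HDFS files.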

7. What changes would you make to the spark-submit command to improve performance?
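A frequently cited rule of thumb for sizing the `--num-executors`, `--executor-cores`, and `--executor-memory` options of `spark-submit` on YARN is sketched below. The assumptions are heuristics, not Spark defaults: about 5 cores per executor, 1 core and 1 GB per node left for the OS and Hadoop daemons, one executor slot given up for the ApplicationMaster, and roughly 10% of executor memory reserved for overhead (`spark.executor.memoryOverhead` defaults to the larger of 384 MiB and 10% of executor memory). `executor_settings` is a hypothetical helper.

```python
import math


def executor_settings(nodes: int, cores_per_node: int, mem_per_node_gb: int,
                      cores_per_executor: int = 5,
                      overhead_fraction: float = 0.10) -> str:
    """Suggest spark-submit executor flags from cluster size (rule of thumb only)."""
    usable_cores = cores_per_node - 1   # leave 1 core per node for OS/daemons
    usable_mem = mem_per_node_gb - 1    # leave 1 GB per node for OS/daemons
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")
    # Reserve one executor slot for the YARN ApplicationMaster.
    num_executors = nodes * executors_per_node - 1
    mem_per_executor = usable_mem / executors_per_node
    # Heap plus ~10% overhead must fit in each executor's share of node memory.
    heap_gb = math.floor(mem_per_executor / (1 + overhead_fraction))
    return (f"--num-executors {num_executors} "
            f"--executor-cores {cores_per_executor} "
            f"--executor-memory {heap_gb}g")


print(executor_settings(nodes=10, cores_per_node=16, mem_per_node_gb=64))
```

For 10 nodes with 16 cores and 64 GB each, this yields `--num-executors 29 --executor-cores 5 --executor-memory 19g`. Treat the output as a starting point and measure before and after; other commonly adjusted settings include `spark.sql.shuffle.partitions` and `spark.dynamicAllocation.enabled`.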

8. Suppose you want to process a file through Spark all the way to a target table. What would be your approach to handling partitions?
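For partition sizing, a common starting point is to aim for partitions of about 128 MB, which matches the default HDFS block size and the default value of `spark.sql.files.maxPartitionBytes`. A sketch of that calculation follows; `target_partitions` is a hypothetical helper. The result could feed `df.repartition(n)` before writing, while `df.write.partitionBy(...)` controls the table's directory-level partition columns.

```python
import math


def target_partitions(input_bytes: int,
                      target_partition_bytes: int = 128 * 1024 * 1024,
                      min_partitions: int = 1) -> int:
    """Number of partitions so that each holds roughly `target_partition_bytes`."""
    return max(min_partitions, math.ceil(input_bytes / target_partition_bytes))


print(target_partitions(50 * 1024**3))  # 400 for a 50 GB input
```

Too few partitions leave executors idle and create oversized tasks; too many produce scheduling overhead and many small output files, so the target size is usually tuned per job.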

9. Why should you avoid shuffling in Spark? Give a proper example.
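Shuffles are expensive because intermediate data is written to disk on the map side and transferred over the network to the tasks that read it. The classic example is word count with `groupByKey` versus `reduceByKey`: `reduceByKey` combines values per key within each partition before the shuffle, so fewer records move. The pure-Python model below only counts shuffled records under each strategy; the function names are hypothetical and nothing here calls Spark.

```python
from collections import Counter


def shuffled_records_group_by_key(partitions):
    """groupByKey-style: every (key, value) record crosses the network."""
    return sum(len(p) for p in partitions)


def shuffled_records_reduce_by_key(partitions):
    """reduceByKey-style: each partition first combines values per key
    (map-side combine), so only one record per distinct key per partition moves."""
    return sum(len({k for k, _ in p}) for p in partitions)


def word_count(partitions):
    """The final result is the same under either strategy."""
    totals = Counter()
    for p in partitions:
        for k, v in p:
            totals[k] += v
    return dict(totals)


parts = [[("a", 1), ("b", 1), ("a", 1)],
         [("a", 1), ("b", 1), ("b", 1)]]
print(shuffled_records_group_by_key(parts))   # 6
print(shuffled_records_reduce_by_key(parts))  # 4
print(word_count(parts))                      # {'a': 3, 'b': 3}
```

Both strategies produce the same counts, but the combined version shuffles 4 records instead of 6; on real data with many repeated keys the gap grows with the number of duplicates per partition.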

10. Between an RDBMS and a column-based database, which is faster, and why?
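The usual answer depends on the workload: row-oriented RDBMS storage keeps each record together, which suits transactional reads and writes of whole rows, while column-oriented storage keeps each column together, which suits analytical queries that scan a few columns across many rows. The simplified model below, using a hypothetical four-row table, counts how many values must be read to sum one column under each layout; it ignores compression, indexes, and caching.

```python
# Hypothetical table: (id, name, amount)
rows = [
    (1, "alice", 120.0),
    (2, "bob", 80.0),
    (3, "carol", 200.0),
    (4, "dave", 50.0),
]

# Same data stored column by column.
columns = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def sum_amount_row_store(rows):
    """Row layout: each whole row is read to reach the one needed field."""
    values_read = 0
    total = 0.0
    for row in rows:
        values_read += len(row)
        total += row[2]
    return total, values_read


def sum_amount_column_store(columns):
    """Column layout: only the 'amount' column is read."""
    col = columns["amount"]
    return sum(col), len(col)


print(sum_amount_row_store(rows))        # (450.0, 12)
print(sum_amount_column_store(columns))  # (450.0, 4)
```

Both layouts return the same total, but the row layout reads 12 values and the columnar layout reads 4; the reverse trade-off applies when a query needs every column of a single record.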