1. Explain your current project.
  2. How do you create triggers in AWS Lambda?
  3. Explain the AWS Glue architecture.
  4. What is the Schema Registry in AWS Glue?
  5. How is metadata stored in the Glue Data Catalog?
  6. Assume you are getting data from two different sources, such as DynamoDB and an RDBMS. How is the schema managed in the Glue Data Catalog?
  7. Design a solution using AWS services so that, when a Glue job finishes processing, stakeholders are notified with metrics such as the number of files processed, the number of successes and failures, and the size of each file.
  8. How do you determine the size of a file in AWS S3 using AWS Glue, and delete the files whose size is greater than 100 KB?
  9. Assume you are writing data to a sink. While writing, a few corrupted records go missing. How do you store the corrupted records in a separate table using Spark?
  10. What is the difference between a Spark RDD and a DataFrame?
  11. What is repartition in Spark?
  12. What are the Spark deploy modes?
  13. What is the role of the map function in Spark? Explain with an example.
  14. Suppose you are reading a huge CSV file into a Spark DataFrame. There will be a shuffle across partitions. How do you distribute the data evenly across all partitions?
  15. How do you perform a union of two DataFrames when their schemas change dynamically?
  16. How do you flatten JSON in Snowflake?
  17. Write a Python program to capitalize the first letter of each word in a string.
  18. Write a Python program to get the 3rd smallest value in a list.
  19. What is ROW_NUMBER in SQL?
  20. How is data stored in a data warehouse? Is it normalized or denormalized?
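
The two Python questions (17 and 18) can be approached as in the minimal sketch below. The function names `capitalize_words` and `third_smallest` are hypothetical, chosen only for illustration. The sketch also makes two assumptions the questions leave open: the rest of each word is left unchanged, and "3rd least" means the third smallest distinct value, so duplicates count once.

```python
def capitalize_words(text: str) -> str:
    """Uppercase the first letter of each space-separated word."""
    # Splitting on a single space (rather than any whitespace) keeps
    # runs of spaces intact when the words are joined back together.
    return " ".join(word[:1].upper() + word[1:] for word in text.split(" "))


def third_smallest(values):
    """Return the third smallest distinct value in values."""
    # Assumption: duplicates count once, so [1, 1, 2, 3] yields 3.
    distinct = sorted(set(values))
    if len(distinct) < 3:
        raise ValueError("need at least three distinct values")
    return distinct[2]


if __name__ == "__main__":
    print(capitalize_words("explain your current project"))
    print(third_smallest([5, 1, 4, 1, 3]))
```

If duplicates should count separately instead, `sorted(values)[2]` would be the corresponding variant. An interviewer may also expect a single-pass solution that avoids sorting, which this sketch does not attempt.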