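In Spark, both the sort() and orderBy() methods are used for sorting data. On the DataFrame API, however, orderBy() is simply an alias for sort(), so the two behave identically. The functionality that really differs is the per-partition sort, sortWithinPartitions() (SORT BY in Spark SQL).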
Let us discuss this with an example.
First, we create a DataFrame in Spark.
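sort():
Let us use the sort function on the DataFrame.
I have sorted the DataFrame on the country and age columns.
sort() performs a global sort: Spark shuffles the rows into range partitions on the sort keys and then sorts each partition, so the result is ordered across the whole DataFrame. The partition-level behavior often attributed to sort() actually belongs to sortWithinPartitions() (SORT BY in Spark SQL), which orders the rows inside each existing partition without a shuffle, so the order of the final DataFrame is not guaranteed. That is the cheaper option. If a global order is not a requirement, we can go with sortWithinPartitions().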
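orderBy():
I have used orderBy on country and age.
On the DataFrame API, orderBy() is an alias for sort(), so it produces the same result. It does not collect all the data into a single partition. Instead, it shuffles the rows into range partitions on the sort keys and sorts each one.
So, the order of the data is guaranteed in the final DataFrame.
Because of the shuffle, this is costly compared to sortWithinPartitions(), the per-partition sort.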
Happy Learning :)