SparkContext:

Main entry point for Spark functionality. A SparkContext represents the connection to a Spark cluster, and can be used to create RDDs, accumulators and broadcast variables on that cluster.

Syntax for SparkContext:

from pyspark import SparkContext

# Create a SparkContext in local mode; local[*] uses as many worker threads as there are logical cores

sc = SparkContext("local[*]", "example")

# Now you can use 'sc' to create RDDs and perform operations on them
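For example, here is a minimal sketch of what you can do with 'sc' once it exists (the data and variable names are purely illustrative):

rdd = sc.parallelize([1, 2, 3, 4, 5])    # distribute a local list as an RDD
squared = rdd.map(lambda x: x * x)       # transformation (lazy, nothing runs yet)
print(squared.collect())                 # action: triggers the job, returns [1, 4, 9, 16, 25]
sc.stop()                                # release the cluster connection when done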


SparkSession:
It is a unified object for performing all Spark operations. In the earlier Spark 1.x releases there were separate objects such as SparkContext, SQLContext, HiveContext, SparkConf, and StreamingContext. With Spark 2.x, these were combined into a single entry point, the SparkSession, and you can perform all of those operations through the SparkSession object itself.
This unification has made life simpler for Spark developers.

Syntax for SparkSession:

from pyspark.sql import SparkSession

spark = SparkSession \
.builder \
.appName("Name") \
.getOrCreate()
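
A short sketch of how the same 'spark' object is then used for DataFrame and SQL work (the column names and data below are just illustrative):

df = spark.createDataFrame([(1, "Alice"), (2, "Bob")], ["id", "name"])   # build a DataFrame from local data
df.filter(df.id > 1).show()                                              # DataFrame API
df.createOrReplaceTempView("people")                                     # register the DataFrame for SQL
spark.sql("SELECT name FROM people WHERE id = 2").show()                 # SQL through the same session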

Why should you use SparkSession over SparkContext?

  • From Spark 2.0 onwards, SparkSession provides a single, common entry point for a Spark application.
  • Instead of creating a SparkContext, HiveContext, and SQLContext separately, everything is now available through one SparkSession.
  • It unifies all of Spark's numerous contexts; before version 2.0 you had to create each of these contexts yourself (and only one SparkContext could be active per JVM).
  • With SparkSession this problem is resolved: the older contexts are still reachable from the one session object, as shown in the sketch below.
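
As a rough illustration of that unification, using the 'spark' session created above (no extra configuration assumed):

print(spark.sparkContext)          # the underlying SparkContext, no separate creation needed
print(spark.catalog.listTables())  # catalog access that previously went through SQLContext/HiveContext
spark.sql("SELECT 1").show()       # SQL support from the same object
spark.stop()                       # stops the session and its SparkContext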