SparkSession vs HiveContext vs SQLContext vs SparkContext
Apache Spark has become a popular data processing framework because of its flexibility and speed.
Over the course of Spark's evolution, there have been significant changes from version to version.
Every framework needs an entry point from which a program starts, and Spark has offered different entry points as it has evolved.
Until Spark 2.0 (versions < 2.0) we had SparkContext, SQLContext and HiveContext.
We will look at the differences between them:
SparkContext:
It is the entry point of a Spark application, where the driver program starts. The driver program is the process that divides the application logic into tasks and assigns them to the executors running on the worker nodes.
Scala Spark:
import org.apache.spark.{SparkConf, SparkContext}
val sparkConf = new SparkConf().setAppName("my app").setMaster("local[1]")
val sparkContext = new SparkContext(sparkConf)
PySpark:
from pyspark import SparkConf, SparkContext
sparkConf = SparkConf().setAppName("my app").setMaster("local[1]")
sparkContext = SparkContext(conf=sparkConf)
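To make the role of the SparkContext concrete, here is a small Scala sketch reusing the sparkContext created above; the numbers and the partition count are made up for illustration. The driver turns each partition into a task and ships those tasks to the executors.
// Build an RDD with 4 partitions; each partition becomes one task on an executor.
val numbers = sparkContext.parallelize(1 to 100, numSlices = 4)
// map runs inside the tasks; reduce is the action that triggers the job on the cluster.
val sumOfSquares = numbers.map(n => n * n).reduce(_ + _)
println(s"Sum of squares: $sumOfSquares")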
SQLContext:
If we want to run SQL queries on a DataFrame instead of writing Python or Scala code, we can use SQLContext and express the Spark logic in SQL by creating temporary tables.
Scala Spark:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
val sparkConf = new SparkConf().setAppName("my app").setMaster("local[1]")
val sparkContext = new SparkContext(sparkConf)
val sql_context = new SQLContext(sparkContext)
PySpark:
from pyspark import SparkConf, SparkContext
from pyspark.sql import SQLContext
sparkConf = SparkConf().setAppName("my app").setMaster("local[1]")
sparkContext = SparkContext(conf=sparkConf)
sql_context = SQLContext(sparkContext)
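As a quick illustration, the Scala sketch below shows the kind of flow SQLContext enables: build a DataFrame, register it as a temporary table, and query it with SQL. The table name "people" and the sample rows are invented purely for this example.
import sql_context.implicits._                        // brings in .toDF for local collections
val people = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
people.registerTempTable("people")                    // temporary table visible to SQL queries
val adults = sql_context.sql("SELECT name FROM people WHERE age >= 18")
adults.show()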
HiveContext:
If we want to connect to Hive and read from or write to Hive tables, we can use HiveContext and perform the required operations on the Hive data.
Scala Spark:
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.hive.HiveContext
val sparkConf = new SparkConf().setAppName("my app").setMaster("local[1]")
val sparkContext = new SparkContext(sparkConf)
val hive_context = new HiveContext(sparkContext)
PySpark:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext
sparkConf = SparkConf().setAppName("my app").setMaster("local[1]")
sparkContext = SparkContext(conf=sparkConf)
hive_context = HiveContext(sparkContext)
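For example, a typical HiveContext flow looks like the Scala sketch below: query an existing Hive table, transform the result, and save it back as another Hive table. The database and table names (sales_db.orders, sales_db.big_orders) and the amount column are assumptions made only for this example.
val orders = hive_context.sql("SELECT * FROM sales_db.orders")       // read from a Hive table
val bigOrders = orders.filter("amount > 1000")                       // ordinary DataFrame operations
bigOrders.write.mode("overwrite").saveAsTable("sales_db.big_orders") // write back to Hive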
SparkSession:
In Spark versions 2.0 and later, a new unified entry point called SparkSession was added, which combines SparkContext, SQLContext and HiveContext.
Using SparkSession we can write SQL queries and also connect to Hive.
Scala Spark:
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local").appName("My Application").getOrCreate()
PySpark:
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local") \
    .appName("My Application") \
    .getOrCreate()
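Putting it together, the Scala sketch below shows SparkSession doing both jobs: running SQL over a temporary view (what SQLContext used to do) and querying a Hive table (what HiveContext used to do). Note that enableHiveSupport() is needed for the Hive part; the table name employees_db.employees and the sample data are assumptions for illustration.
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder
  .master("local")
  .appName("My Application")
  .enableHiveSupport()             // required to read/write Hive tables through SparkSession
  .getOrCreate()

import spark.implicits._
val people = Seq(("Alice", 30), ("Bob", 25)).toDF("name", "age")
people.createOrReplaceTempView("people")                          // SQLContext-style temporary view
spark.sql("SELECT name FROM people WHERE age >= 18").show()       // plain SQL through SparkSession
val employees = spark.sql("SELECT * FROM employees_db.employees") // HiveContext-style Hive query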
SparkSession now makes it easy to connect to Hive and run SQL-style code in a Spark application with minimal extra configuration.