Difference between RDD, DataFrame, and Dataset in Spark

What is the difference between RDD, DataFrame, and Dataset?


RDD :

An RDD (Resilient Distributed Dataset) is a fault-tolerant collection of elements, partitioned across the nodes of a cluster, that can be operated on in parallel.
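
As a rough illustration of "operated on in parallel", here is a minimal Scala sketch. It assumes Spark (spark-sql) is on the classpath and runs in local mode; the object name RddExample and the helper sumOfSquares are made up for this example.

```scala
import org.apache.spark.sql.SparkSession

object RddExample {
  // Hypothetical helper: squares each element in parallel, then sums the results.
  def sumOfSquares(spark: SparkSession, xs: Seq[Int]): Int = {
    // parallelize splits the local collection into partitions
    val numbers = spark.sparkContext.parallelize(xs)
    // map runs on each partition independently; reduce combines the partial results
    numbers.map((n: Int) => n * n).reduce((a: Int, b: Int) => a + b)
  }

  def main(args: Array[String]): Unit = {
    // local[*] runs Spark in-process using all available cores (an assumption for this sketch)
    val spark = SparkSession.builder().appName("rdd-example").master("local[*]").getOrCreate()
    println(sumOfSquares(spark, Seq(1, 2, 3, 4, 5))) // 1 + 4 + 9 + 16 + 25 = 55
    spark.stop()
  }
}
```

The RDD holds plain Scala values and Spark knows nothing about their structure, so the functions passed to map and reduce run exactly as written.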

DataFrame :

A DataFrame is a distributed collection of data organized into named columns. It is conceptually equivalent to a table in a relational database or a data frame in R or Python, but with richer optimizations under the hood from the Spark SQL engine.
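
To make "named columns" concrete, here is a minimal Scala sketch, again assuming Spark (spark-sql) on the classpath and local mode. The sample rows, the object name DataFrameExample, and the helper namesOlderThan are invented for illustration.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object DataFrameExample {
  // Hypothetical helper: returns the "name" column for rows whose "age" exceeds minAge.
  def namesOlderThan(spark: SparkSession, minAge: Int): DataFrame = {
    import spark.implicits._ // enables toDF and the $"column" syntax
    val people = Seq(("Alice", 34), ("Bob", 19), ("Carol", 45)).toDF("name", "age")
    // Columns are referenced by name; Spark SQL plans and optimizes the query
    people.filter($"age" > minAge).select("name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("df-example").master("local[*]").getOrCreate()
    namesOlderThan(spark, 30).show()
    spark.stop()
  }
}
```

Because columns are looked up by name, a misspelled column name compiles fine and only fails when Spark analyzes the query at runtime.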

Dataset :

A Dataset is a distributed collection of data that combines the benefits of RDDs (strong typing and the ability to use powerful lambda functions) with the benefits of Spark SQL's optimized execution engine. The Dataset API is available in Scala and Java.
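
A minimal Scala sketch of the typed API, assuming Spark (spark-sql) on the classpath and local mode. The case class Person, the object name DatasetExample, and the helper namesOlderThan are invented for illustration.

```scala
import org.apache.spark.sql.{Dataset, SparkSession}

// Defined at top level so Spark can derive an Encoder for it
case class Person(name: String, age: Int)

object DatasetExample {
  // Hypothetical helper: typed filter and map over Person objects.
  def namesOlderThan(spark: SparkSession, minAge: Int): Dataset[String] = {
    import spark.implicits._ // provides Encoders and toDS
    val people: Dataset[Person] =
      Seq(Person("Alice", 34), Person("Bob", 19), Person("Carol", 45)).toDS()
    // Lambdas work on Person fields, so a typo such as p.agee fails at compile time
    people.filter((p: Person) => p.age > minAge).map((p: Person) => p.name)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("ds-example").master("local[*]").getOrCreate()
    namesOlderThan(spark, 30).show()
    spark.stop()
  }
}
```

The lambda parameter types are written out explicitly only to keep overload resolution unambiguous across Scala versions; the shorter form `_.age > minAge` is also common.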
