Table joins in Spark

This post shows how to join two tables in Spark, using PySpark DataFrames.

The following join types are covered:

Inner join

Left join

Right join

Full outer join
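
As a quick way to compare the four types, the sketch below uses plain Python sets to show which id values each join keeps for the two input tables in this post (marks has ids 1 to 4, location has ids 1, 2 and 5). It assumes ids are unique in each table and says nothing about the other columns; the dictionary name ids_by_join is made up for this illustration.

```python
# id values taken from the two input tables in this post
marks_ids = {1, 2, 3, 4}
location_ids = {1, 2, 5}

# Which ids survive each join type when marks is the left table
ids_by_join = {
    "inner": marks_ids & location_ids,       # present in both tables
    "left": marks_ids,                       # everything from marks
    "right": location_ids,                   # everything from location
    "full_outer": marks_ids | location_ids,  # everything from either table
}

for how, ids in ids_by_join.items():
    print(how, sorted(ids))
```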


Inputs

Table: marks

id  name   marks
1   Shiva  90
2   Ram    85
3   Mohan  95
4   Raju   96

Table: location

id  location
1   Hyderabad
2   Chennai
5   Bangalore
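
If the two CSV files are not on disk yet, they can be generated from the tables above. This is a minimal sketch using only the Python standard library; the temporary output folder and the helper name write_csv are assumptions for illustration, so replace the folder with wherever you keep your files.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical output folder; replace with your own, e.g. C:\demo\files
out_dir = Path(tempfile.mkdtemp())

# First tuple of each list is the header row
marks_rows = [("id", "name", "marks"),
              (1, "Shiva", 90), (2, "Ram", 85), (3, "Mohan", 95), (4, "Raju", 96)]
location_rows = [("id", "location"),
                 (1, "Hyderabad"), (2, "Chennai"), (5, "Bangalore")]

def write_csv(path, rows):
    # newline="" stops the csv module from writing blank lines on Windows
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerows(rows)

write_csv(out_dir / "marks_data.csv", marks_rows)
write_csv(out_dir / "location.csv", location_rows)
print(out_dir)
```

Point the .csv(...) calls at the printed folder to read these files.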


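Read both CSV files into DataFrames. The header option takes the column names from the first line of each file, and inferSchema lets Spark detect the column types.

from pyspark.sql import SparkSession

# Reuses the active session if one already exists (for example in the pyspark shell)
spark = SparkSession.builder.getOrCreate()

marks = spark.read.option("delimiter", ",").option("header", "true").option("inferSchema", "true").csv("C:\\demo\\files\\marks_data.csv")

location = spark.read.option("delimiter", ",").option("header", "true").option("inferSchema", "true").csv("C:\\demo\\files\\location.csv")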


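Inner join:

An inner join returns only the rows whose id exists in both tables (ids 1 and 2 here). Because the join condition compares two columns, the result keeps the id column from both tables.

joined_df = marks.join(location, location["id"] == marks["id"], "inner")
joined_df.show()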
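
To see which rows the inner join returns for the sample tables, the same logic can be sketched in plain Python without a Spark session. This only models the result and is not how Spark executes the join; the function name inner_join is made up for this illustration, and Spark does not guarantee the row order shown.

```python
# Sample rows from the input tables: (id, name, marks) and (id, location)
marks = [(1, "Shiva", 90), (2, "Ram", 85), (3, "Mohan", 95), (4, "Raju", 96)]
location = [(1, "Hyderabad"), (2, "Chennai"), (5, "Bangalore")]

def inner_join(left, right):
    """Keep only the pairs of rows whose ids (first field) match."""
    return [l + r for l in left for r in right if l[0] == r[0]]

inner_rows = inner_join(marks, location)
for row in inner_rows:
    print(row)
```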




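Left join:

A left join keeps every row of marks (the left table). Rows with no matching id in location (ids 3 and 4) get null in the location columns.

joined_df = marks.join(location, location["id"] == marks["id"], "left")
joined_df.show()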
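
The left join result for the sample tables can be modelled the same way in plain Python. This is a sketch of the semantics only, with None standing in for Spark's null; left_join is a hypothetical helper, not a Spark function.

```python
# Sample rows from the input tables: (id, name, marks) and (id, location)
marks = [(1, "Shiva", 90), (2, "Ram", 85), (3, "Mohan", 95), (4, "Raju", 96)]
location = [(1, "Hyderabad"), (2, "Chennai"), (5, "Bangalore")]

def left_join(left, right):
    """Keep every left row; pad with None when no right row has the same id."""
    right_width = len(right[0])
    out = []
    for l in left:
        matches = [r for r in right if r[0] == l[0]]
        if matches:
            out.extend(l + r for r in matches)
        else:
            out.append(l + (None,) * right_width)
    return out

left_rows = left_join(marks, location)
for row in left_rows:
    print(row)
```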





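Right join:

A right join keeps every row of location (the right table). Id 5 has no match in marks, so the marks columns are null for that row.

joined_df = marks.join(location, location["id"] == marks["id"], "right")
joined_df.show()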
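
A right join mirrors the left join: every row of the right table survives. The plain-Python sketch below models that for the sample tables, with None standing in for Spark's null; right_join is a hypothetical helper written for this illustration.

```python
# Sample rows from the input tables: (id, name, marks) and (id, location)
marks = [(1, "Shiva", 90), (2, "Ram", 85), (3, "Mohan", 95), (4, "Raju", 96)]
location = [(1, "Hyderabad"), (2, "Chennai"), (5, "Bangalore")]

def right_join(left, right):
    """Keep every right row; pad with None when no left row has the same id."""
    left_width = len(left[0])
    out = []
    for r in right:
        matches = [l for l in left if l[0] == r[0]]
        if matches:
            out.extend(l + r for l in matches)
        else:
            out.append((None,) * left_width + r)
    return out

right_rows = right_join(marks, location)
for row in right_rows:
    print(row)
```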







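Full outer join:

A full outer join keeps the rows of both tables. Ids 1 to 5 all appear, with null wherever one side has no match.

joined_df = marks.join(location, location["id"] == marks["id"], "full_outer")
joined_df.show()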
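
For the sample tables, the full outer join combines the left and right join results. The sketch below models this in plain Python, with None standing in for Spark's null; full_outer_join is a hypothetical helper, and the row order is an artifact of the sketch rather than something Spark guarantees.

```python
# Sample rows from the input tables: (id, name, marks) and (id, location)
marks = [(1, "Shiva", 90), (2, "Ram", 85), (3, "Mohan", 95), (4, "Raju", 96)]
location = [(1, "Hyderabad"), (2, "Chennai"), (5, "Bangalore")]

def full_outer_join(left, right):
    """Keep every row from both sides, padding the missing side with None."""
    left_width, right_width = len(left[0]), len(right[0])
    out = []
    matched_right = set()  # indexes of right rows that found a partner
    for l in left:
        hit = False
        for i, r in enumerate(right):
            if l[0] == r[0]:
                out.append(l + r)
                matched_right.add(i)
                hit = True
        if not hit:
            out.append(l + (None,) * right_width)
    # Right rows that never matched still appear, with the left side empty
    for i, r in enumerate(right):
        if i not in matched_right:
            out.append((None,) * left_width + r)
    return out

full_rows = full_outer_join(marks, location)
for row in full_rows:
    print(row)
```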
















