What is Spark, and how to set up and work with Spark.

Apache Spark

Apache Spark is a big data processing engine that lets you run your analytical workloads (pipelines) in less time.

It can work with structured, semi-structured, and unstructured data.

It has rich APIs for working with machine learning models and graph processing.

We can use Java, Python, R, or Scala to interact with the Spark engine.

Spark jobs can also run SQL commands to execute analytical queries on different tables.
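
For example, a small PySpark job can register a DataFrame as a table and query it with SQL. This is only a sketch; the sample data, column names, and table name are made up for illustration.

from pyspark.sql import SparkSession

# Start (or reuse) a local Spark session
spark = SparkSession.builder.appName("intro-example").getOrCreate()

# A tiny in-memory DataFrame; the columns and values are illustrative only
sales = spark.createDataFrame(
    [("2024-01-01", "books", 120.0),
     ("2024-01-01", "toys", 75.5),
     ("2024-01-02", "books", 200.0)],
    ["order_date", "category", "amount"],
)

# Register the DataFrame as a table and run an analytical SQL query on it
sales.createOrReplaceTempView("sales")
spark.sql("SELECT category, SUM(amount) AS total FROM sales GROUP BY category").show()

spark.stop()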


Download Spark:

https://spark.apache.org/downloads.html


Local setup:

We need to have Java installed first.

Set the variables below under Environment Variables (System variables):

JAVA_HOME : C:\Program Files\Java\jdk1.8.0_202

PATH : C:\Program Files\Java\jdk1.8.0_202\bin


Spark Setup:

SPARK_HOME : C:\Program Files\spark

PATH : C:\Program Files\spark\bin


Scala Spark: open a command prompt and type spark-shell

PySpark: open a command prompt and type pyspark

Either command will open a Spark shell.
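
Once the shell opens, a SparkSession is already available as the variable spark. A quick check that the setup works (a minimal sketch typed inside the pyspark shell):

spark.version            # prints the installed Spark version
df = spark.range(1, 6)   # DataFrame with one 'id' column holding values 1 to 5
df.show()                # displays the rows in the console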


Execution:

We can run a Spark job locally with a single thread or with N threads using:

spark-shell --master local[N]

pyspark --master local[N]
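
Here N is the number of worker threads to use, and local[*] uses all available cores. The same master setting can also be set in code when building a session. Below is a minimal PySpark sketch; the application name is just an example.

from pyspark.sql import SparkSession

# Run locally with 4 threads; "local[*]" would use every available core
spark = (SparkSession.builder
         .appName("local-execution")
         .master("local[4]")
         .getOrCreate())

print(spark.sparkContext.defaultParallelism)  # typically matches the thread count

spark.stop()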


