Apache Spark
Apache Spark is a big data processing engine that lets you run analytical workloads (pipelines) in less time.
It can work with structured, semi-structured, and unstructured data.
It has rich APIs for machine learning (MLlib) and graph processing (GraphX).
We can use Java, Python, R, or Scala to interact with the Spark engine.
Spark jobs can also run SQL commands to execute analytical queries on different tables.
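As a quick illustration, here is a minimal PySpark sketch (the file people.csv and its columns are hypothetical) that loads a CSV as a DataFrame and queries it with SQL:

from pyspark.sql import SparkSession

# Create a SparkSession, the entry point for the DataFrame and SQL APIs
spark = SparkSession.builder.appName("intro-example").getOrCreate()

# Load a CSV file as a DataFrame (people.csv is a hypothetical input)
df = spark.read.csv("people.csv", header=True, inferSchema=True)

# Register the DataFrame as a temporary view and query it with SQL
df.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30").show()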
Download Spark:
https://spark.apache.org/downloads.html
Local setup:
We need to have Java installed.
Set the variables below under Environment Variables -> System Variables:
JAVA_HOME : C:\Program Files\Java\jdk1.8.0_202
PATH : C:\Program Files\Java\jdk1.8.0_202\bin (append to the existing PATH)
Spark Setup:
SPARK_HOME : C:\Program Files\spark
PATH : C:\Program Files\spark\bin (append to the existing PATH)
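To verify the setup, open a new command prompt and confirm both tools are on the PATH:

java -version
spark-shell --version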
Scala: Open a command prompt and type spark-shell
PySpark: Open a command prompt and type pyspark
Either command will open a Spark shell.
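Once the shell opens, a quick smoke test is to build a tiny DataFrame. For example, in the PySpark shell, where a SparkSession named spark is already created for you:

spark.range(5).show()   # prints a single-column DataFrame with values 0 to 4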
Execution:
We can run a Spark job locally with a single thread, or with N threads, using:
spark-shell --master local[N]
pyspark --master local[N]
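Beyond the interactive shells, a standalone script can be submitted the same way with spark-submit. Here is a minimal sketch (app.py, input.txt, and the local[4] master are all hypothetical choices):

# app.py - a small word-count job
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("word-count").getOrCreate()

# Read a text file; each line becomes a row with a single "value" column
lines = spark.read.text("input.txt")

# Split each line into words and count occurrences of each word
words = lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
words.groupBy("word").count().show()

spark.stop()

Run it with: spark-submit --master local[4] app.py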