Apache Spark
Apache Spark is a big data processing engine that lets you run analytical workloads (pipelines) in less time.
It can work with structured, semi-structured, and unstructured data.
It has rich APIs for machine learning (MLlib) and graph processing (GraphX).
We can use Java, Python, R, or Scala to interact with the Spark engine.
Spark jobs can also run SQL commands to execute analytical queries on different tables.
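As a quick illustration, here is a minimal PySpark sketch (the file people.csv and its columns are hypothetical) that loads a CSV as a DataFrame and queries it with SQL:

from pyspark.sql import SparkSession

# Create a SparkSession, the entry point for the DataFrame and SQL APIs
spark = SparkSession.builder.appName("intro-example").getOrCreate()

# Load a CSV file as a DataFrame (people.csv is a hypothetical input)
df = spark.read.csv("people.csv", header=True, inferSchema=True)

# Register the DataFrame as a temporary view and query it with SQL
df.createOrReplaceTempView("people")
spark.sql("SELECT name, age FROM people WHERE age > 30").show()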
Download Spark:
https://spark.apache.org/downloads.html
Local setup:
We need to have Java installed.
Set the variables below under Environment Variables -> System Variables:
JAVA_HOME : C:\Program Files\Java\jdk1.8.0_202
PATH : C:\Program Files\Java\jdk1.8.0_202\bin (append to the existing PATH)
Spark Setup:
SPARK_HOME : C:\Program Files\spark
PATH : C:\Program Files\spark\bin (append to the existing PATH)
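To verify the setup, open a new command prompt and confirm both tools are on the PATH:

java -version
spark-shell --version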
Scala: Open a command prompt and type spark-shell
PySpark: Open a command prompt and type pyspark
Either command will open a Spark shell.
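Once the shell opens, a quick smoke test is to build a tiny DataFrame. For example, in the PySpark shell, where a SparkSession named spark is already created for you:

spark.range(5).show()   # prints a single-column DataFrame with values 0 to 4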
Execution:
We can run a Spark job locally with a single thread, or with N threads, using:
spark-shell --master local[N]
pyspark --master local[N]
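Beyond the interactive shells, a standalone script can be submitted the same way with spark-submit. Here is a minimal sketch (app.py, input.txt, and the local[4] master are all hypothetical choices):

# app.py - a small word-count job
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("word-count").getOrCreate()

# Read a text file; each line becomes a row with a single "value" column
lines = spark.read.text("input.txt")

# Split each line into words and count occurrences of each word
words = lines.select(F.explode(F.split(F.col("value"), r"\s+")).alias("word"))
words.groupBy("word").count().show()

spark.stop()

Run it with: spark-submit --master local[4] app.py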