Steps for Apache Spark Installation On Ubuntu
Follow the steps given below for Apache Spark installation on Ubuntu:
1. Deployment Platform
i. Platform Requirements
Operating System: You can use Ubuntu 14.04 or later (other Linux flavors such as CentOS and Red Hat can also be used)
Spark: Apache Spark 1.6.1 or later
ii. Setup Platform
If you are using Windows or Mac OS, you can create a virtual machine and install Ubuntu in it using either VMware Player or Oracle VirtualBox.
2. Prerequisites
i. Install Java 7
a. Install Python Software Properties
$sudo apt-get install python-software-properties
b. Add Repository
$sudo add-apt-repository ppa:webupd8team/java
c. Update the source list
$sudo apt-get update
d. Install Java
$sudo apt-get install oracle-java7-installer
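After the installer finishes, it is worth confirming that a JVM is actually on the PATH before moving on. This is a small sanity check of my own (not part of the original installer steps); the `JAVA_STATUS` variable is just a name used here for illustration:

```shell
# Sanity check (not from the original guide): verify a JVM is now on the PATH
if command -v java >/dev/null 2>&1; then
  JAVA_STATUS="java found at $(command -v java)"
else
  JAVA_STATUS="java not found on PATH - re-run the installer step"
fi
echo "$JAVA_STATUS"
```

If the second message appears, repeat the repository and install steps above before continuing. You can also run `java -version` to see the exact release installed.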
3. Install Apache Spark
i. Download Spark
You can download Apache Spark from the link below. In the package type, select “Pre-built for Hadoop 2.6 and Later”:
http://spark.apache.org/downloads.html Alternatively, you can use the direct download link:
http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz
ii. Untar Spark Setup
$tar xzf spark-1.6.1-bin-hadoop2.6.tgz
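If you are unfamiliar with the tar flags used above, the following self-contained sketch demonstrates them on a stand-in archive (the directory names here are made up for the demo; in the real step the archive is spark-1.6.1-bin-hadoop2.6.tgz from the download above):

```shell
# Build a stand-in archive, then extract it to illustrate the flags:
# c = create, x = extract, z = gzip/gunzip, f = archive file name
mkdir -p work/spark-1.6.1-bin-hadoop2.6/bin
tar czf spark-demo.tgz -C work spark-1.6.1-bin-hadoop2.6
mkdir -p extracted
tar xzf spark-demo.tgz -C extracted
ls extracted
```

Without `-C`, tar extracts into the current directory, which is exactly what the installation step above does.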
You can find all the scripts and configuration files in the newly created directory “spark-1.6.1-bin-hadoop2.6”.
iii. Setup Configuration
a. Edit .bashrc
Edit the .bashrc file located in the user’s home directory and add the following parameters:
export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (eg: /usr/lib/jvm/java-7-oracle/)
export SPARK_HOME=<path-to-the-root-of-your-spark-installation> (eg: /home/dataflair/spark-1.6.1-bin-hadoop2.6/)
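As a concrete sketch, the two exports might look as follows once the placeholders are filled in. The paths below are only examples (they match the sample paths in parentheses above); adjust them to your own machine, and note that adding `$SPARK_HOME/bin` to the PATH is an optional convenience, not a step from the guide:

```shell
# Example values only - replace with the real paths on your system
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export SPARK_HOME="$HOME/spark-1.6.1-bin-hadoop2.6"
# Optional: lets you run spark-shell from any directory
export PATH="$PATH:$SPARK_HOME/bin"
echo "$SPARK_HOME"
```

After saving .bashrc, run `source ~/.bashrc` (or open a new terminal) so the variables take effect in your current session.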
4. Launch the Spark Shell
Go to the Spark home directory (spark-1.6.1-bin-hadoop2.6) and run the command below to start the Spark shell:
$bin/spark-shell
The Spark shell is launched; now you can play with Spark.
i. Spark UI
This is the GUI for a Spark application; in local mode, the Spark shell itself runs as an application. The GUI provides details about stages, storage (cached RDDs), environment variables, and executors:
http://localhost:4040
That's it! You are ready to rock on using Apache Spark!