
Steps for Apache Spark Installation On Ubuntu


Follow the steps below to install Apache Spark on Ubuntu.

1. Deployment Platform

i. Platform Requirements

Operating System: Ubuntu 14.04 or later (other Linux flavors such as CentOS or Red Hat also work)

Spark: Apache Spark 1.6.1 or later

ii. Setup Platform

If you are on Windows or macOS, create a virtual machine and install Ubuntu in it using either VMware Player or Oracle VirtualBox.

2. Prerequisites

i. Install Java 7

a. Install Python Software Properties

$ sudo apt-get install python-software-properties

b. Add Repository

$ sudo add-apt-repository ppa:webupd8team/java

c. Update the source list

$ sudo apt-get update

d. Install Java

$ sudo apt-get install oracle-java7-installer
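Once the installer finishes, it is worth confirming the setup before moving on. A quick check (the `oracle-java7-set-default` package comes from the same PPA, assuming it is still offered there):

```shell
# Confirm Java is on the PATH; the output should report version 1.7.x
java -version

# Optionally make Oracle Java 7 the system-default JVM
sudo apt-get install oracle-java7-set-default
```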

3. Install Apache Spark

i. Download Spark

You can download Apache Spark from the link below. Under package type, select "Pre-built for Hadoop 2.6 and later".

http://spark.apache.org/downloads.html

Alternatively, you can use the direct download link:

http://mirror.fibergrid.in/apache/spark/spark-1.6.1/spark-1.6.1-bin-hadoop2.6.tgz

ii. Untar Spark Setup

$ tar xzf spark-1.6.1-bin-hadoop2.6.tgz

You can find all the scripts and configuration files in the newly created directory "spark-1.6.1-bin-hadoop2.6".
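The directory name is simply the tarball name without its `.tgz` suffix; a small sketch using shell parameter expansion:

```shell
# Derive the extraction directory from the tarball name
tgz="spark-1.6.1-bin-hadoop2.6.tgz"
dir="${tgz%.tgz}"   # strip the .tgz suffix
echo "$dir"         # prints: spark-1.6.1-bin-hadoop2.6
```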

iii. Setup Configuration

a. Edit .bashrc

Edit the .bashrc file located in the user's home directory and add the following parameters:

export JAVA_HOME=<path-to-the-root-of-your-Java-installation> (e.g. /usr/lib/jvm/java-7-oracle/)
export SPARK_HOME=<path-to-the-root-of-your-spark-installation> (e.g. /home/dataflair/spark-1.6.1-bin-hadoop2.6/)
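Concretely, the finished entries might look like this (the paths below are examples; substitute your own install locations). Adding `$SPARK_HOME/bin` to `PATH` is optional but lets you run `spark-shell` from any directory:

```shell
# Example .bashrc additions -- adjust both paths for your machine
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
export SPARK_HOME="$HOME/spark-1.6.1-bin-hadoop2.6"
# Optional: make the Spark scripts available everywhere
export PATH="$PATH:$SPARK_HOME/bin"
```

Run `source ~/.bashrc` (or open a new terminal) so the changes take effect.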

4. Launch the Spark Shell

Go to the Spark home directory (spark-1.6.1-bin-hadoop2.6) and run the command below to start the Spark shell:

$ bin/spark-shell

The Spark shell is now launched, and you can start experimenting with Spark.
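You can also run a one-off expression non-interactively by piping it into the shell; a sketch, assuming you are in the Spark home directory and using the shell's pre-built `sc` context:

```shell
# Sum the numbers 1..100 as a tiny Spark job;
# the shell evaluates stdin and exits when the stream closes
echo 'println(sc.parallelize(1 to 100).reduce(_ + _))' | bin/spark-shell
```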

i. Spark UI

This is the web UI for the Spark application; in local mode, the Spark shell itself runs as an application. The UI provides details about stages, storage (cached RDDs), environment variables, and executors.

http://localhost:4040

That's it! You are ready to rock with Apache Spark!
