
How to install a Hadoop 2.8.1 single-node cluster on Ubuntu



In this post, we install Hadoop 2.8.1 on Ubuntu. The following is a step-by-step process for installing hadoop-2.8.1 as a single-node cluster.

Before installing or downloading anything, it is always better to update the package index using the following command:

$ sudo apt-get update

Step 1: Install Java
Here we use the Oracle version of Java 8.
$ sudo apt-get install python-software-properties
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java8-installer
We can check whether Java is installed properly using the following command:
$ java -version
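If Java is installed correctly, this prints the version. With the Oracle installer the output looks roughly like this (the exact build number will vary):

java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)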
Step 2: Add dedicated hadoop user
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
NOTE: You can leave the password and other fields blank; just press 'Y' when it asks "Is the information correct? [Y/n]".
$ sudo adduser hduser sudo
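Optionally, verify the new user's group memberships; after the commands above the output should look like this:

$ groups hduser
hduser : hadoop sudo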
Step 3: Install SSH
$ sudo apt-get install ssh
Step-4: Passwordless entry for localhost using SSH
$ su hduser
Now we are logged in as 'hduser'.
$ ssh-keygen -t rsa
NOTE: Leave the file name and passphrase blank (just press Enter for each prompt).
$ cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
$ chmod 0600 ~/.ssh/authorized_keys
$ ssh localhost
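The first time you connect, SSH asks you to confirm the host key; type 'yes'. The prompt looks roughly like this (the fingerprint will differ on your machine):

The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is SHA256:...
Are you sure you want to continue connecting (yes/no)? yes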
Make sure the login is passwordless (no password prompt). Once logged in to localhost, exit the session using the following command:
$ exit
Step 5: Install hadoop-2.8.1
NOTE: If you have any problem with the download, you can fetch the archive directly from https://archive.apache.org/dist/hadoop/core/hadoop-2.8.1/.
$ wget http://www-us.apache.org/dist/hadoop/common/hadoop-2.8.1/hadoop-2.8.1.tar.gz
$ tar xvzf hadoop-2.8.1.tar.gz
$ sudo mkdir -p /usr/local/hadoop
$ cd hadoop-2.8.1/
$ sudo mv * /usr/local/hadoop
$ sudo chown -R hduser:hadoop /usr/local/hadoop
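If the move and ownership change worked, the installation directory should contain the usual Hadoop layout (the exact listing may vary slightly):

$ ls /usr/local/hadoop
bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share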
Step 6: Setup Configuration Files
The following files need to be modified to complete the Hadoop setup:
6.1 ~/.bashrc
6.2 hadoop-env.sh
6.3 core-site.xml
6.4 mapred-site.xml
6.5 hdfs-site.xml
6.6 yarn-site.xml
6.1 ~/.bashrc
First, we need to find the path where Java is installed on our system:
$ update-alternatives --config java
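The directory to use for JAVA_HOME is the reported path without the trailing /jre/bin/java (or /bin/java). You can also resolve it directly with readlink; with the Oracle installer from Step 1 the result is typically the java-8-oracle directory shown below, while the openjdk path in the snippet further down is only an example, so substitute whatever your system reports:

$ readlink -f /usr/bin/java
/usr/lib/jvm/java-8-oracle/jre/bin/java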
Now we open ~/.bashrc:
$ sudo nano ~/.bashrc
Append the following at the end, then save and exit (press Ctrl+X, then 'Y', then Enter):
#HADOOP VARIABLES START
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib"
#HADOOP VARIABLES END
Reload the .bashrc file to apply the changes:
$ source ~/.bashrc
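As a quick check, the hadoop command should now be on the PATH and report the installed version (the remaining output lines show build details and are omitted here):

$ hadoop version
Hadoop 2.8.1
...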
6.2 hadoop-env.sh
We need to set the JAVA_HOME path in hadoop-env.sh to ensure that the value of the JAVA_HOME variable is available to Hadoop whenever it starts up.
$ sudo nano /usr/local/hadoop/etc/hadoop/hadoop-env.sh
Search for the JAVA_HOME variable in the file (it may be the first variable). Change it to the path found earlier, for example:
export JAVA_HOME=/usr/lib/jvm/java-8-openjdk-amd64
6.3 core-site.xml
The core-site.xml file contains configuration properties that Hadoop reads when it starts up. First, create a directory for Hadoop's temporary files and make hduser its owner:
$ sudo mkdir -p /app/hadoop/tmp

$ sudo chown hduser:hadoop /app/hadoop/tmp
Open the file and enter the following between the <configuration></configuration> tags:
$ nano /usr/local/hadoop/etc/hadoop/core-site.xml
<configuration>
<property>
 <name>hadoop.tmp.dir</name>
   <value>/app/hadoop/tmp</value>
   <description>A base for other temporary directories.</description>
</property>
<property>
  <name>fs.default.name</name>
   <value>hdfs://localhost:54310</value>
    <description>The name of the default file system.  A URI whose scheme and authority determine the FileSystem implementation.  The uri's scheme determines the config property (fs.SCHEME.impl) naming the FileSystem implementation class.  The uri's authority is used to determine the host, port, etc. for a filesystem.</description>
 </property>
</configuration>
6.4 mapred-site.xml
By default, the /usr/local/hadoop/etc/hadoop/ folder contains a /usr/local/hadoop/etc/hadoop/mapred-site.xml.template file, which has to be copied and renamed to mapred-site.xml:
$ cp /usr/local/hadoop/etc/hadoop/mapred-site.xml.template /usr/local/hadoop/etc/hadoop/mapred-site.xml
The /usr/local/hadoop/etc/hadoop/mapred-site.xml file is used to specify which framework is being used for MapReduce.
We need to enter the following content between the <configuration></configuration> tags:
$ nano /usr/local/hadoop/etc/hadoop/mapred-site.xml
<configuration>
<property>
  <name>mapred.job.tracker</name>
    <value>localhost:54311</value>
      <description> The host and port that the MapReduce job tracker runs at.  If "local", then jobs are run in-process as a single map and reduce task.
      </description>
</property>
<property>
<name>mapreduce.framework.name</name>
   <value>yarn</value>
</property>
</configuration>
6.5 hdfs-site.xml
We need to configure hdfs-site.xml for each host in the cluster; it specifies the directories used for:
  1. The NameNode
  2. The DataNode
Create these directories using the following commands:
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/namenode
$ sudo mkdir -p /usr/local/hadoop_store/hdfs/datanode
$ sudo chown -R hduser:hadoop /usr/local/hadoop_store
Open the hdfs-site.xml file and enter the following content between the <configuration></configuration> tags:
$ nano /usr/local/hadoop/etc/hadoop/hdfs-site.xml
<configuration>
 <property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication. The actual number of replications can be specified when the file is created. The default is used if replication is not specified at create time.
  </description>
 </property>
 <property>
   <name>dfs.namenode.name.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/namenode</value>
 </property>
 <property>
   <name>dfs.datanode.data.dir</name>
   <value>file:/usr/local/hadoop_store/hdfs/datanode</value>
 </property>
</configuration>
6.6 yarn-site.xml
Open the yarn-site.xml file and enter the following content between the <configuration></configuration> tags:
$ nano /usr/local/hadoop/etc/hadoop/yarn-site.xml
 <configuration>
   <property>
      <name>yarn.nodemanager.aux-services</name>
      <value>mapreduce_shuffle</value>
   </property>
</configuration>
Step 7: Format the Hadoop file system
$ hadoop namenode -format
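If the format succeeds, the output should end with lines similar to the following (timestamps and host names will differ):

... INFO common.Storage: Storage directory /usr/local/hadoop_store/hdfs/namenode has been successfully formatted.
... INFO util.ExitUtil: Exiting with status 0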
Step 8: Start Hadoop Daemons
$  cd /usr/local/hadoop/sbin

$ start-all.sh
We can check that all daemons have started properly using the following command:
$ jps
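On a healthy single-node setup the listing should show all five Hadoop daemons in addition to Jps itself; the process IDs will differ:

12443 NameNode
12587 DataNode
12783 SecondaryNameNode
12945 ResourceManager
13067 NodeManager
13190 Jps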
Step 9: Stop hadoop Daemons
$ stop-all.sh
Congratulations! We have installed Hadoop successfully.
Hadoop has web interfaces too (copy and paste the following links into your browser):
NameNode daemon: http://localhost:50070/
Resource Manager: http://localhost:8088/
Now you can use all Hadoop commands here.
Just make sure you are logged in as 'hduser', because our Hadoop setup belongs to this dedicated user.
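As a quick smoke test, you can create a home directory in HDFS and copy a file into it (the paths used here are just an example):

$ hdfs dfs -mkdir -p /user/hduser
$ hdfs dfs -put /usr/local/hadoop/etc/hadoop/core-site.xml /user/hduser/
$ hdfs dfs -ls /user/hduser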
Thank you!
