Hadoop Multinode Cluster Configuration

In this blog we will describe the steps and required configurations for setting up a distributed multi-node Apache Hadoop cluster.

Prerequisites

1. Single node hadoop cluster

If you have not configured a single-node Hadoop cluster yet, follow the link below to configure one first.

How to install single node hadoop cluster

After configuring the single-node Hadoop cluster, clone it to create the machines of the multi-node cluster.

Cloning steps:

a) Right-click your Masternode (the single-node cluster) virtual machine; a context menu appears.

b) Select the Clone option.

c) Give the cloned machine a new name.

Make sure you have ticked "Reinitialize the MAC address of all network cards".


d) Select Full clone.

Now click Clone; it will take some time to create the new virtual machine (a Datanode).

Repeat the same process to create the second Datanode.

Note: Reinitialize the MAC address while cloning.

2. Networking

Networking plays an important role here. Before combining the single-node clusters into a multi-node cluster, we need to make sure that all the nodes can ping each other, i.e. they are connected to the same network (or hub) and can reach one another.

The network configuration used for this Hadoop cluster is as follows:

IP address of the Masternode (Namenode): 192.168.10.100

IP address of Datanode 1 (slave node): 192.168.10.101

IP address of Datanode 2 (slave node): 192.168.10.102

Check the communication between the master and the slaves:

a) First, ping by IP address:

 

b) If they respond, ping by their hostnames as well:

 

Note: Also ping from the slave nodes to check that they can reach the Master node. If you get replies in both directions, the nodes can communicate.
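As a rough sketch, assuming the hostnames masternode, datanode1 and datanode2 (defined in /etc/hosts as described later), the checks run from the master look like this:

$ ping 192.168.10.101        # Datanode 1 by IP
$ ping 192.168.10.102        # Datanode 2 by IP
$ ping datanode1             # Datanode 1 by hostname
$ ping datanode2             # Datanode 2 by hostname

From each Datanode, ping 192.168.10.100 (or masternode) in the same way.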

c) Verify passwordless SSH login:
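A minimal sketch of setting up and testing passwordless SSH, assuming the Hadoop user is named hadoop and the slave hostnames are datanode1 and datanode2:

$ ssh-keygen -t rsa                  # skip if a key pair already exists
$ ssh-copy-id hadoop@datanode1       # copy the public key to each slave
$ ssh-copy-id hadoop@datanode2
$ ssh datanode1                      # should log in without asking for a password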

 

d) Stop iptables on each node (Namenode, Datanode1, Datanode2):
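The exact command depends on the distribution; as a sketch, on older SysV-style systems it is iptables, on newer systemd-based systems it is usually firewalld:

$ sudo service iptables stop        # stop the firewall now
$ sudo chkconfig iptables off       # keep it off after reboot

or

$ sudo systemctl stop firewalld
$ sudo systemctl disable firewalld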

Now switch to your Master node (Namenode).

Namenode Configuration –

Before configuring the Master node (Namenode), make sure you have configured the /etc/hosts file.

To configure the /etc/hosts file:
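Using the IP addresses listed above, and assuming the hostnames masternode, datanode1 and datanode2, the /etc/hosts file on every node would contain entries along these lines:

192.168.10.100    masternode
192.168.10.101    datanode1
192.168.10.102    datanode2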

Now follow the steps below to make the changes on each machine (node).

These are the changes that have to be made on the Master node (Namenode):

1) Log in to your Master node (Namenode) and move to the Hadoop configuration directory to make the changes.
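Assuming Hadoop is installed under /home/hadoop/hadoop (the layout mentioned in the note further below), the configuration files of Hadoop 2.x live in etc/hadoop under the installation directory (older 1.x releases keep them under conf/); adjust the path to your own setup:

$ cd /home/hadoop/hadoop/etc/hadoop
$ ls core-site.xml hdfs-site.xml mapred-site.xml yarn-site.xml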

2) Open core-site.xml and add/modify the following:
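A minimal core-site.xml for the master, assuming Hadoop 2.x (where the property is fs.defaultFS; very old releases use fs.default.name), the hostname masternode, and the common NameNode port 9000:

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://masternode:9000</value>
  </property>
</configuration>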

3) Open hdfs-site.xml.

Note: In the value <value>/home/hadoop/hadoop/namenode</value>, /home/hadoop is the home directory of the hadoop user, so substitute your own user's home directory; the rest is the directory we create for the NameNode metadata (see the directory-creation step below).
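A sketch of the hdfs-site.xml on the Namenode, assuming Hadoop 2.x property names and a replication factor of 2 (one copy per Datanode):

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>/home/hadoop/hadoop/namenode</value>
  </property>
</configuration>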

 

4) Open mapred-site.xml.
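A minimal mapred-site.xml that tells MapReduce to run on YARN (if only mapred-site.xml.template exists, copy it to mapred-site.xml first):

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>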

5) Open yarn-site.xml and add these entries:

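A sketch of typical yarn-site.xml entries, assuming the ResourceManager runs on the master (hostname masternode):

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>masternode</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>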

6) Restart the ssh service by typing the command below.
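Depending on the distribution, either of the following should restart the SSH service:

$ sudo service sshd restart

or

$ sudo systemctl restart sshd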

DataNode Configuration-

Before configuring the Datanode, make sure you have configured the /etc/hosts file.

To configure the /etc/hosts file, add the same entries for all three nodes as on the Namenode (see above).

Follow these steps to update the Datanode:

1) Log in to your Datanode and move to the Hadoop configuration directory, just as on the Namenode.

2) Open core-site.xml and add the same entries as on the Namenode, so that the Datanode points to the master for the file system.

3) Open hdfs-site.xml.

Note: Here the value is <value>/home/hadoop/hadoop/datanode</value>; /home/hadoop is the home directory of the hadoop user, so substitute your own user's home directory, and the rest is the directory we create on each Datanode for block storage (see the directory-creation step below).
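A sketch of the hdfs-site.xml on a Datanode, again assuming Hadoop 2.x property names:

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>/home/hadoop/hadoop/datanode</value>
  </property>
</configuration>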

4) Open yarn-site.xml and add the same entries as on the Namenode.

5) Open mapred-site.xml and add the same entries as on the Namenode.

6) Restart the ssh service with the same command used on the Namenode.

Note: Repeat the same steps on every DataNode.

Create the /home/hadoop/hadoop/namenode directory on the Master node (Namenode) and the /home/hadoop/hadoop/datanode directory on both Datanodes (slave nodes).
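The directories can be created with mkdir, for example:

On the Namenode:
$ mkdir -p /home/hadoop/hadoop/namenode

On each Datanode:
$ mkdir -p /home/hadoop/hadoop/datanode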

Note: If the directories already exist, remove them first and then recreate them with the commands above.

Log in to your Masternode (Namenode) and follow these steps to start your Hadoop cluster.

To start all the daemons, follow the steps below:

1) Format the NameNode first:
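Assuming the Hadoop bin directory is on the PATH, the NameNode is formatted as shown below. Do this only once on a fresh cluster; reformatting wipes the HDFS metadata.

$ hdfs namenode -format        # older releases use: hadoop namenode -format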

2) Start the DFS daemons from the Namenode by typing the command below:
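The DFS daemons are started with the start-dfs.sh script from Hadoop's sbin directory (use the full path if sbin is not on the PATH):

$ start-dfs.sh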

3) Type jps to see the running daemons:
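A rough sketch of what jps typically shows on the Namenode after start-dfs.sh:

$ jps
NameNode
SecondaryNameNode
Jps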

4) Start the YARN and history server daemons:

You can also use start-all.sh to start all the daemons at once.
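A sketch of the commands, assuming the Hadoop sbin directory is on the PATH (start-all.sh still works in Hadoop 2.x but is marked as deprecated):

$ start-yarn.sh
$ mr-jobhistory-daemon.sh start historyserver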

5) Log in to your Datanode and verify the running daemons:

You can also check on the other Datanode.

The daemons you should see running on each Datanode are sketched below.
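On each Datanode, jps typically shows something like:

$ jps
DataNode
NodeManager
Jps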

6) Verify the live slave nodes with an HDFS dfsadmin report:
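In Hadoop 2.x the report is produced with hdfs dfsadmin (the older hadoop dfsadmin form still works but prints a deprecation warning); it lists the cluster capacity, usage and the live Datanodes:

$ hdfs dfsadmin -report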

Now open your browser and enter the addresses below in the URL bar.
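Assuming the default ports of Hadoop 2.x (the NameNode UI moved to port 9870 in Hadoop 3.x), the web interfaces are reachable at addresses like these:

http://192.168.10.100:50070     # HDFS NameNode web UI
http://192.168.10.100:8088      # YARN ResourceManager web UI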

You should see the Hadoop web interface.

This is the web GUI (a web server built into Hadoop) for your Hadoop cluster.

Through this GUI you can easily monitor and manage your cluster.
