Install Hadoop @ localhost
Apache Hadoop, an open-source Big Data platform
For testing and learning purposes, we'll install Apache Hadoop on a single-node local machine by following these steps:
Hadoop is a large project under the Apache Software Foundation, written mainly in Java.
Install Parallel Distributed Shell to issue commands to groups of hosts in parallel
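On Ubuntu (the distribution assumed throughout), the pdsh package is available via apt. Hadoop's start/stop scripts invoke pdsh, and setting PDSH_RCMD_TYPE=ssh makes it use SSH as its remote command, a common requirement for these scripts:

```bash
# Install Parallel Distributed Shell (pdsh)
sudo apt update
sudo apt install -y pdsh

# Tell pdsh to use ssh when Hadoop's scripts invoke it
echo 'export PDSH_RCMD_TYPE=ssh' >> ~/.bashrc
```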
Create a new user 'hadoop' and set up the password for the 'hadoop' user
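For example, with Ubuntu's adduser, which prompts for the password interactively:

```bash
# Create the 'hadoop' user; adduser asks for a password and user details
sudo adduser hadoop
```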
Add the 'hadoop' user to the 'sudo' group via the usermod command below.
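```bash
# Grant the 'hadoop' user sudo privileges
sudo usermod -aG sudo hadoop
```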
Log in to the 'hadoop' user
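```bash
# Switch to the 'hadoop' user, loading its login environment
su - hadoop
```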
Generate SSH public and private key
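The next step references 'id_rsa.pub', so an RSA key is assumed here; press Enter at each prompt to accept the defaults:

```bash
# Generate an RSA key pair under ~/.ssh (no passphrase for passwordless SSH)
ssh-keygen -t rsa
```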
Append the SSH public key 'id_rsa.pub' to the 'authorized_keys' file and change its permissions to 600
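```bash
# Authorize the key for passwordless SSH to localhost, then tighten permissions
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Verify that passwordless login works
ssh localhost
```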
Download the Apache Hadoop package to the current working directory, then extract it and copy it to its appropriate path
Lastly, change the ownership of the hadoop installation directory
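A sketch assuming Hadoop 3.3.6 and an install path of /usr/local/hadoop; substitute the current release and your preferred path:

```bash
# Download and extract the Hadoop release (version assumed; check downloads.apache.org)
wget https://downloads.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz

# Move it into place and hand ownership to the 'hadoop' user
sudo mv hadoop-3.3.6 /usr/local/hadoop
sudo chown -R hadoop:hadoop /usr/local/hadoop
```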
Add the following lines to ~/.bashrc
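The exact values depend on where Java and Hadoop live; a sketch assuming OpenJDK 11 and the /usr/local/hadoop path used above:

```bash
# Hadoop environment (paths assumed from the steps above; adjust JAVA_HOME to your JDK)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
```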
Next, run the command below to apply the changes in '~/.bashrc'.
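```bash
# Reload ~/.bashrc into the current shell
source ~/.bashrc
```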
Verify by checking each environment variable, e.g.:
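```bash
# Each should print the path configured above
echo $JAVA_HOME
echo $HADOOP_HOME
echo $PATH
```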
Also configure the JAVA_HOME environment variable in the 'hadoop-env.sh' script
Uncomment the JAVA_HOME environment line and change the value to the Java OpenJDK installation directory
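In $HADOOP_HOME/etc/hadoop/hadoop-env.sh, assuming the same OpenJDK 11 path as above:

```bash
# Uncomment and point JAVA_HOME at the Java OpenJDK installation directory
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```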
Let's check that all is working as expected
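```bash
# Print the installed Hadoop version to confirm the setup
hadoop version
```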
This allows you to run a hadoop cluster in distributed mode even with only a single node/server. In this mode, each hadoop daemon runs in a separate Java process.
core-site.xml - Defines the NameNode for the hadoop cluster.
hdfs-site.xml - Defines the DataNode on the hadoop cluster.
mapred-site.xml - The MapReduce configuration for the hadoop cluster.
yarn-site.xml - ResourceManager and NodeManager configuration for the hadoop cluster.
We'll set up an Apache Hadoop cluster in Pseudo-Distributed mode on a single Ubuntu machine. To do that, we'll make changes to the hadoop configuration files listed above:
Add the lines below to the file $HADOOP_HOME/etc/hadoop/core-site.xml
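A typical single-node NameNode definition; hdfs://localhost:9000 is the conventional address and an assumption here, adjust if needed:

```xml
<configuration>
  <!-- Default filesystem URI: the NameNode listens on localhost:9000 -->
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
```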
Next, run the following command to create new directories that will be used for the DataNode on the hadoop cluster. Then, change the ownership of the DataNode directories to the 'hadoop' user.
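A sketch with assumed locations under /home/hadoop/hdfs; any paths work as long as hdfs-site.xml below points at them. A NameNode directory is created alongside, since the next step configures both:

```bash
# Create local storage directories for the NameNode and DataNode (paths assumed)
sudo mkdir -p /home/hadoop/hdfs/namenode /home/hadoop/hdfs/datanode

# Hand ownership to the 'hadoop' user
sudo chown -R hadoop:hadoop /home/hadoop/hdfs
```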
After that, add the following configuration to the file $HADOOP_HOME/etc/hadoop/hdfs-site.xml
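A minimal single-node configuration, assuming the directories created in the previous step:

```xml
<configuration>
  <!-- Single node, so keep one replica per block -->
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <!-- Local directories created in the previous step (paths assumed) -->
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
```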
With the NameNode and DataNode configured, run the command below to format the hadoop filesystem
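```bash
# Format HDFS via the NameNode (run once, as the 'hadoop' user)
hdfs namenode -format
```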
Start the NameNode and DataNode via the following command
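```bash
# Start the HDFS daemons (script lives in $HADOOP_HOME/sbin, already on PATH)
start-dfs.sh
```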
You can confirm that the daemons are up with the jps command:
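```bash
# List running Java processes; expect NameNode, DataNode, and SecondaryNameNode
jps
```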
The NameNode and DataNode processes are now running, and the web interface is listening on port '9870'; open http://localhost:9870 in a browser to inspect the cluster.
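A quick check from the shell (assuming curl is available):

```bash
# The NameNode web UI should answer on port 9870
curl -I http://localhost:9870
```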