Install Hadoop @ localhost
Apache Hadoop, an open-source Big Data platform
Install Apache Hadoop on Ubuntu Desktop 22.04
For testing/learning purposes, we'll install Apache Hadoop on a single-node local machine following these steps:
Installing Java OpenJDK
Hadoop is a large project under the Apache Software Foundation and is mainly written in Java, so we need a Java runtime installed first.
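On Ubuntu 22.04 this can be done from the standard repositories; OpenJDK 11 is assumed here (Hadoop 3.x also supports OpenJDK 8):

```bash
# Install OpenJDK 11 from the Ubuntu repositories (version is an assumption)
sudo apt update
sudo apt install -y openjdk-11-jdk

# Confirm the Java runtime is available
java -version
```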
Setting up a User and Password-less SSH Authentication
Install Parallel Distributed Shell to issue commands to groups of hosts in parallel
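Something along these lines installs it, together with the OpenSSH server needed for the password-less login later (package names as found in the Ubuntu 22.04 repositories):

```bash
# Install pdsh and the OpenSSH server
sudo apt install -y pdsh openssh-server
```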
Create a new user 'hadoop' and set up the password for the 'hadoop' user
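For example (adduser prompts for the password interactively):

```bash
# Create the 'hadoop' user; adduser asks for a password and user details
sudo adduser hadoop
```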
Add the 'hadoop' user to the 'sudo' group via the usermod command below.
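A sketch of that command:

```bash
# Append the 'hadoop' user to the 'sudo' group
sudo usermod -aG sudo hadoop
```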
Log in to the 'hadoop' user
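For example:

```bash
# Switch to the 'hadoop' user with a login shell
su - hadoop
```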
Generate SSH public and private key
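A typical invocation (accepting the defaults and leaving the passphrase empty keeps logins password-less):

```bash
# Generate an RSA key pair under ~/.ssh/
ssh-keygen -t rsa
```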
Copy the SSH public key 'id_rsa.pub' to the 'authorized_keys' file and change its permission to 600
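Roughly:

```bash
# Append the public key to authorized_keys and restrict its permissions
cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
chmod 600 ~/.ssh/authorized_keys

# Optional check (an extra step, not from the original): log in to localhost
ssh localhost
```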
Downloading Hadoop
Download the Apache Hadoop package to the current working directory, extract it, and move it to its appropriate path
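A sketch, assuming Hadoop 3.3.6 and /usr/local/hadoop as the installation path (both the release number and the target directory are assumptions; pick the current release from the Apache download page):

```bash
# Download, extract, and relocate Hadoop (version and paths are assumptions)
wget https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
tar -xzf hadoop-3.3.6.tar.gz
sudo mv hadoop-3.3.6 /usr/local/hadoop
```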
Lastly, change the ownership of the hadoop installation directory
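Assuming the same /usr/local/hadoop path as above:

```bash
# Give the 'hadoop' user ownership of the installation directory
sudo chown -R hadoop:hadoop /usr/local/hadoop
```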
Setting up Hadoop Environment Variables
Add the following lines to ~/.bashrc
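The exact values depend on where Java and Hadoop live; a typical set, assuming the OpenJDK 11 and /usr/local/hadoop paths used above:

```bash
# Hadoop environment variables (paths are assumptions; adjust to your system)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
export HADOOP_HOME=/usr/local/hadoop
export HADOOP_INSTALL=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export YARN_HOME=$HADOOP_HOME
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
# pdsh defaults to rsh; tell Hadoop's scripts to use ssh instead
export PDSH_RCMD_TYPE=ssh
```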
Next, run the below command to apply new changes within the file '~/.bashrc'.
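That is:

```bash
# Reload ~/.bashrc in the current session
source ~/.bashrc
```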
Verify by checking each environment variable, e.g.:
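```bash
# Spot-check a few of the variables
echo $JAVA_HOME
echo $HADOOP_HOME
which hadoop
```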
Also configure the JAVA_HOME environment variable in the 'hadoop-env.sh' script
Uncomment the JAVA_HOME environment line and change the value to the Java OpenJDK installation directory
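Assuming the OpenJDK 11 path from earlier, the line in $HADOOP_HOME/etc/hadoop/hadoop-env.sh ends up looking like this:

```bash
# In $HADOOP_HOME/etc/hadoop/hadoop-env.sh (JDK path is an assumption)
export JAVA_HOME=/usr/lib/jvm/java-11-openjdk-amd64
```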
Let's check that all is working as expected
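For example:

```bash
# Print the Hadoop version and build details
hadoop version
```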
Setting up Hadoop Cluster: Pseudo-Distributed Mode
This allows you to run a hadoop cluster in distributed mode even with only a single node/server. In this mode, each hadoop daemon runs in a separate Java process.
We'll set up an Apache Hadoop cluster in Pseudo-Distributed mode on a single Ubuntu machine. To do that, we'll make changes to the following hadoop configuration files:
core-site.xml - Defines the NameNode for the hadoop cluster.
hdfs-site.xml - Defines the DataNode on the hadoop cluster.
mapred-site.xml - The MapReduce configuration for the hadoop cluster.
yarn-site.xml - The ResourceManager and NodeManager configuration for the hadoop cluster.
Setting up NameNode and DataNode
Add the below lines to the file $HADOOP_HOME/etc/hadoop/core-site.xml
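A sketch of that configuration, writing the whole file with a heredoc; hdfs://localhost:9000 is the usual single-node address but treat it as an assumption, and you can equally paste just the property block between the existing <configuration> tags:

```bash
# Write a minimal core-site.xml pointing the default filesystem at a local NameNode
cat <<'EOF' > $HADOOP_HOME/etc/hadoop/core-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://localhost:9000</value>
  </property>
</configuration>
EOF
```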
Next, run the following command to create new directories that will be used for the DataNode on the hadoop cluster. Then, change the ownership of DataNode directories to the 'hadoop' user.
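For example, using /home/hadoop/hdfs as the parent directory (the location is an assumption; it just has to match hdfs-site.xml below):

```bash
# Create NameNode/DataNode data directories and hand them to the 'hadoop' user
sudo mkdir -p /home/hadoop/hdfs/{namenode,datanode}
sudo chown -R hadoop:hadoop /home/hadoop/hdfs
```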
After that, add the following configuration to the file $HADOOP_HOME/etc/hadoop/hdfs-site.xml
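A sketch in the same style, with a replication factor of 1 (single node) and the directories created in the previous step:

```bash
# Write hdfs-site.xml: single replica plus NameNode/DataNode directories
cat <<'EOF' > $HADOOP_HOME/etc/hadoop/hdfs-site.xml
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///home/hadoop/hdfs/namenode</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///home/hadoop/hdfs/datanode</value>
  </property>
</configuration>
EOF
```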
With the NameNode and DataNode configured, run the below command to format the hadoop filesystem
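That is (run as the 'hadoop' user):

```bash
# Format the HDFS filesystem via the NameNode
hdfs namenode -format
```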
Start the NameNode and DataNode via the following command
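That is:

```bash
# Start the NameNode, DataNode, and SecondaryNameNode daemons
start-dfs.sh
```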
The output should show the NameNode, DataNode, and SecondaryNameNode daemons being started on localhost.
The NameNode and DataNode processes are now running, and the NameNode web interface is listening on port '9870'.
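You can confirm this with something like the following, then open http://localhost:9870 in a browser to reach the NameNode web UI:

```bash
# List running Java processes; NameNode, DataNode, and SecondaryNameNode should appear
jps
```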