INSTALLATION GUIDE FOR HADOOP IN UBUNTU 12.04 (SINGLE NODE)
Installing Java 7 (oracle-java7)
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
$ sudo apt-get install oracle-java7-installer
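To verify that the JDK installed correctly, you can check the reported version (the exact update and build numbers will vary on your machine); it should report a Java 1.7.x Oracle (HotSpot) runtime:
$ java -version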
Create a separate user for hadoop
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
Configure SSH
$ su - hduser
$ ssh-keygen -t rsa -P ""
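This generates an RSA key pair with an empty passphrase. Hadoop's start/stop scripts use SSH to reach localhost, so you will typically also need to authorize the new public key for hduser (a standard extra step, not shown in the original commands):
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys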
To make sure that the SSH installation went well, you can open a new terminal and try to create an SSH session as hduser with the following command:
$ ssh localhost
If SSH is not installed yet, install the OpenSSH server first:
$ sudo apt-get install openssh-server
Edit Sudoers
$ pkexec visudo
Add the line below to add hduser to the sudoers list:
hduser ALL=(ALL) ALL
Press Ctrl+O to save the file in nano, then Ctrl+X to exit the editor.
Disable IPv6
$ sudo gedit /etc/sysctl.conf
This command opens sysctl.conf in a text editor; copy the following lines to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
If you hit a problem telling you that you don't have permission, remember to run the previous commands with root privileges (sudo).
These settings normally take effect after a reboot, but alternatively you can run the following command to reload the configuration immediately:
$ sudo sysctl -p
To make sure that IPv6 is disabled, you can run the following command (it should print 1; a value of 0 means IPv6 is still enabled):
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
Configuration of Hadoop
Installing Hadoop
Now we can download Hadoop to begin the installation. Go to the Apache Hadoop downloads page and download Hadoop version 1.0.4 (the current stable version at the time of writing).
Then you need to extract the tar file and rename the extracted folder to 'hadoop'. Open a new terminal and run the following commands:
$ cd /home/hduser
$ sudo tar xzf hadoop-1.0.4.tar.gz
$ sudo mv hadoop-1.0.4 hadoop
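Because the archive was extracted with sudo, the files end up owned by root. A commonly needed extra step (not part of the original commands above) is to hand ownership to hduser so the Hadoop scripts can write their logs and data:
$ sudo chown -R hduser:hadoop /home/hduser/hadoop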
Update $HOME/.bashrc
You will need to update the .bashrc file for hduser (and for every user who needs to administer Hadoop). To edit the file, open it as root:
$ sudo gedit /home/hduser/.bashrc
Then add the following configuration at the end of the .bashrc file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hduser/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
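To have the new variables and aliases take effect in your current shell (instead of opening a new terminal), you can reload the file:
$ source /home/hduser/.bashrc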
hadoop-env.sh
We only need to update the JAVA_HOME variable in this file. Simply open the file in a text editor using one of the following commands:
$ sudo gedit /home/hduser/hadoop/conf/hadoop-env.sh
or
$ nano /home/hduser/hadoop/conf/hadoop-env.sh
Then you will need to change the following line
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Note: if you face an "Error: JAVA_HOME is not set" error while starting the services, it seems that you forgot to uncomment the previous line (just remove the #).
core-site.xml
First, we need to create a temp directory for the Hadoop framework. If you need this environment only for testing or quick prototyping (e.g. developing simple Hadoop programs for your own experiments), I suggest creating this folder under the /home/hduser/ directory; otherwise, you should create it in a shared location (such as /usr/local), although you may face some security issues there. To avoid the exceptions that permissions can cause (such as java.io.IOException), I have created the tmp folder under hduser's home.
To create this folder, type the following command:
$ sudo mkdir /home/hduser/tmp
Please note that if you want to add another admin user (e.g. hduser2 in the hadoop group), you should grant them read and write permission on this folder using the following commands:
$ sudo chown hduser:hadoop /home/hduser/tmp
$ sudo chmod 755 /home/hduser/tmp
Now we can open hadoop/conf/core-site.xml to edit the hadoop.tmp.dir entry. We can open core-site.xml in a text editor:
$ sudo gedit /home/hduser/hadoop/conf/core-site.xml
or
$ nano /home/hduser/hadoop/conf/core-site.xml
Then add the following configuration between the <configuration> ... </configuration> XML elements:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
mapred-site.xml
We will open hadoop/conf/mapred-site.xml in a text editor and add the following configuration values (just as with core-site.xml):
$ nano /home/hduser/hadoop/conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>
hdfs-site.xml
Open hadoop/conf/hdfs-site.xml in a text editor and add the following configuration:
$ nano /home/hduser/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>
Formatting the NameNode
You need to format the NameNode of your HDFS. Do not do this step while the cluster is running; it is usually done only once, the first time you install Hadoop.
Run the following command:
$ /home/hduser/hadoop/bin/hadoop namenode -format
[Screenshot: NameNode formatting output]
Starting Hadoop Cluster
You will need to navigate to the hadoop/bin directory and run the ./start-all.sh script:
$ cd /home/hduser/hadoop/bin/
$ ./start-all.sh
[Screenshot: starting the Hadoop services using ./start-all.sh]
There is a nice tool called jps (it ships with the JDK). You can use it to make sure that all the services are up.
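As a rough illustration, on a working Hadoop 1.x single-node setup jps should list the five Hadoop daemons plus jps itself (the process IDs below are made up; yours will differ):
$ jps
1788 NameNode
1938 DataNode
2085 SecondaryNameNode
2149 JobTracker
2287 TaskTracker
2349 Jps
If one of the daemons is missing, check its log file under the hadoop/logs directory.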