INSTALLATION GUIDE FOR HADOOP IN UBUNTU 12.04 (SINGLE NODE)
Installing Java-7-oracle
$ sudo add-apt-repository ppa:webupd8team/java
$ sudo apt-get update
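After the package index is updated, the JDK itself can be installed from the same PPA. The package name below, oracle-java7-installer, is the one the webupd8team PPA is known to provide; this step is not spelled out in the original listing, so verify the name with apt-cache search java7 if in doubt:
$ sudo apt-get install oracle-java7-installer
$ java -version    # should report a 1.7.x Oracle JVM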
Create a separate user for Hadoop
$ sudo addgroup hadoop
$ sudo adduser --ingroup hadoop hduser
Configure SSH
$ su - hduser
$ ssh-keygen -t rsa -P ""
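One step that is assumed here (it is not shown in the original listing, but the Hadoop start scripts connect to localhost over SSH) is to append the newly generated public key to hduser's authorized keys, so that passwordless login to localhost works:
$ cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys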
To be sure that the SSH setup went well, you can open a new terminal and try to create an SSH session as hduser with the following command:
$ ssh localhost
If the connection to localhost fails, the SSH server needs to be (re)installed:
$ sudo apt-get install openssh-server
Edit Sudoers
$ pkexec visudo
Add the line below to add hduser to the sudoers:
hduser ALL=(ALL) ALL
Press Ctrl + O to save in nano, then Ctrl + X to exit the editor.
Disable IPv6
$ sudo gedit /etc/sysctl.conf
This command will open sysctl.conf in a text editor; copy the following lines to the end of the file:
# disable ipv6
net.ipv6.conf.all.disable_ipv6 = 1
net.ipv6.conf.default.disable_ipv6 = 1
net.ipv6.conf.lo.disable_ipv6 = 1
If you face a problem telling you that you don't have permissions, just remember to run the previous commands from your root account (or with sudo).
These settings normally require a reboot, but alternatively you can run the following command to reload the configuration:
$ sudo sysctl -p
To make sure that IPv6 is disabled, run the following command; it should print 1:
$ cat /proc/sys/net/ipv6/conf/all/disable_ipv6
Configuration of Hadoop
Installing Hadoop
Now we can download Hadoop to begin the installation. Go to the Apache Downloads page and download Hadoop version 1.0.4 (the current stable version at the time of writing).
Then you need to extract the tar file and rename the extracted folder to 'hadoop'. Open a new terminal and run the following commands:
$ cd /home/hduser
$ sudo tar xzf hadoop-1.0.4.tar.gz
$ sudo mv hadoop-1.0.4 hadoop
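Because the archive was extracted with sudo, the hadoop directory ends up owned by root. It is assumed here (this step is not in the original listing) that ownership should be handed to hduser so the daemons can later write their logs and pid files:
$ sudo chown -R hduser:hadoop /home/hduser/hadoop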
Update $HOME/.bashrc
You will need to update the .bashrc file for hduser (and for every user who needs to administer Hadoop). To open the .bashrc file, you will need to open it as root:
$ sudo gedit /home/hduser/.bashrc
Then add the following configuration at the end of the .bashrc file:
# Set Hadoop-related environment variables
export HADOOP_HOME=/home/hduser/hadoop

# Set JAVA_HOME (we will also configure JAVA_HOME directly for Hadoop later on)
export JAVA_HOME=/usr/lib/jvm/java-7-oracle

# Some convenient aliases and functions for running Hadoop-related commands
unalias fs &> /dev/null
alias fs="hadoop fs"
unalias hls &> /dev/null
alias hls="fs -ls"

# If you have LZO compression enabled in your Hadoop cluster and
# compress job outputs with LZOP (not covered in this tutorial):
# Conveniently inspect an LZOP compressed file from the command
# line; run via:
#
# $ lzohead /hdfs/path/to/lzop/compressed/file.lzo
#
# Requires installed 'lzop' command.
#
lzohead () {
    hadoop fs -cat $1 | lzop -dc | head -1000 | less
}

# Add Hadoop bin/ directory to PATH
export PATH=$PATH:$HADOOP_HOME/bin
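For the new variables to take effect in your current session (assuming you edited the file as shown above), either open a new terminal as hduser or reload the file:
$ source /home/hduser/.bashrc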
hadoop-env.sh
We only need to update the JAVA_HOME variable in this file. Open it with a text editor using one of the following commands:
$ sudo gedit /home/hduser/hadoop/conf/hadoop-env.sh
or
$ nano /home/hduser/hadoop/conf/hadoop-env.sh
Then change the following line
# export JAVA_HOME=/usr/lib/j2sdk1.5-sun
to
export JAVA_HOME=/usr/lib/jvm/java-7-oracle
Note: if you face an "Error: JAVA_HOME is not set" error while starting the services, it most likely means that you forgot to uncomment the previous line (just remove the #).
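A quick sanity check (not part of the original steps) is to grep for the variable and confirm that the export line is present without a leading #:
$ grep JAVA_HOME /home/hduser/hadoop/conf/hadoop-env.sh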
core-site.xml
First, we need to create a temp directory for the Hadoop framework. If you only need this environment for testing or quick prototyping (e.g. developing simple Hadoop programs for your personal tests), I suggest creating this folder under the /home/hduser/ directory; otherwise, you could create it in a shared location (like /usr/local), but you may then face some permission issues. To avoid the exceptions that permissions can cause (like java.io.IOException), I have created the tmp folder under the hduser space.
To create this folder, type the following command:
$ sudo mkdir /home/hduser/tmp
Please note that the folder should be owned by hduser and the hadoop group; set the ownership and permissions with the following commands. If you later add another admin user (e.g. hduser2 in the hadoop group), keep in mind that mode 755 only gives the group read access, so widen it to 775 if that user also needs to write here.
$ sudo chown hduser:hadoop /home/hduser/tmp
$ sudo chmod 755 /home/hduser/tmp
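For completeness, creating such a second admin user would look like the command below (hduser2 is just an illustrative name, mirroring the adduser command used earlier):
$ sudo adduser --ingroup hadoop hduser2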
Now we can open hadoop/conf/core-site.xml to edit the hadoop.tmp.dir entry. Open core-site.xml with a text editor:
$ sudo gedit /home/hduser/hadoop/conf/core-site.xml
or
$ nano /home/hduser/hadoop/conf/core-site.xml
Then add the following configuration between the <configuration> .. </configuration> XML elements:
<property>
  <name>hadoop.tmp.dir</name>
  <value>/home/hduser/tmp</value>
  <description>A base for other temporary directories.</description>
</property>

<property>
  <name>fs.default.name</name>
  <value>hdfs://localhost:54310</value>
  <description>The name of the default file system. A URI whose
  scheme and authority determine the FileSystem implementation. The
  uri's scheme determines the config property (fs.SCHEME.impl) naming
  the FileSystem implementation class. The uri's authority is used to
  determine the host, port, etc. for a filesystem.</description>
</property>
mapred-site.xml
Open hadoop/conf/mapred-site.xml with a text editor and add the following configuration values between the <configuration> .. </configuration> elements (just as for core-site.xml):
$ nano /home/hduser/hadoop/conf/mapred-site.xml
<property>
  <name>mapred.job.tracker</name>
  <value>localhost:54311</value>
  <description>The host and port that the MapReduce job tracker runs
  at. If "local", then jobs are run in-process as a single map
  and reduce task.</description>
</property>
hdfs-site.xml
Open hadoop/conf/hdfs-site.xml with a text editor and add the following configuration between the <configuration> .. </configuration> elements:
$ nano /home/hduser/hadoop/conf/hdfs-site.xml
<property>
  <name>dfs.replication</name>
  <value>1</value>
  <description>Default block replication.
  The actual number of replications can be specified when the file is created.
  The default is used if replication is not specified in create time.</description>
</property>
Formatting the NameNode
You need to format the NameNode of your HDFS. You should not do this step while the cluster is running; it is usually done once, at the first time of your installation. Run the following command as hduser:
$ /home/hduser/hadoop/bin/hadoop namenode -format
Starting the Hadoop Cluster
You will need to navigate to the hadoop/bin directory and run the ./start-all.sh script:
$ cd /home/hduser/hadoop/bin/
$ ./start-all.sh
There is a nice tool called jps. You can use it to make sure that all the services are up.
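As a rough guide (the process IDs below are made up and will differ on your machine), running jps as hduser after a successful start of a Hadoop 1.x single-node cluster should list the five daemons plus jps itself:
$ jps
2287 NameNode
2422 DataNode
2563 SecondaryNameNode
2641 JobTracker
2786 TaskTracker
2851 Jps
When you are done, running ./stop-all.sh from the same directory shuts the daemons down again.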