CentOS 7

Install Apache Hadoop
2015/07/28
 
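Install Apache Hadoop.
As an example, this guide configures the following three nodes for distributed processing.
Only three nodes are used here, but Hadoop shows its real value when large data sets are processed across a large number of nodes.
1) dlp.srv.world (master node)
2) node01.srv.world (slave node)
3) node02.srv.world (slave node)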
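[1] Install a JDK on all nodes beforehand; the later steps assume it is available at /usr/java/default.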
[2] Create a user for Hadoop on all nodes.
[root@dlp ~]#
useradd -d /usr/hadoop hadoop

[root@dlp ~]#
chmod 755 /usr/hadoop

[root@dlp ~]#
passwd hadoop

Changing password for user hadoop.
New password:
Retype new password:
passwd: all authentication tokens updated successfully.
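
Because the same account has to exist on every node, it can help to confirm the recorded home directory afterwards. The following is a minimal sketch; `passwd_home` is a hypothetical helper name, not part of CentOS or Hadoop, and it only reads a passwd-format file.

```shell
# Hypothetical helper: print the home directory recorded for a user
# in a passwd-format file (defaults to /etc/passwd).
passwd_home() {
    awk -F: -v user="$1" '$1 == user { print $6 }' "${2:-/etc/passwd}"
}
```

Running `passwd_home hadoop` on each node should print /usr/hadoop if the useradd command above was used as shown.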
[3] Log in to the master node as the hadoop user, create an SSH key pair with no passphrase, and distribute it to each node.
[hadoop@dlp ~]$
ssh-keygen

Generating public/private rsa key pair.
Enter file in which to save the key (/usr/hadoop/.ssh/id_rsa):
Created directory '/usr/hadoop/.ssh'.
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /usr/hadoop/.ssh/id_rsa.
Your public key has been saved in /usr/hadoop/.ssh/id_rsa.pub.
The key fingerprint is:
xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx hadoop@dlp.srv.world
The key's randomart image is:
# distribute the key to each node, including localhost

[hadoop@dlp ~]$
ssh-copy-id localhost

The authenticity of host 'localhost (::1)' can't be established.
ECDSA key fingerprint is xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx:xx.
Are you sure you want to continue connecting (yes/no)?
yes

/bin/ssh-copy-id: INFO: attempting to log in with the new key(s), to filter out any that are already installed
/bin/ssh-copy-id: INFO: 1 key(s) remain to be installed -- if you are prompted now it is to install the new keys
hadoop@localhost's password:

Number of key(s) added: 1

Now try logging into the machine, with:   "ssh 'localhost'"
and check to make sure that only the key(s) you wanted were added.

[hadoop@dlp ~]$
ssh-copy-id node01.srv.world

[hadoop@dlp ~]$
ssh-copy-id node02.srv.world
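
The start scripts used later log in to every node over SSH, so it may be worth confirming that key-based login works before going on. A minimal sketch follows; `check_passwordless_ssh` is a hypothetical helper name, and the 5-second timeout is an arbitrary assumption.

```shell
# Hypothetical helper: report whether key-based login works for each host.
# BatchMode=yes makes ssh fail instead of prompting for a password.
check_passwordless_ssh() {
    failed=0
    for host in "$@"; do
        if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
            echo "OK   $host"
        else
            echo "FAIL $host"
            failed=1
        fi
    done
    return "$failed"
}
```

As the hadoop user on the master, `check_passwordless_ssh localhost node01.srv.world node02.srv.world` should print OK for all three hosts.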

[4] Install Hadoop on all nodes. Work as the hadoop user.
Check the following site for the latest version and download it.
⇒ https://hadoop.apache.org/releases.html
[hadoop@dlp ~]$
curl -O http://ftp.jaist.ac.jp/pub/apache/hadoop/common/hadoop-2.7.1/hadoop-2.7.1.tar.gz

[hadoop@dlp ~]$
tar zxvf hadoop-2.7.1.tar.gz -C /usr/hadoop --strip-components 1

[hadoop@dlp ~]$
vi ~/.bash_profile
# add to the end of the file

export HADOOP_HOME=/usr/hadoop
export HADOOP_COMMON_HOME=$HADOOP_HOME
export HADOOP_HDFS_HOME=$HADOOP_HOME
export HADOOP_MAPRED_HOME=$HADOOP_HOME
export HADOOP_YARN_HOME=$HADOOP_HOME
export HADOOP_OPTS="-Djava.library.path=$HADOOP_HOME/lib/native"
export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native
export PATH=$PATH:$HADOOP_HOME/sbin:$HADOOP_HOME/bin
[hadoop@dlp ~]$
source ~/.bash_profile
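
Since the same lines have to be added on every node, a scripted append that skips lines already present can avoid duplicates when the setup is re-run. This is only a sketch; `append_once` is a hypothetical helper, not something shipped with Hadoop.

```shell
# Hypothetical helper: append a line to a file only if the exact line
# is not already there, so repeated runs do not duplicate exports.
append_once() {
    line=$1
    file=$2
    touch "$file"
    grep -qxF -e "$line" "$file" || printf '%s\n' "$line" >> "$file"
}
```

For example, `append_once 'export HADOOP_HOME=/usr/hadoop' ~/.bash_profile` adds the line the first time and does nothing on later runs.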

[5] Configure Hadoop on the master node. Work as the hadoop user.
# create the data directory on all nodes

[hadoop@dlp ~]$
mkdir ~/datanode

[hadoop@dlp ~]$
ssh node01.srv.world "mkdir ~/datanode"

[hadoop@dlp ~]$
ssh node02.srv.world "mkdir ~/datanode"
[hadoop@dlp ~]$
vi ~/etc/hadoop/hdfs-site.xml
# add inside <configuration> ... </configuration>

<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/hadoop/datanode</value>
  </property>
</configuration>
# send to the slave nodes

[hadoop@dlp ~]$
scp ~/etc/hadoop/hdfs-site.xml node01.srv.world:~/etc/hadoop/

[hadoop@dlp ~]$
scp ~/etc/hadoop/hdfs-site.xml node02.srv.world:~/etc/hadoop/

[hadoop@dlp ~]$
vi ~/etc/hadoop/core-site.xml
# add inside <configuration> ... </configuration>

<configuration>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://dlp.srv.world:9000/</value>
  </property>
</configuration>
# send to the slave nodes

[hadoop@dlp ~]$
scp ~/etc/hadoop/core-site.xml node01.srv.world:~/etc/hadoop/

[hadoop@dlp ~]$
scp ~/etc/hadoop/core-site.xml node02.srv.world:~/etc/hadoop/

[hadoop@dlp ~]$
sed -i -e 's/\${JAVA_HOME}/\/usr\/java\/default/' ~/etc/hadoop/hadoop-env.sh
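
The sed command above replaces the ${JAVA_HOME} placeholder in hadoop-env.sh with a fixed path. The sketch below does the same substitution while keeping a backup and printing the resulting line for a quick check; `set_java_home` is a hypothetical helper, and it assumes the path contains no "|" character.

```shell
# Hypothetical helper: replace the ${JAVA_HOME} placeholder in the given
# file with a concrete path, keep a .bak copy, and print the result line.
# "|" is used as the sed delimiter so the slashes need no escaping.
set_java_home() {
    file=$1
    java_home=$2
    cp -p "$file" "$file.bak" &&
        sed -e 's|\${JAVA_HOME}|'"$java_home"'|' "$file.bak" > "$file"
    grep '^export JAVA_HOME=' "$file"
}
```

With the path used in this guide, `set_java_home ~/etc/hadoop/hadoop-env.sh /usr/java/default` should print `export JAVA_HOME=/usr/java/default`.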

# send to the slave nodes

[hadoop@dlp ~]$
scp ~/etc/hadoop/hadoop-env.sh node01.srv.world:~/etc/hadoop/

[hadoop@dlp ~]$
scp ~/etc/hadoop/hadoop-env.sh node02.srv.world:~/etc/hadoop/
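
The repeated scp commands above can also be written as a loop, which scales better when more slave nodes are added. A minimal sketch, assuming the two slave host names of this example; `SLAVES` and `distribute_conf` are hypothetical names.

```shell
# Assumed list of slave nodes from this example.
SLAVES="node01.srv.world node02.srv.world"

# Hypothetical helper: copy the named config files from the master's
# ~/etc/hadoop to the same directory on each slave node.
distribute_conf() {
    for node in $SLAVES; do
        for name in "$@"; do
            scp "$HOME/etc/hadoop/$name" "$node:~/etc/hadoop/" || return 1
        done
    done
}
```

`distribute_conf hdfs-site.xml core-site.xml hadoop-env.sh` would then send the three files edited so far to both slaves.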
[hadoop@dlp ~]$
mkdir ~/namenode

[hadoop@dlp ~]$
vi ~/etc/hadoop/hdfs-site.xml
# add inside <configuration> ... </configuration> (master node only)

<configuration>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/hadoop/namenode</value>
  </property>
</configuration>
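
After this second edit, hdfs-site.xml on the master holds three properties, while the copies on the slaves hold only the first two. As a sketch of the combined result, the hypothetical helper below writes the whole file in one step; note that it overwrites the target, including the license header of the shipped file.

```shell
# Hypothetical helper: write the master's combined hdfs-site.xml
# (replication, DataNode directory, NameNode directory) to the given path.
write_master_hdfs_site() {
    cat > "$1" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>2</value>
  </property>
  <property>
    <name>dfs.datanode.data.dir</name>
    <value>file:///usr/hadoop/datanode</value>
  </property>
  <property>
    <name>dfs.namenode.name.dir</name>
    <value>file:///usr/hadoop/namenode</value>
  </property>
</configuration>
EOF
}
```

On the master this would be called as `write_master_hdfs_site ~/etc/hadoop/hdfs-site.xml`.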
[hadoop@dlp ~]$
vi ~/etc/hadoop/mapred-site.xml
# create new

<configuration>
  <property>
    <name>mapreduce.framework.name</name>
    <value>yarn</value>
  </property>
</configuration>
[hadoop@dlp ~]$
vi ~/etc/hadoop/yarn-site.xml
# add inside <configuration> ... </configuration>

<configuration>
  <property>
    <name>yarn.resourcemanager.hostname</name>
    <value>dlp.srv.world</value>
  </property>
  <property>
    <name>yarn.nodemanager.hostname</name>
    <value>dlp.srv.world</value>
  </property>
  <property>
    <name>yarn.nodemanager.aux-services</name>
    <value>mapreduce_shuffle</value>
  </property>
</configuration>
[hadoop@dlp ~]$
vi ~/etc/hadoop/slaves
# add all node names (remove localhost)

dlp.srv.world
node01.srv.world
node02.srv.world
[6] Format the NameNode and start Hadoop.
[hadoop@dlp ~]$
hdfs namenode -format

15/07/28 19:58:14 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = dlp.srv.world/10.0.0.30
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 2.7.1
.....
.....
15/07/28 19:58:17 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at dlp.srv.world/10.0.0.30
************************************************************/

[hadoop@dlp ~]$
start-dfs.sh

Starting namenodes on [dlp.srv.world]
dlp.srv.world: starting namenode, logging to /usr/hadoop/logs/hadoop-hadoop-namenode-dlp.srv.world.out
dlp.srv.world: starting datanode, logging to /usr/hadoop/logs/hadoop-hadoop-datanode-dlp.srv.world.out
node02.srv.world: starting datanode, logging to /usr/hadoop/logs/hadoop-hadoop-datanode-node02.srv.world.out
node01.srv.world: starting datanode, logging to /usr/hadoop/logs/hadoop-hadoop-datanode-node01.srv.world.out
Starting secondary namenodes [0.0.0.0]
0.0.0.0: starting secondarynamenode, logging to /usr/hadoop/logs/hadoop-hadoop-secondarynamenode-dlp.srv.world.out

[hadoop@dlp ~]$
start-yarn.sh

starting yarn daemons
starting resourcemanager, logging to /usr/hadoop/logs/yarn-hadoop-resourcemanager-dlp.srv.world.out
dlp.srv.world: starting nodemanager, logging to /usr/hadoop/logs/yarn-hadoop-nodemanager-dlp.srv.world.out
node02.srv.world: starting nodemanager, logging to /usr/hadoop/logs/yarn-hadoop-nodemanager-node02.srv.world.out
node01.srv.world: starting nodemanager, logging to /usr/hadoop/logs/yarn-hadoop-nodemanager-node01.srv.world.out

# verify (it is OK if the following processes are running)

[hadoop@dlp ~]$
jps

2130 NameNode
2437 SecondaryNameNode
2598 ResourceManager
2710 NodeManager
3001 Jps
2267 DataNode
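
Instead of reading the jps listing by eye, the check can be scripted. The sketch below takes jps output on standard input and names any expected daemon that is absent; `missing_daemons` is a hypothetical helper, not a Hadoop command.

```shell
# Hypothetical helper: read `jps` output on stdin and print every expected
# daemon that is not listed. Returns non-zero if at least one is missing.
missing_daemons() {
    running=$(awk '{ print $2 }')
    status=0
    for daemon in "$@"; do
        # -x matches the whole line, so NameNode does not match
        # SecondaryNameNode.
        if ! printf '%s\n' "$running" | grep -qxF -e "$daemon"; then
            echo "missing: $daemon"
            status=1
        fi
    done
    return "$status"
}
```

On the master of this example, `jps | missing_daemons NameNode SecondaryNameNode DataNode ResourceManager NodeManager` should print nothing; on the slave nodes only DataNode and NodeManager are expected.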
[7] Run the bundled sample program to verify that Hadoop works.
# create a directory

[hadoop@dlp ~]$
hdfs dfs -mkdir /test
# copy the local NOTICE.txt to /test

[hadoop@dlp ~]$
hdfs dfs -copyFromLocal ~/NOTICE.txt /test
# show the contents

[hadoop@dlp ~]$
hdfs dfs -cat /test/NOTICE.txt

This product includes software developed by The Apache Software
Foundation (http://www.apache.org/).

# run the sample program

[hadoop@dlp ~]$
hadoop jar ~/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.1.jar wordcount /test/NOTICE.txt /output01

15/07/28 19:28:47 INFO client.RMProxy: Connecting to ResourceManager at dlp.srv.world/10.0.0.30:8032
15/07/28 19:28:48 INFO input.FileInputFormat: Total input paths to process : 1
15/07/28 19:28:48 INFO mapreduce.JobSubmitter: number of splits:1
.....
.....

# check the results

[hadoop@dlp ~]$
hdfs dfs -ls /output01

Found 2 items
-rw-r--r--   2 hadoop supergroup      0 2015-07-29 14:29 /output01/_SUCCESS
-rw-r--r--   2 hadoop supergroup    123 2015-07-29 14:29 /output01/part-r-00000

# show the result (the count for each word is written as output)

[hadoop@dlp ~]$
hdfs dfs -cat /output01/part-r-00000

(http://www.apache.org/).       1
Apache          1
Foundation      1
Software        1
The             1
This            1
by              1
developed       1
includes        1
product         1
software        1
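
To cross-check the job output, the same count can be reproduced locally on a small input. The sketch below is an assumed local equivalent, not the code of the bundled example: it splits standard input on whitespace, counts each token, and prints tab-separated word and count pairs in byte order.

```shell
# Hypothetical local equivalent of the wordcount job: split stdin on
# whitespace, count each token, and print "word<TAB>count" sorted in
# byte order (uppercase before lowercase).
local_wordcount() {
    awk '{ for (i = 1; i <= NF; i++) count[$i]++ }
         END { for (word in count) printf "%s\t%d\n", word, count[word] }' |
        LC_ALL=C sort
}
```

For this input, `hdfs dfs -cat /test/NOTICE.txt | local_wordcount` is expected to list the same eleven words as part-r-00000 above.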
[8] Access "http://(server's hostname or IP address):50070/" to see an overview.
[9] Access "http://(server's hostname or IP address):8088/" to see the cluster information.
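
Whether the two web interfaces answer at all can also be checked from a shell. A minimal sketch using curl; `ui_status` is a hypothetical helper, and the ports are the ones named in steps [8] and [9].

```shell
# Hypothetical helper: print the HTTP status code returned by a web UI.
# Usage: ui_status HOST PORT
ui_status() {
    curl -s -o /dev/null -w '%{http_code}' "http://$1:$2/"
}
```

For example, `ui_status dlp.srv.world 50070` and `ui_status dlp.srv.world 8088`. A 2xx or 3xx code suggests the interface is responding, while 000 indicates that curl could not connect.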
 