Setting Up a Hadoop 2.2.0 Cluster on CentOS Linux

1. Installation Environment
Operating system: CentOS Linux 6.5 (64-bit)
hadoop1: 192.168.1.34
hadoop2: 192.168.1.35

1.1. Remove the JDK that CentOS Linux installs by default

rpm -qa | grep java

If the system was installed with the basic server profile, you should see RPM packages like these:

tzdata-java-2013g-1.el6.noarch
java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64

Remove the OpenJDK packages:

yum remove java-1.7.0-openjdk-1.7.0.45-2.4.3.3.el6.x86_64
yum remove java-1.6.0-openjdk-1.6.0.0-1.66.1.13.0.el6.x86_64
rpm -qa | grep jdk

Again, on a basic server install the query should show the following RPM package:

jdk-1.7.0_45-fcs.x86_64

Remove the JDK package:

yum remove jdk-1.7.0_45-fcs.x86_64
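
After the removal, you can confirm that no Java packages remain:

rpm -qa | grep -E 'java|jdk'

Apart from tzdata-java, which is harmless, this should return no output.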

1.2. Disable the firewall (on both servers)

/etc/init.d/iptables stop
/etc/init.d/ip6tables stop
chkconfig ip6tables off
chkconfig iptables off
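
To verify that the firewall is stopped and disabled on boot:

/etc/init.d/iptables status
chkconfig --list iptables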

1.3. Disable SELinux (on both servers)

vi /etc/selinux/config

Change SELINUX=enforcing to SELINUX=disabled.
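
The same edit can be made non-interactively, and SELinux can be switched to permissive mode right away so a reboot is not required; a sketch:

sed -i 's/^SELINUX=enforcing/SELINUX=disabled/' /etc/selinux/config
setenforce 0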

1.4. Set the hostname (on both servers)

vi /etc/sysconfig/network
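
Set HOSTNAME in this file; a sketch for hadoop1 (use hadoop2 on the other machine), assuming the hostnames should match the /etc/hosts entries added in the next step:

NETWORKING=yes
HOSTNAME=hadoop1

Running hostname hadoop1 afterwards applies the change without a reboot.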

1.5. Add hosts entries (on both servers)

vi /etc/hosts

Set the file contents on both machines as follows:

#127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
#::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
192.168.1.34 hadoop1
192.168.1.35 hadoop2

1.6. Create the hadoop user (on both servers)

useradd hadoop
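
Optionally set a password so you can log in as this user directly:

passwd hadoop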

1.7. Set up mutual trust between the hadoop users (passwordless SSH in both directions; on both servers)
1.7.1. Steps on hadoop1

su - hadoop
ssh-keygen -t rsa
cd .ssh
cp id_rsa.pub authorized_keys 
chmod 600 authorized_keys  
cd .. 

Establish a trusted connection to the machine itself:

ssh hadoop1

Type yes at the prompt (all three letters, not just y).
1.7.2. Steps on hadoop2

su - hadoop
ssh-keygen -t rsa
cd .ssh
cp id_rsa.pub authorized_keys 
chmod 600 authorized_keys  
cd .. 

Establish a trusted connection to the machine itself:

ssh hadoop2

Type yes at the prompt (all three letters, not just y).
1.7.3. Exchange public keys
On hadoop1:

cd .ssh/
cat id_rsa.pub

Copy the output into hadoop2's authorized_keys file.

On hadoop2:

cd .ssh/
cat id_rsa.pub

Copy the output into hadoop1's authorized_keys file.
1.7.4. On both machines, run the following as the hadoop user:

ssh hadoop1
ssh hadoop2

This establishes mutual trust between the two machines.
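
As an alternative to pasting keys by hand, ssh-copy-id achieves the same result; run as the hadoop user:

ssh-copy-id hadoop@hadoop2    # on hadoop1
ssh-copy-id hadoop@hadoop1    # on hadoop2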
1.8. Install Java (on both servers)
For installation on CentOS, see "Installing Java on CentOS 6.4".
1.9. Install the required dependency packages (hadoop1 only)

yum install lzo-devel zlib-devel gcc autoconf automake libtool ncurses-devel openssl-devel cmake

1.10. Install Maven (hadoop1 only)
For installation on CentOS, see "Installing Maven on CentOS 6.4".
1.11. Install protobuf (the build cannot complete without it; hadoop1 only)

cd /data0/software
wget http://protobuf.googlecode.com/files/protobuf-2.5.0.tar.gz
tar zxvf protobuf-2.5.0.tar.gz
chown root.root -R protobuf-2.5.0
cd protobuf-2.5.0
./configure
make
make install
/usr/local/bin/protoc --version
cd ..

By default, the dynamic linker only searches /lib and /usr/lib for shared libraries, so the directory where protobuf installed its libraries has to be added to ld.so.conf manually.

vi /etc/ld.so.conf

Add the following line:

/usr/local/lib

Save and exit, then run:

ldconfig

Log out, log back in, and run:

protoc --version

If the version number is displayed, the installation is complete.
2. Compiling and Installing Hadoop 2.2.0
The native libraries in the pre-built packages on the Hadoop site are 32-bit, so Hadoop must be recompiled before it will run properly on a 64-bit system.

cd /data0/software
wget http://apache.fayea.com/apache-mirror/hadoop/common/hadoop-2.2.0/hadoop-2.2.0-src.tar.gz
tar zxvf hadoop-2.2.0-src.tar.gz
cd hadoop-2.2.0-src

Apply a patch: the code in the current 2.2.0 source tarball has a bug and will not compile until it is patched.

vi hadoop-common-project/hadoop-auth/pom.xml

Find org.mortbay.jetty, and insert the following between it and the jetty artifact:

      <artifactId>jetty-util</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.mortbay.jetty</groupId>

After the change it should look like this:

    <dependency>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty-util</artifactId>
      <scope>test</scope>
    </dependency>
    <dependency>
      <groupId>org.mortbay.jetty</groupId>
      <artifactId>jetty</artifactId>
      <scope>test</scope>
    </dependency>

Start the build. This step requires that the host can reach the public Internet, and it takes a long time, depending on server specs and network speed.

mvn clean package -Pdist,native -DskipTests -Dtar

Once the build succeeds, /data0/software/hadoop-2.2.0-src/hadoop-dist/target/hadoop-2.2.0.tar.gz is the file we need. Copy the compiled hadoop-2.2.0.tar.gz to each of the other nodes, extract it, and set its owner and group to the hadoop user.
Steps on hadoop1:

cd /data0/software/hadoop-2.2.0-src/hadoop-dist/target/
tar zxvf hadoop-2.2.0.tar.gz
mv hadoop-2.2.0 /usr/local/
chown hadoop.hadoop -R /usr/local/hadoop-2.2.0
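
To confirm the rebuilt native libraries are really 64-bit:

file /usr/local/hadoop-2.2.0/lib/native/libhadoop.so.1.0.0

The output should include "ELF 64-bit LSB shared object".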

 
Add environment variables:

vi /etc/profile

Append the following at the end:

export HADOOP_HOME=/usr/local/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin

Then run:

source /etc/profile
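
Check that the variables took effect:

hadoop version

This should report Hadoop 2.2.0.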

Configure Hadoop:

vi /usr/local/hadoop-2.2.0/etc/hadoop/slaves

Enter:

192.168.1.35

vi /usr/local/hadoop-2.2.0/etc/hadoop/core-site.xml

The file contents should be as follows:

<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>fs.defaultFS</name>
                <value>hdfs://192.168.1.34:9000</value>
        </property>

        <property>
                <name>io.file.buffer.size</name>
                <value>131072</value>
        </property>


        <property>
                <name>hadoop.tmp.dir</name>
                <value>file:/usr/local/hadoop-2.2.0/temp</value>
                <description>Abase for other temporary directories.</description>
        </property>

        <property>
                <name>hadoop.proxyuser.hduser.hosts</name>
                <value>*</value>
        </property>

        <property>
                <name>hadoop.proxyuser.hduser.groups</name>
                <value>*</value>
        </property>
</configuration>

vi /usr/local/hadoop-2.2.0/etc/hadoop/hdfs-site.xml

The file contents should be as follows:

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>dfs.namenode.secondary.http-address</name>
                <value>192.168.1.34:9001</value>
        </property>
        <property>
                <name>dfs.namenode.name.dir</name>
                <value>file:/usr/local/hadoop-2.2.0/dfs/name</value>
        </property>

        <property>
                <name>dfs.datanode.data.dir</name>
                <value>file:/usr/local/hadoop-2.2.0/dfs/data</value>
        </property>

        <property>
                <name>dfs.replication</name>
                <value>1</value>
        </property>
        <property>
                <name>dfs.webhdfs.enabled</name>
                <value>true</value>
        </property>
</configuration>
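
The local paths referenced above (hadoop.tmp.dir, dfs.namenode.name.dir, dfs.datanode.data.dir) can be created up front with the right ownership; Hadoop will usually create them itself, so this is an optional precaution:

mkdir -p /usr/local/hadoop-2.2.0/temp /usr/local/hadoop-2.2.0/dfs/name /usr/local/hadoop-2.2.0/dfs/data
chown -R hadoop.hadoop /usr/local/hadoop-2.2.0/temp /usr/local/hadoop-2.2.0/dfs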

vi /usr/local/hadoop-2.2.0/etc/hadoop/mapred-site.xml

The file contents should be as follows:

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->

<!-- Put site-specific property overrides in this file. -->

<configuration>
        <property>
                <name>mapreduce.framework.name</name>
                <value>yarn</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.address</name>
                <value>192.168.1.34:10020</value>
        </property>
        <property>
                <name>mapreduce.jobhistory.webapp.address</name>
                <value>192.168.1.34:19888</value>
        </property>

</configuration>

vi /usr/local/hadoop-2.2.0/etc/hadoop/yarn-site.xml

The file contents should be as follows:

<?xml version="1.0"?>
<!--
  Licensed under the Apache License, Version 2.0 (the "License");
  you may not use this file except in compliance with the License.
  You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

  Unless required by applicable law or agreed to in writing, software
  distributed under the License is distributed on an "AS IS" BASIS,
  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
  See the License for the specific language governing permissions and
  limitations under the License. See accompanying LICENSE file.
-->
<configuration>

<!-- Site specific YARN configuration properties -->
        <property>
                <name>yarn.nodemanager.aux-services</name>
                <value>mapreduce_shuffle</value>
        </property>
        <property>
                <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
                <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
                <name>yarn.resourcemanager.address</name>
                <value>192.168.1.34:8032</value>
        </property>

        <property>
                <name>yarn.resourcemanager.scheduler.address</name>
                <value>192.168.1.34:8030</value>
        </property>
        <property>
                <name>yarn.resourcemanager.resource-tracker.address</name>
                <value>192.168.1.34:8031</value>
        </property>
        <property>
                <name>yarn.resourcemanager.admin.address</name>
                <value>192.168.1.34:8033</value>
        </property>
        <property>
                <name>yarn.resourcemanager.webapp.address</name>
                <value>192.168.1.34:8088</value>
        </property>
</configuration>

vi /usr/local/hadoop-2.2.0/etc/hadoop/hadoop-env.sh

Change the JAVA_HOME line to export JAVA_HOME=/usr/java/jsdk (the JDK installation path on these servers).

vi /usr/local/hadoop-2.2.0/etc/hadoop/yarn-env.sh

Likewise, change the JAVA_HOME line to /usr/java/jsdk.
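
To double-check both edits:

grep JAVA_HOME /usr/local/hadoop-2.2.0/etc/hadoop/hadoop-env.sh /usr/local/hadoop-2.2.0/etc/hadoop/yarn-env.sh

Both files should show the path you set.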

Copy the Hadoop directory to hadoop2:

scp -r /usr/local/hadoop-2.2.0 root@hadoop2:/usr/local

Log in to hadoop2 and perform the following steps:

chown hadoop.hadoop -R /usr/local/hadoop-2.2.0

Add environment variables:

vi /etc/profile

Append the following at the end:

export HADOOP_HOME=/usr/local/hadoop-2.2.0
export PATH=$PATH:$HADOOP_HOME/bin

Then run:

source /etc/profile

3. Starting Hadoop
Initialize Hadoop on hadoop1:

su - hadoop
/usr/local/hadoop-2.2.0/bin/hdfs namenode -format

Start Hadoop:

/usr/local/hadoop-2.2.0/sbin/start-dfs.sh
/usr/local/hadoop-2.2.0/sbin/start-yarn.sh

Once these finish, running jps on hadoop1 should show:
ResourceManager
SecondaryNameNode
NameNode
Running jps on hadoop2 should show:
DataNode
NodeManager

Visit http://192.168.1.34:8088 (YARN ResourceManager web UI)
Visit http://192.168.1.34:50070 (HDFS NameNode web UI)
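
Once the web UIs respond, a quick smoke test exercises both HDFS and MapReduce; a sketch, run as the hadoop user on hadoop1:

/usr/local/hadoop-2.2.0/bin/hdfs dfs -mkdir -p /user/hadoop
/usr/local/hadoop-2.2.0/bin/hdfs dfs -put /etc/hosts /user/hadoop/
/usr/local/hadoop-2.2.0/bin/hdfs dfs -ls /user/hadoop
/usr/local/hadoop-2.2.0/bin/hadoop jar /usr/local/hadoop-2.2.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.2.0.jar pi 2 10

The pi job should finish by printing an estimated value of Pi, confirming that YARN can schedule MapReduce jobs on the cluster.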
