Hadoop standalone and distributed cluster deployment on Ubuntu

Manual installation of CDH 4.5

http://wenku.baidu.com/view/6544c87f2e3f5727a5e962a3.html

hadoop-1.1.2.tar.gz has also been tested and works.

Important

Installation document: http://wenku.baidu.com/view/685f71b165ce050876321329.html

When choosing the network connection for the VM, select bridged mode.

Set the root password

Open a terminal with Ctrl+Alt+T

Change the root password: sudo passwd root

Enter the new password

Log in as root: su root

Ubuntu does not install the SSH service by default, so it has to be installed manually:

sudo apt-get install ssh

or: sudo apt-get install openssh-server   # installs openssh-server
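After the install, it is worth confirming that the SSH daemon is actually running before connecting remotely; a quick check (commands assumed available on a stock Ubuntu install):

sudo /etc/init.d/ssh status   # the init script should report that sshd is running
ps -ef | grep sshd            # the sshd process should appear in the list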

Check the IP address with ifconfig

Connect remotely with SecureCRT

ubuntu    10.2.128.46

ubuntu1   10.2.128.20

ubuntu2   10.2.128.120

Install vim

sudo apt-get install vim

1. Install the JDK

1.1. Download the JDK from the official site

The file downloaded here is jdk-6u23-linux-i586.bin.

Download page: http://www.oracle.com/technetwork/java/javase/downloads/index.html

Look for JDK 6.

Place it in /home/qiaowang and run:

sudo sh jdk-6u23-linux-i586.bin

cp -rf jdk1.6.0_33/ /usr/lib/

sudo gedit /etc/environment

export JAVA_HOME=/usr/lib/jdk1.6.0_33

export JRE_HOME=/usr/lib/jdk1.6.0_33/jre

export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

vim /etc/profile

export JAVA_HOME=/usr/lib/jdk1.6.0_33

export JRE_HOME=/usr/lib/jdk1.6.0_33/jre

export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$JAVA_HOME/bin

Add these lines just before the umask line.

source /etc/profile

reboot

root@qiaowang-virtual-machine:/etc# java -version
java version "1.6.0_33"
Java(TM) SE Runtime Environment (build 1.6.0_33-b03)
Java HotSpot(TM) Client VM (build 20.8-b03, mixed mode)

The JDK setup must be performed on every namenode and datanode machine.
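Optionally, the JDK can also be registered with update-alternatives so that /usr/bin/java resolves to it even in shells that have not sourced /etc/profile. A minimal sketch, assuming the install path used above:

sudo update-alternatives --install /usr/bin/java java /usr/lib/jdk1.6.0_33/bin/java 300
sudo update-alternatives --install /usr/bin/javac javac /usr/lib/jdk1.6.0_33/bin/javac 300
sudo update-alternatives --config java    # choose jdk1.6.0_33 from the menu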

2. Add a group and user for running and accessing Hadoop.

sudo addgroup hadoop

sudo adduser --ingroup hadoop hadoop

Show a user's ID and groups:

id <username>

List the groups a user belongs to:

groups <username>

List all users:

cat /etc/shadow

Delete a user:

as root: userdel -r newuser

as a normal user: sudo userdel -r newuser

Make sure the user is logged out before deleting.

3. Generate SSH keys and configure passwordless SSH

su - hadoop   # switch to the hadoop user

ssh-keygen -t rsa -P ""   # generate an SSH key

cd .ssh/

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys   # allow SSH access with this key

cat /home/hadoop/.ssh/id_rsa.pub >> /home/hadoop/.ssh/authorized_keys

After this, test it with ssh localhost.

If the relevant line in /etc/ssh/sshd_config (typically AuthorizedKeysFile) is commented out, remove the leading # so the system recognizes public keys from authorized_keys.
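If ssh localhost still asks for a password at this point, the most common cause is file permissions: sshd ignores authorized_keys when the .ssh directory or the key file is group- or world-writable. A minimal fix, assuming the hadoop user's home directory:

chmod 700 /home/hadoop/.ssh
chmod 600 /home/hadoop/.ssh/authorized_keys
ssh localhost    # should now log in without prompting for a password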

4. Download a Hadoop release:

http://hadoop.apache.org/common/releases.html#Download

http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-0.20.2/

Latest version:

hadoop-2.0.0-cdh4.5.0.tar.gz

Already copied to /opt.

tar -zxvf hadoop-0.20.2.tar.gz

tar -zxvf hadoop-2.0.0-cdh4.5.0.tar.gz

5. Change the hostname (currently qiaowang-virtual-machine)

root@qiaowang-virtual-machine:/opt# hostname
qiaowang-virtual-machine

If the machine's hostname is not what we want, change it to the planned name (on Red Hat-style systems this is the HOSTNAME value in /etc/sysconfig/network; on Ubuntu, edit /etc/hostname):

vim /etc/hostname

Master.Hadoop

Then apply it:

hostname m1hadoop.focus.cn

root@Master:~# hostname
Master.Hadoop

vim /etc/hosts

127.0.1.1 Master.Hadoop

For the configuration that follows, refer to:

http://wenku.baidu.com/view/6544c87f2e3f5727a5e962a3.html

1. core-site.xml

<property>

<name>fs.default.name</name>

<value>hdfs://m1hadoop.xingmeng.com:8020</value>

<final>true</final>

</property>

<property>

<name>hadoop.tmp.dir</name>

<value>/home/hadoop/tempdata</value>

</property>

2. yarn-site.xml

<property>

<name>yarn.resourcemanager.address</name>

<value>m1hadoop.xingmeng.com:8032</value>

</property>

<property>

<name>yarn.resourcemanager.scheduler.address</name>

<value>m1hadoop.xingmeng.com:8030</value>

</property>

<property>

<name>yarn.resourcemanager.resource-tracker.address</name>

<value>m1hadoop.xingmeng.com:8031</value>

</property>

<property>

<name>yarn.resourcemanager.admin.address</name>

<value>m1hadoop.xingmeng.com:8033</value>

</property>

<property>

<name>yarn.resourcemanager.webapp.address</name>

<value>m1hadoop.xingmeng.com:8088</value>

</property>

<property>

<name>yarn.nodemanager.aux-services</name>

<value>mapreduce.shuffle</value>

</property>

<property>

<name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>

<value>org.apache.hadoop.mapred.ShuffleHandler</value>

</property>

3. mapred-site.xml

<property>

<name>mapreduce.framework.name</name>

<value>yarn</value>

</property>

<property>

<name>mapred.system.dir</name>

<value>file:/home/hadoop/mapred_system</value>

<final>true</final>

</property>

<property>

<name>mapred.local.dir</name>

<value>file:/home/hadoop/mapred_local</value>

<final>true</final>

</property>

4. hdfs-site.xml

<property>

<name>dfs.namenode.name.dir</name>

<value>/home/hadoop/name</value>

</property>

<property>

<name>dfs.datanode.data.dir</name>

<value>/home/hadoop/data</value>

</property>

<property>

<name>dfs.replication</name>

<value>2</value>

</property>

<property>

<name>dfs.permissions</name>

<value>false</value>

</property>
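Before formatting, the local directories referenced in the configuration above should exist and be owned by the hadoop user, otherwise the daemons will fail to start. A minimal sketch, assuming the paths configured above:

mkdir -p /home/hadoop/tempdata /home/hadoop/name /home/hadoop/data /home/hadoop/mapred_system /home/hadoop/mapred_local
chown -R hadoop:hadoop /home/hadoop/tempdata /home/hadoop/name /home/hadoop/data /home/hadoop/mapred_system /home/hadoop/mapred_local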

./hdfs namenode -format

/home/hadoop/cdh/sbin/

/home/hadoop/cdh/bin/hadoop fs -ls /user/hadoop

If the IP address changes, update /etc/hosts first.

If the datanode does not come up, delete the files under /home/hadoop/data/.
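A frequent reason the datanode stays down after the namenode has been re-formatted is a namespaceID mismatch between the freshly formatted namenode and the old datanode data. A minimal recovery sketch (note: this wipes the datanode's blocks), assuming the CDH install lives at /home/hadoop/cdh and the data directory configured above:

/home/hadoop/cdh/sbin/stop-dfs.sh              # stop HDFS first
rm -rf /home/hadoop/data/*                     # clear the stale datanode data
/home/hadoop/cdh/bin/hdfs namenode -format     # re-format the namenode if needed
/home/hadoop/cdh/sbin/start-dfs.sh             # start HDFS again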

--------------------------------------------------

4. Disable IPv6

Edit conf/hadoop-env.sh under the Hadoop root directory (if Hadoop is not downloaded yet, download and unpack it first):

export HADOOP_OPTS=-Djava.net.preferIPv4Stack=true

cat /proc/sys/net/ipv6/conf/all/disable_ipv6

If this prints 0, IPv6 is still enabled; a value of 1 means IPv6 has been disabled. To disable it, add the following to /etc/sysctl.conf:

net.ipv6.conf.all.disable_ipv6 = 1

net.ipv6.conf.default.disable_ipv6 = 1

net.ipv6.conf.lo.disable_ipv6 = 1
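These sysctl settings can be applied without a reboot and then verified; a quick sketch:

sudo sysctl -p                                   # reload /etc/sysctl.conf
cat /proc/sys/net/ipv6/conf/all/disable_ipv6     # should now print 1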

5. Change the owner of the Hadoop directory to the hadoop user

chown -R hadoop:hadoop /opt/hadoop-0.20.2/

mv hadoop-0.20.2 hadoop

6. Install Hadoop

How to configure and start it:

Basic steps:

a. Configure the JDK

b. Configure core-site.xml

c. Configure mapred-site.xml

d. Configure hdfs-site.xml

Create the directory for data storage:

mkdir /opt/hadoop-datastore

Open conf/core-site.xml and configure it as follows:

<configuration>

<property>

<name>hadoop.tmp.dir</name>

<value>/opt/hadoop-datastore/</value>

<description>A base for other temporary directories.</description>

</property>

<property>

<name>fs.default.name</name>

<value>hdfs://localhost:54310</value>

<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>

</property>

</configuration>

mapred-site.xml is as follows:

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>localhost:54311</value>

<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>

</property>

</configuration>

hdfs-site.xml is as follows:

<configuration>

<property>

<name>dfs.replication</name>

<value>1</value>

<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>

</property>

</configuration>

vim hadoop-env.sh

export JAVA_HOME=/usr/lib/jdk1.6.0_33

OK, configuration complete.

Format HDFS:

/opt/hadoop/bin/hadoop namenode -format

Output:

root@Master:/opt/hadoop# /opt/hadoop/bin/hadoop namenode -format

12/07/13 14:27:29 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/127.0.1.1
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y
Format aborted in /opt/hadoop-datastore/dfs/name
12/07/13 14:27:35 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/127.0.1.1
************************************************************/

Note that the re-format prompt is case-sensitive: answering a lowercase y aborts the format (as in the output above); type an uppercase Y to actually re-format.

Start HDFS and MapReduce

Switch to the hadoop user.

/opt/hadoop/bin/start-all.sh

Output:

starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-root-namenode-Master.Hadoop.out
The authenticity of host 'localhost (127.0.0.1)' can't be established.
ECDSA key fingerprint is 3e:55:d8:be:47:46:21:95:29:9b:9e:c5:fb:02:f4:d2.
Are you sure you want to continue connecting (yes/no)? yes
localhost: Warning: Permanently added 'localhost' (ECDSA) to the list of known hosts.
root@localhost's password:
localhost: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-root-datanode-Master.Hadoop.out
root@localhost's password:
localhost: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-root-secondarynamenode-Master.Hadoop.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-root-jobtracker-Master.Hadoop.out
root@localhost's password:
localhost: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-root-tasktracker-Master.Hadoop.out

9. The script to stop the services is:

/opt/hadoop/bin/stop-all.sh

10. After a successful start, check with jps:

2914 NameNode
3197 JobTracker
3896 Jps
3024 DataNode
3126 SecondaryNameNode
3304 TaskTracker
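Besides jps, the web interfaces are a quick way to confirm the daemons came up; assuming the default ports and that curl is installed (a browser works just as well):

curl -s http://localhost:50070/ | head                  # NameNode web UI
curl -s http://localhost:50030/jobtracker.jsp | head    # JobTracker web UI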

5. Run wordcount.java

There are several jar files in the Hadoop directory; hadoop-examples-0.20.203.0.jar is the one we need, since it contains wordcount. Create the test files with the following commands.

(1) First create two input files, file01 and file02, on the local disk:

$ echo "Hello World Bye World" > file01

$ echo "Hello Hadoop Goodbye Hadoop" > file02

./hadoop fs -ls /

(2) Create an input directory in HDFS: ./hadoop fs -mkdir input

(to delete it: ./hadoop dfs -rmr input)

(3) Copy file01 and file02 into HDFS:

./hadoop fs -copyFromLocal /home/qiaowang/file0* input

./hadoop fs -ls /user/root/input
Found 2 items
-rw-r--r--   1 root supergroup         22 2012-07-13 15:07 /user/root/input/file01
-rw-r--r--   1 root supergroup         28 2012-07-13 15:07 /user/root/input/file02

root@Master:/opt/hadoop/bin# ./hadoop fs -cat /user/root/input/file01
Hello World Bye World

(4) Run wordcount:

$ hadoop jar hadoop-0.20.1-examples.jar wordcount input output

$ bin/hadoop jar hadoop-0.20.1-examples.jar wordcount input output

Exception in thread "main" java.io.IOException: Error opening job jar: hadoop-0.20.2-examples.jar
        at org.apache.hadoop.util.RunJar.main(RunJar.java:90)
Caused by: java.util.zip.ZipException: error in opening zip file
        at java.util.zip.ZipFile.open(Native Method)
        at java.util.zip.ZipFile.<init>(ZipFile.java:114)
        at java.util.jar.JarFile.<init>(JarFile.java:135)
        at java.util.jar.JarFile.<init>(JarFile.java:72)
        at org.apache.hadoop.util.RunJar.main(RunJar.java:88)

Solution: pay attention to the jar path:

./hadoop jar /opt/hadoop/hadoop-0.20.2-examples.jar wordcount input output

Output:

12/07/13 15:20:22 INFO input.FileInputFormat: Total input paths to process : 2
12/07/13 15:20:22 INFO mapred.JobClient: Running job: job_201207131429_0001
12/07/13 15:20:23 INFO mapred.JobClient:  map 0% reduce 0%
12/07/13 15:20:32 INFO mapred.JobClient:  map 100% reduce 0%
12/07/13 15:20:44 INFO mapred.JobClient:  map 100% reduce 100%
12/07/13 15:20:46 INFO mapred.JobClient: Job complete: job_201207131429_0001
12/07/13 15:20:46 INFO mapred.JobClient: Counters: 17
12/07/13 15:20:46 INFO mapred.JobClient:   Job Counters
12/07/13 15:20:46 INFO mapred.JobClient:     Launched reduce tasks=1
12/07/13 15:20:46 INFO mapred.JobClient:     Launched map tasks=2
12/07/13 15:20:46 INFO mapred.JobClient:     Data-local map tasks=2
12/07/13 15:20:46 INFO mapred.JobClient:   FileSystemCounters
12/07/13 15:20:46 INFO mapred.JobClient:     FILE_BYTES_READ=79
12/07/13 15:20:46 INFO mapred.JobClient:     HDFS_BYTES_READ=50
12/07/13 15:20:46 INFO mapred.JobClient:     FILE_BYTES_WRITTEN=228
12/07/13 15:20:46 INFO mapred.JobClient:     HDFS_BYTES_WRITTEN=41
12/07/13 15:20:46 INFO mapred.JobClient:   Map-Reduce Framework
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce input groups=5
12/07/13 15:20:46 INFO mapred.JobClient:     Combine output records=6
12/07/13 15:20:46 INFO mapred.JobClient:     Map input records=2
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce shuffle bytes=45
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce output records=5
12/07/13 15:20:46 INFO mapred.JobClient:     Spilled Records=12
12/07/13 15:20:46 INFO mapred.JobClient:     Map output bytes=82
12/07/13 15:20:46 INFO mapred.JobClient:     Combine input records=8
12/07/13 15:20:46 INFO mapred.JobClient:     Map output records=8
12/07/13 15:20:46 INFO mapred.JobClient:     Reduce input records=6

(5) When it finishes, view the results:

root@Master:/opt/hadoop/bin# ./hadoop fs -cat /user/root/output/part-r-00000
Bye     1
GoodBye 1
Hadoop  2
Hello   2
World   2

root@Master:/opt/hadoop/bin# jps
3049 TaskTracker
2582 DataNode
2849 JobTracker
10386 Jps
2361 NameNode
2785 SecondaryNameNode

OK, the steps above complete the single-node Hadoop setup on Ubuntu.

--------------------------------------------------------

Next we set up the cluster (3 Ubuntu servers).

References:

http://www.linuxidc.com/Linux/2011-04/35162.htm

http://www.2cto.com/os/201202/118992.html

1. Three machines, each with the JDK installed and the hadoop user added:

ubuntu    10.2.128.46    master

ubuntu1   10.2.128.20    slave1

ubuntu2   10.2.128.120   slave2

Edit /etc/hosts on all three machines as follows:

127.0.0.1 localhost
10.2.128.46 master.Hadoop
10.2.128.20 slave1.Hadoop
10.2.128.120 slave2.Hadoop
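Before going further it is worth checking that every machine resolves these names consistently, since stale or mismatched entries are a common source of "retrying connect to server" errors later. A quick check, run on each node:

getent hosts master.Hadoop slave1.Hadoop slave2.Hadoop   # should print the IPs listed above
ping -c 1 slave1.Hadoop
ping -c 1 slave2.Hadoop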

All of the following operations are performed as the hadoop user.

2. Generate SSH keys and configure passwordless SSH

su - hadoop   # switch to the hadoop user

ssh-keygen -t rsa -P ""   # generate an SSH key

cat $HOME/.ssh/id_rsa.pub >> $HOME/.ssh/authorized_keys   # allow SSH access with this key

On the namenode (Master):

hadoop@Master:~/.ssh$ scp authorized_keys Slave1.Hadoop:/home/hadoop/.ssh/

hadoop@Master:~/.ssh$ scp authorized_keys Slave2.Hadoop:/home/hadoop/.ssh/

Test with ssh node2 or ssh node3 (the first time you need to answer yes).

If no password is requested, the configuration succeeded; if a password is still requested, re-check the configuration above.

hadoop@Master:~/.ssh$ ssh Slave1.Hadoop
Welcome to Ubuntu precise (development branch)

hadoop@Master:~/.ssh$ ssh Slave2.Hadoop
Welcome to Ubuntu precise (development branch)

2. Copy hadoop-0.20.2.tar.gz to the /home/qiaowang/install_Hadoop directory.

Possible approaches:

1) Installing a Hadoop cluster usually means unpacking the software on every machine in the cluster, with the same installation path on each. If HADOOP_HOME denotes the installation root, then every machine in the cluster uses the same HADOOP_HOME path.

2) If all machines in the cluster have identical environments, you can configure one machine and then copy the configured software (the entire hadoop-0.20.203 folder) to the same location on the other machines.

3) You can scp the Hadoop directory from the Master to the same directory on each Slave, adjusting hadoop-env.sh on each Slave if its JAVA_HOME differs. A scripted version of this copy is sketched below.
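A sketch of approach 3 as a small script run from the Master, assuming the Slave hostnames used below and a user that may write to /opt on each Slave:

for slave in Slave1.Hadoop Slave2.Hadoop; do
    scp -r /opt/hadoop "$slave":/opt/
    ssh "$slave" "chown -R hadoop:hadoop /opt/hadoop && mkdir -p /opt/hadoop-datastore && chown -R hadoop:hadoop /opt/hadoop-datastore"
done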

3. Related configuration

4) For convenience when using the hadoop command or scripts such as start-all.sh, add the following to /etc/profile on the Master:

export JAVA_HOME=/usr/lib/jdk1.6.0_33

export JRE_HOME=/usr/lib/jdk1.6.0_33/jre

export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib:$JAVA_HOME/jre/lib

export HADOOP_HOME=/opt/hadoop

export PATH=$JAVA_HOME/bin:$JAVA_HOME/jre/bin:$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin

After editing, run source /etc/profile to make it take effect.

Configure the files under conf:

vim hadoop-env.sh

export JAVA_HOME=/usr/lib/jdk1.6.0_33

vim core-site.xml

----------------------------------

<configuration>

<property>

<name>hadoop.tmp.dir</name>

<value>/opt/hadoop-datastore/</value>

<description>A base for other temporary directories.</description>

</property>

<property>

<name>fs.default.name</name>

<value>hdfs://Master.Hadoop:54310</value>

<description>The name of the default file system. A URI whose
scheme and authority determine the FileSystem implementation. The
uri's scheme determines the config property (fs.SCHEME.impl) naming
the FileSystem implementation class. The uri's authority is used to
determine the host, port, etc. for a filesystem.</description>

</property>

</configuration>

-----------------------------------------

vim hdfs-site.xml

------------------------------------------

<configuration>

<property>

<name>dfs.replication</name>

<value>3</value>

<description>Default block replication.
The actual number of replications can be specified when the file is created.
The default is used if replication is not specified in create time.
</description>

</property>

</configuration>

-------------------------------------

vim mapred-site.xml

------------------------------------

<configuration>

<property>

<name>mapred.job.tracker</name>

<value>Master.Hadoop:54311</value>

<description>The host and port that the MapReduce job tracker runs
at. If "local", then jobs are run in-process as a single map
and reduce task.
</description>

</property>

</configuration>

-------------------------------------

vim masters

Master.Hadoop

root@Master:/opt/hadoop/conf# vim slaves

Slave1.Hadoop

Slave2.Hadoop

Using approach 3, copy the Hadoop directory from the Master to each Slave.

Switch to the root user:

su root

Run: scp -r hadoop Slave1.Hadoop:/opt/

On Slave1.Hadoop:

su root

chown -R hadoop:hadoop /opt/hadoop/

Create the data directory:

mkdir /opt/hadoop-datastore/

chown -R hadoop:hadoop /opt/hadoop-datastore/

Do the same on the other Slaves.

Format HDFS on the namenode:

root@Master:/opt/hadoop/bin# hadoop namenode -format

Output:

12/07/23 18:54:36 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/10.2.128.46
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y
Format aborted in /opt/hadoop-datastore/dfs/name
12/07/23 18:54:45 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/10.2.128.46
************************************************************/

Start Hadoop:

./start-all.sh

root@Master:/opt# chown -R hadoop:hadoop /opt/hadoop/
root@Master:/opt# chown -R hadoop:hadoop /opt/hadoop-datastore/
root@Master:/opt# su hadoop
hadoop@Master:/opt$ cd hadoop/bin/
hadoop@Master:/opt/hadoop/bin$ ./start-all.sh

Problem encountered:

starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-namenode-Master.Hadoop.out
Slave1.Hadoop: datanode running as process 7309. Stop it first.
Slave2.Hadoop: datanode running as process 4920. Stop it first.
Master.Hadoop: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-jobtracker-Master.Hadoop.out
Slave1.Hadoop: tasktracker running as process 7477. Stop it first.
Slave2.Hadoop: tasktracker running as process 5088. Stop it first.

Advice found online:

This is probably caused by formatting the namenode again after the cluster had already been started.

If this is just for testing and learning, the following fix can be used (a combined sketch appears after this list):

1. First kill processes 26755, 21863 and 26654 (the PIDs from that post). If kill 26755 does not work, use kill -KILL 26755.

2. Manually delete the contents of the dfs.data.dir directory configured in conf/hdfs-site.xml.

3. Run $HADOOP_HOME/bin/hadoop namenode -format

4. Start the cluster: $HADOOP_HOME/bin/start-all.sh

Consequence: everything stored in HDFS will be lost.
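Put together, the recovery looks roughly like the sketch below (test environments only, since it destroys all HDFS data). The process IDs above were specific to that post, so the stop script is used here instead, and dfs.data.dir is taken to be the /opt/hadoop-datastore directory configured earlier:

/opt/hadoop/bin/stop-all.sh                    # stop any half-running daemons
for node in Slave1.Hadoop Slave2.Hadoop; do    # clear the data directory on every datanode
    ssh "$node" "rm -rf /opt/hadoop-datastore/*"
done
rm -rf /opt/hadoop-datastore/*                 # and on the Master
/opt/hadoop/bin/hadoop namenode -format        # answer the prompt with an uppercase Y
/opt/hadoop/bin/start-all.sh                   # start the cluster again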

Solution applied here: re-format the namenode.

su hadoop

hadoop@Master:/opt/hadoop/bin$ ./hadoop namenode -format

12/07/24 10:43:29 INFO namenode.NameNode: STARTUP_MSG:
/************************************************************
STARTUP_MSG: Starting NameNode
STARTUP_MSG:   host = Master.Hadoop/10.2.128.46
STARTUP_MSG:   args = [-format]
STARTUP_MSG:   version = 0.20.2
STARTUP_MSG:   build = https://svn.apache.org/repos/asf/hadoop/common/branches/branch-0.20 -r 911707; compiled by 'chrisdo' on Fri Feb 19 08:07:34 UTC 2010
************************************************************/
Re-format filesystem in /opt/hadoop-datastore/dfs/name ? (Y or N) y
Format aborted in /opt/hadoop-datastore/dfs/name
12/07/24 10:43:32 INFO namenode.NameNode: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NameNode at Master.Hadoop/10.2.128.46
************************************************************/

hadoop@Master:/opt/hadoop/bin$ ./start-all.sh
starting namenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-namenode-Master.Hadoop.out
Slave1.Hadoop: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave1.Hadoop.out
Slave2.Hadoop: starting datanode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-datanode-Slave2.Hadoop.out
Master.Hadoop: starting secondarynamenode, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-secondarynamenode-Master.Hadoop.out
starting jobtracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-jobtracker-Master.Hadoop.out
Slave2.Hadoop: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave2.Hadoop.out
Slave1.Hadoop: starting tasktracker, logging to /opt/hadoop/bin/../logs/hadoop-hadoop-tasktracker-Slave1.Hadoop.out
hadoop@Master:/opt/hadoop/bin$

--------------------------------------------------------------------------------------------------------------------------

Verification:

hadoop@Master:/opt/hadoop/bin$ ./hadoop dfsadmin -report

Safe mode is ON
Configured Capacity: 41137831936 (38.31 GB)
Present Capacity: 31127531520 (28.99 GB)
DFS Remaining: 31127482368 (28.99 GB)
DFS Used: 49152 (48 KB)
DFS Used%: 0%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Datanodes available: 2 (2 total, 0 dead)

Name: 10.2.128.120:50010
Decommission Status : Normal
Configured Capacity: 20568915968 (19.16 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 4913000448 (4.58 GB)
DFS Remaining: 15655890944 (14.58 GB)
DFS Used%: 0%
DFS Remaining%: 76.11%
Last contact: Tue Jul 24 10:50:43 CST 2012

Name: 10.2.128.20:50010
Decommission Status : Normal
Configured Capacity: 20568915968 (19.16 GB)
DFS Used: 24576 (24 KB)
Non DFS Used: 5097299968 (4.75 GB)
DFS Remaining: 15471591424 (14.41 GB)
DFS Used%: 0%
DFS Remaining%: 75.22%
Last contact: Tue Jul 24 10:50:41 CST 2012

Web UI: http://10.2.128.46:50070/

View job information:

http://10.2.128.46:50030/jobtracker.jsp

To check whether the daemons are running, use the jps command (the ps utility for JVM processes); it lists the running daemons and their process identifiers.

hadoop@Master:/opt/hadoop/conf$ jps
2823 Jps
2508 JobTracker
2221 NameNode
2455 SecondaryNameNode

netstat -nat

tcp        0      0 10.2.128.46:54311      0.0.0.0:*               LISTEN
tcp        0      0 10.2.128.46:54310      10.2.128.46:44150       ESTABLISHED
tcp      267      0 10.2.128.46:54311      10.2.128.120:48958      ESTABLISHED
tcp        0      0 10.2.128.46:54310      10.2.128.20:41230       ESTABLISHED

./hadoop dfs -ls /

hadoop@Master:/opt/hadoop/bin$ ./hadoop dfs -ls /
Found 2 items
drwxr-xr-x   - root supergroup          0 2012-07-13 15:20 /opt
drwxr-xr-x   - root supergroup          0 2012-07-13 15:20 /user

hadoop@Master:/opt/hadoop$ bin/hadoop fs -mkdir input

Problem encountered:

mkdir: org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create directory /user/hadoop/input. Name node is in safe mode.

So what is Hadoop's safe mode?

When the distributed file system starts up, it begins in safe mode. While the file system is in safe mode, its contents may not be modified or deleted until safe mode ends.

Safe mode exists mainly so that, at startup, the system can check the validity of the data blocks on each DataNode and, according to policy, replicate or delete blocks as necessary.

Safe mode can also be entered with a command at runtime. In practice, trying to modify or delete files right after startup can trigger this safe-mode error; usually it is enough to wait a short while.

Now that this is clear, can we take Hadoop out of safe mode directly instead of waiting?

Yes; just run the following in the Hadoop directory:

hadoop@Master:/opt/hadoop/bin$ ./hadoop dfsadmin -safemode leave
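dfsadmin can also query safe mode or wait for it to end, which is often safer than forcing it off:

./hadoop dfsadmin -safemode get    # report whether safe mode is currently on
./hadoop dfsadmin -safemode wait   # block until the namenode leaves safe mode by itself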

hadoop@Master:/opt/hadoop$ bin/hadoop fs -mkdir input

hadoop@Master:/opt/hadoop/bin$ cd ..

hadoop@Master:/opt/hadoop$ bin/hadoop fs -mkdir input

hadoop@Master:/opt/hadoop$ bin/hadoop fs -put conf/core-site.xml input

hadoop@Master:/opt/hadoop$ bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+'

6. Additional notes

Q: What does bin/hadoop jar hadoop-0.20.2-examples.jar grep input output 'dfs[a-z.]+' mean?

A: bin/hadoop jar (run a jar with hadoop) hadoop-0.20.2-examples.jar (the name of the jar) grep (the class to run; what follows are its arguments) input output 'dfs[a-z.]+'

Altogether it runs the grep example from the Hadoop examples jar, with input directory input and output directory output on HDFS.

Q: What is grep?

A: A map/reduce program that counts the matches of a regex in the input.

View the results:

hadoop@Master:/opt/hadoop$ bin/hadoop fs -ls /user/hadoop/output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2012-07-24 11:29 /user/hadoop/output/_logs
-rw-r--r--   3 hadoop supergroup          0 2012-07-24 11:30 /user/hadoop/output/part-00000

hadoop@Master:/opt/hadoop$ bin/hadoop fs -rmr /user/hadoop/outputtest
Deleted hdfs://Master.Hadoop:54310/user/hadoop/outputtest

hadoop@Master:/opt/hadoop$ bin/hadoop fs -rmr /user/hadoop/output
Deleted hdfs://Master.Hadoop:54310/user/hadoop/output

Switch to another example:

hadoop@Master:/opt/hadoop$ bin/hadoop jar /opt/hadoop/hadoop-0.20.2-examples.jar wordcount input output

hadoop@Master:/opt/hadoop$ bin/hadoop fs -ls /user/hadoop/output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2012-07-24 11:43 /user/hadoop/output/_logs
-rw-r--r--   3 hadoop supergroup        772 2012-07-24 11:43 /user/hadoop/output/part-r-00000

hadoop@Master:/opt/hadoop$ bin/hadoop fs -cat /user/hadoop/output/part-r-00000
(fs.SCHEME.impl)        1
-->     1
<!--    1
</configuration>        1
</property>     2
<?xml   1
<?xml-stylesheet        1

Test successful!

Error encountered after a restart:

INFO ipc.Client: Retrying connect to server: master/192.168.0.45:54310. Already tried 0 time(s)

./hadoop dfsadmin -report

cd /opt/hadoop-datastore/

/opt/hadoop/bin/stop-all.sh

rm -rf *

/opt/hadoop/bin/hadoop namenode -format

If any debug settings are present, remove them.

/opt/hadoop/bin/start-all.sh

./hadoop dfsadmin -report

-------------------------------------------------------------------

Hadoop MapReduce Java demo

<dependency>

<groupId>org.apache.hadoop</groupId>

<artifactId>hadoop-core</artifactId>

<version>1.1.2</version>

</dependency>

package cn.focus.dc.hadoop;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;

/**
 * @author qiaowang
 */
public class WordCount {

    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);

        private Text word = new Text();

        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
                throws IOException {
            String line = value.toString();
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                word.set(tokenizer.nextToken());
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
                Reporter reporter) throws IOException {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next().get();
            }
            output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf(WordCount.class);
        conf.setJobName("wordcount");

        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);

        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);

        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);

        FileInputFormat.setInputPaths(conf, new Path(args[0]));
        FileOutputFormat.setOutputPath(conf, new Path(args[1]));

        JobClient.runJob(conf);
    }

}

Approach 1

On Linux:

Create the wordcount_classes folder.

hadoop@Master:~/wordcount_classes$ ls
cn  WordCount.java

hadoop@Master:~/wordcount_classes$ pwd
/home/hadoop/wordcount_classes

/usr/lib/jdk1.6.0_33/bin/javac -classpath /opt/hadoop/hadoop-core-1.1.2.jar -d /home/hadoop/wordcount_classes/ WordCount.java

After compilation:

hadoop@Master:~/wordcount_classes/cn/focus/dc/hadoop$ pwd
/home/hadoop/wordcount_classes/cn/focus/dc/hadoop

hadoop@Master:~/wordcount_classes/cn/focus/dc/hadoop$ ls
WordCount.class  WordCount$Map.class  WordCount$Reduce.class

Build the jar:

hadoop@Master:~$ /usr/lib/jdk1.6.0_33/bin/jar -cvf /home/hadoop/wordcount.jar -C wordcount_classes/ .
added manifest
adding: cn/ (in = 0) (out = 0) (stored 0%)
adding: cn/focus/ (in = 0) (out = 0) (stored 0%)
adding: cn/focus/dc/ (in = 0) (out = 0) (stored 0%)
adding: cn/focus/dc/hadoop/ (in = 0) (out = 0) (stored 0%)
adding: cn/focus/dc/hadoop/WordCount.class (in = 1573) (out = 756) (deflated 51%)
adding: cn/focus/dc/hadoop/WordCount$Map.class (in = 1956) (out = 804) (deflated 58%)
adding: cn/focus/dc/hadoop/WordCount$Reduce.class (in = 1629) (out = 652) (deflated 59%)
adding: WordCount.java (in = 2080) (out = 688) (deflated 66%)

hadoop@Master:~$ ls
file01  file02  hadoop-1.1.2.tar.gz  wordcount_classes  wordcount.jar

Run:

/opt/hadoop/bin/hadoop jar /home/hadoop/wordcount.jar cn.focus.dc.hadoop.WordCount /user/hadoop/input /user/hadoop/output

View the results:

hadoop@Master:~$ /opt/hadoop/bin/hadoop fs -cat /user/hadoop/output/part-00000
Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2

Approach 2:

In the project directory on Windows, package directly with Maven (including the dependency jars):

mvn -U clean dependency:copy-dependencies compile package

This produces the jar under target and the dependency jars under target/dependency.

Copy them to the Linux machine.

The directory structure is as follows:

hadoop@Master:~/hadoop_stat/dependency$ ls
hadoop-core-1.1.2.jar

hadoop@Master:~/hadoop_stat$ ls
dependency  hadoop-stat-1.0.0-SNAPSHOT.jar

Run:

/opt/hadoop/bin/hadoop jar /home/hadoop/hadoop_stat/hadoop-stat-1.0.0-SNAPSHOT.jar cn.focus.dc.hadoop.WordCount /user/hadoop/input /user/hadoop/output

hadoop@Master:~/hadoop_stat$ /opt/hadoop/bin/hadoop fs -cat /user/hadoop/output/part-00000
Bye     1
Goodbye 1
Hadoop  2
Hello   2
World   2
