A Beginner's Walkthrough: Setting Up HBase and Hadoop

    Because of work requirements, we use HBase + Hadoop to store user-generated content (UGC). This article describes, step by step, how to build that platform; it is intended for reference only.

1. Environment

    OS: Red Hat 6.3, 300 GB disk, dual-core CPU

    Java: JDK 1.6

    HBase: hbase-0.98.1

    Hadoop: hadoop-2.2.0

    We will use three machines (virtual machines) to build the Hadoop environment; the machine list is shown below. First, add the following entries to the hosts file on all three machines. Also make sure every machine on the LAN uses a static IP for its network interface, so that addresses do not keep changing after reboots. All operations in this example are performed as the root user.

127.0.0.1    localhost
::1    localhost
192.168.0.110    node1.hadoop.virtual
192.168.0.111    node2.hadoop.virtual
192.168.0.112    node3.hadoop.virtual
## pay attention to the current machine's hostname
## 192.168.0.110 (local IP)    hostname

    Of course, mapping hostnames to IPs like this is not strictly required; the Hadoop and HBase configuration files could use IP addresses directly. I added the extra hosts entries only to make it easier to tell which machine a given node runs on.
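
    As a quick sanity check (a minimal sketch using the hostnames of this example), you can verify on every node that all three names resolve to the expected addresses before going any further:

## run on each of the three machines
getent hosts node1.hadoop.virtual node2.hadoop.virtual node3.hadoop.virtual
ping -c 1 node3.hadoop.virtual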

2. SSH

    Hadoop and HBase use SSH for password-less login: when the Hadoop cluster starts, it connects to each machine in the cluster in turn to launch the corresponding processes. So we first configure the SSH authorization (install the ssh-keygen client first; for SSH client configuration see other documentation):

ssh-keygen -t rsa

    Press Enter at every prompt and accept the defaults, with no passphrase. This will generate id_rsa.pub under /root/.ssh (under ~/.ssh if you log in as another user); the file contains the RSA public key needed to log in to this machine. Run the ssh-keygen command on each of the three machines in turn.

    Then append all three generated public keys to /root/.ssh/authorized_keys on every machine. Many people forget to add a machine's own public key to its own authorized_keys, which prevents the local Hadoop instance from starting.
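
    For example (a sketch assuming the three hostnames above and the root account), ssh-copy-id can do the appending for you; run it on each node, for every node in the list, including the node itself:

for host in node1.hadoop.virtual node2.hadoop.virtual node3.hadoop.virtual; do
    ssh-copy-id root@$host
done
## verify: this should log in without asking for a password
ssh root@node2.hadoop.virtual hostname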

3. JDK Installation

    Download a JDK 1.6+ release, install it, and configure JAVA_HOME; in this example the JDK is installed under /opt/app. Append the following to the end of /etc/profile:

## set java
JAVA_HOME=/opt/app/jdk1.6.0_45
PATH=$PATH:$JAVA_HOME/bin
CLASSPATH=.:$JAVA_HOME/lib/dt.jar:$JAVA_HOME/lib/tools.jar
JRE_HOME=$JAVA_HOME/jre
export JAVA_HOME PATH CLASSPATH JRE_HOME
export LD_LIBRARY_PATH=/usr/local/lib

    After making these changes, run "source /etc/profile" so the environment variables take effect. If another JDK (such as OpenJDK) is already installed on the machine, first uninstall it or disable its launcher, then run the "source" command above. (For example, use "whereis java" to locate the java launcher and delete it; it is probably /usr/bin/java.)

    It is recommended to use the same installation directories for the JDK, Hadoop, and HBase on all three machines; this makes troubleshooting and configuration much easier.
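
    A quick way to confirm the environment variables took effect (assuming the paths used in this example):

source /etc/profile
echo $JAVA_HOME      # expect /opt/app/jdk1.6.0_45
java -version        # should report 1.6.0_45, not OpenJDK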

4. ulimit Settings

    Hadoop and HBase open a large number of files at runtime and also consume file descriptors when serving many client connections, so the ulimit should be raised a bit (the default of 1024 is clearly too small). Append the following to /etc/security/limits.conf:

*       soft    nofile          20480
*       hard    nofile          20480

    Also check whether /etc/profile sets a "ulimit" option; if it does, consider removing it for now. The ulimit changes take effect after the root user logs in again. For how to tune the maximum number of open files and concurrent threads on Linux, see other documentation.
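
    After root logs in again, the new limits can be checked like this:

ulimit -Sn    # soft limit, should report 20480
ulimit -Hn    # hard limit, should report 20480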

5. Hadoop Installation and Configuration

    hadoop-2.2 is currently a relatively stable and "low-key" release, although the number of Hadoop versions is almost endless. To match hbase-0.98 we use hadoop-2.2; if circumstances allow, however, hadoop-2.4+ is recommended, because the hbase-1.0.x milestone releases will support nothing older than hadoop-2.4. [See the HBase/Hadoop version compatibility table.]

    Download the hadoop-2.2 tar.gz package from the Apache site and extract it to the target directory; in this example Hadoop is installed under "/opt/app". We first install and configure Hadoop on a single machine, then sync it to the other two machines afterwards.
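
    A sketch of the unpack step, assuming the tarball was downloaded to /opt/app (the exact archive name may differ depending on the mirror):

cd /opt/app
tar -xzf hadoop-2.2.0.tar.gz
ls hadoop-2.2.0/etc/hadoop    # the configuration files edited below live here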

    By convention, node1.hadoop.virtual acts as the namenode and the other two machines act as datanodes, with node2.hadoop.virtual as the auxiliary master (the secondarynamenode). YARN (MapReduce) is not covered in this article, so we do not care about the resourceManager and related processes for now. We also keep the file blocksize setting consistent between Hadoop and HBase, at 128M. Configure the following files in turn (under ${hadoop}/etc/hadoop):

    1) masters file: this file only needs to exist on the namenode and the secondarynamenode; the datanodes do not need it. The masters file specifies the address of the "secondarynamenode"; a cluster usually needs one secondarynamenode.

node2.hadoop.virtual

    2) slaves file: specifies the datanode addresses; only the datanodes need this file;

node1.hadoop.virtual
node2.hadoop.virtual
node3.hadoop.virtual

    If you do not want node1 to act as both a datanode and the namenode, remove it from the slaves file.

    3) hdfs-site.xml: Hadoop has two core features, HDFS and MapReduce; hdfs-site.xml configures the HDFS side. The parameters below are for reference only.

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!-- For the defaults and the full parameter list, see hdfs-default.xml -->
<!-- http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml -->
<configuration>
	<!-- important -->
	<property>
		<name>dfs.replication</name>
		<value>3</value>
		<!-- default 3 -->
		<!-- but will be changed by "file.replication" of core-site.xml -->
	</property>
	<property>
		<name>dfs.blocksize</name>
		<value>128m</value>
		<!-- default 128m in hadoop-2.2 (64m in older releases) -->
	</property>
	<!-- optional -->
	<property>
		<name>dfs.client.block.write.retries</name>
		<value>3</value>
		<!-- default 3 -->
		<!-- how many times to retry if writing to one of the replicas fails -->
	</property>
	<property>
		<name>dfs.heartbeat.interval</name>
		<value>3</value>
		<!-- heartbeat interval between the datanodes and the namenode, in seconds -->
	</property>

	<property>
		<name>dfs.namenode.replication.interval</name>
		<value>3</value>
		<!-- interval at which the namenode checks replication, in seconds -->
	</property>
	<property>
		<name>dfs.namenode.logging.level</name>
		<value>info</value>
		<!-- default info -->
	</property>
	<property>
		<name>dfs.datanode.handler.count</name>
		<value>32</value>
		<!-- default 10 -->
		<!-- in production this can be set higher, e.g. 128 -->
	</property>
	<property>
		<name>dfs.namenode.handler.count</name>
		<value>32</value>
		<!-- default 10 -->
		<!-- in production this can be set higher; scale it with the number of datanodes -->
	</property>
	<property>
		<name>dfs.namenode.name.dir</name>
		<value>/data/hadoop/dfs/name</value>
		<!-- namenode,name table -->
		<!-- local filesystem -->
	</property>
	<property>
		<name>dfs.namenode.name.dir.restore</name>
		<value>false</value>
		<!-- default false. -->
	</property>
	<property>
		<name>dfs.datanode.data.dir</name>
		<value>/data/hadoop/dfs/data</value>
		<!-- datanode data directory -->
		<!-- should be a local filesystem; if it sits on a SAN, test it thoroughly first -->
	</property>
	<property>
		<name>dfs.namenode.checkpoint.dir</name>
		<value>/data/hadoop/dfs/namesecondary</value>
		<!-- secondarynamenode checkpoint directory -->
	</property>
	<!--
	<property>
		<name>dfs.datanode.address</name>
		<value>0.0.0.0:50010</value>
	</property>
	-->
</configuration>

    4) core-site.xml: Hadoop global configuration

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
http://hadoop.apache.org/docs/r2.2.0/hadoop-project-dist/hadoop-common/core-default.xml
-->
<configuration>
	<!-- very important -->
	<property>
		<name>fs.default.name</name>
		<value>hdfs://node1.hadoop.virtual:8020</value>
		<!-- make sure the hostname resolves to the right IP; the default port is 8020 -->
	</property>
	<property>
		<name>file.blocksize</name>
		<value>134217728</value>
		<!-- 128M; default 64M; applies to the local filesystem -->
	</property>
	<property>
		<name>hadoop.tmp.dir</name>
		<value>/var/hadoop/tmp</value>
	</property>
	<property>
		<name>file.replication</name>
		<value>2</value>
		<!-- replication factor,default 1 -->
	</property>

	<!-- optional -->
	<property>
		<name>hadoop.security.authorization</name>
		<value>false</value>
		<!-- default false -->
	</property>
	<property>
		<name>io.file.buffer.size</name>
		<value>131072</value>
		<!-- default 4096,4KB -->
	</property>
	<property>
		<name>io.bytes.per.checksum</name>
		<value>512</value>
		<!-- default 512 -->
	</property>
	<property>
		<name>io.skip.checksum.errors</name>
		<value>false</value>
		<!-- default false -->
	</property>
	<property>
		<name>io.compression.codecs</name>
		<value></value>
		<!-- default empty; a comma-separated list of codec classes -->
		<!-- HDFS clients can also specify codecs themselves; uses the native lib -->
	</property>
	<property>
		<name>io.seqfile.local.dir</name>
		<value>/data/hadoop/io/local</value>
		<!-- default ${hadoop.tmp.dir}/io/local -->
		<!-- io.seqfile.compress.blocksize -->
		<!-- io.seqfile.lazydecompress -->
		<!-- io.seqfile.sorter.recordlimit -->
	</property>
	<property>
		<name>fs.trash.interval</name>
		<value>10080</value>
		<!-- how long deleted files are kept in the trash, in minutes; default 0 (disabled) -->
	</property>

	<!-- bloom filter -->
	<property>
		<name>io.map.index.skip</name>
		<value>0</value>
		<!-- Number of index entries to skip between each entry. 
			Zero by default. 
			Setting this to values larger than zero can facilitate opening large MapFiles using less memory. 
		-->
	</property>
	<property>
		<name>io.map.index.interval</name>
		<value>128</value>
		<!-- MapFile consist of two files - data file (tuples) and index file (keys). 
		For every io.map.index.interval records written in the data file, 
		an entry (record-key, data-file-position) is written in the index file. 
		This is to allow for doing binary search later within the index file to look up records 
		by their keys and get their closest positions in the data file.
		-->
	</property>
	<property>
		<name>io.mapfile.bloom.size</name>
		<value>1048576</value>
		<!-- number of keys per bloom filter in a BloomMapFile; default 1048576 -->
	</property>
	<property>
		<name>io.mapfile.bloom.error.rate</name>
		<value>0.005</value>
		<!-- acceptable false-positive rate of the bloom filter; default 0.005 -->
	</property>

	<!-- ha,default:disabled -->
	<!--
	<property>
		<name>ha.zookeeper.quorum</name>
		<value></value>
	</property>
	-->

</configuration>

    5) hadoop-env.sh: the Hadoop environment script, which contains the JVM tuning options and global settings. The full file is included below for reference.

# Set Hadoop-specific environment variables here.

# The only required environment variable is JAVA_HOME.  All others are
# optional.  When running a distributed configuration it is best to
# set JAVA_HOME in this file, so that it is correctly defined on
# remote nodes.

# The java implementation to use.
export JAVA_HOME=/opt/app/jdk1.6.0_45

# The jsvc implementation to use. Jsvc is required to run secure datanodes.
#export JSVC_HOME=${JSVC_HOME}

export HADOOP_CONF_DIR=${HADOOP_CONF_DIR:-"/etc/hadoop"}

# Extra Java CLASSPATH elements.  Automatically insert capacity-scheduler.
for f in $HADOOP_HOME/contrib/capacity-scheduler/*.jar; do
  if [ "$HADOOP_CLASSPATH" ]; then
    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:$f
  else
    export HADOOP_CLASSPATH=$f
  fi
done

# The maximum amount of heap to use, in MB. Default is 1000.
## 1G per daemon process; each node runs about 2 ~ 4 daemon processes, so this value should be suitable.
export HADOOP_HEAPSIZE=1024
export HADOOP_NAMENODE_INIT_HEAPSIZE="512"

# Extra Java runtime options.  Empty by default.
## defaults for all daemons; appended to the namenode/datanode OPTS below
export HADOOP_OPTS="$HADOOP_OPTS -Djava.net.preferIPv4Stack=true -XX:MaxPermSize=256M -XX:SurvivorRatio=6 -XX:+UseConcMarkSweepGC -XX:MaxTenuringThreshold=3 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection"

# Command specific options appended to HADOOP_OPTS when specified
export HADOOP_NAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_NAMENODE_OPTS -Xmx2048M"

export HADOOP_DATANODE_OPTS="-Dhadoop.security.logger=ERROR,RFAS $HADOOP_DATANODE_OPTS -Xmx2048M"

export HADOOP_SECONDARYNAMENODE_OPTS="-Dhadoop.security.logger=${HADOOP_SECURITY_LOGGER:-INFO,RFAS} -Dhdfs.audit.logger=${HDFS_AUDIT_LOGGER:-INFO,NullAppender} $HADOOP_SECONDARYNAMENODE_OPTS -Xmx2048M"

# The following applies to multiple commands (fs, dfs, fsck, distcp etc)
export HADOOP_CLIENT_OPTS="-Xmx512m $HADOOP_CLIENT_OPTS"
#HADOOP_JAVA_PLATFORM_OPTS="-XX:-UsePerfData $HADOOP_JAVA_PLATFORM_OPTS"

# On secure datanodes, user to run the datanode as after dropping privileges
export HADOOP_SECURE_DN_USER=${HADOOP_SECURE_DN_USER}

# Where log files are stored.  $HADOOP_HOME/logs by default.
export HADOOP_LOG_DIR=/data/hadoop/logs/hadoop

# Where log files are stored in the secure data environment.
export HADOOP_SECURE_DN_LOG_DIR=${HADOOP_LOG_DIR}/${HADOOP_HDFS_USER}

# The directory where pid files are stored. /tmp by default.
# NOTE: this should be set to a directory that can only be written to by 
#       the user that will run the hadoop daemons.  Otherwise there is the
#       potential for a symlink attack.
export HADOOP_PID_DIR=/var/hadoop/pids
export HADOOP_SECURE_DN_PID_DIR=${HADOOP_PID_DIR}

# A string representing this instance of hadoop. $USER by default.
export HADOOP_IDENT_STRING=$USER
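
    Before starting anything, it does no harm to create the local directories referenced in hdfs-site.xml, core-site.xml, and hadoop-env.sh ahead of time on every node (a sketch matching the paths used in this example):

mkdir -p /data/hadoop/dfs/name /data/hadoop/dfs/data /data/hadoop/dfs/namesecondary
mkdir -p /data/hadoop/io/local /data/hadoop/logs/hadoop
mkdir -p /var/hadoop/tmp /var/hadoop/pids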

    Once the configuration files are adjusted, copy the whole hadoop-2.2 installation to the other two machines (node2 and node3), then start the cluster from node1. Note that before starting the cluster, the format command needs to be run on the "namenode" and the "secondarynamenode":

# in the bin directory
> ./hadoop namenode -format

    Then run "start-dfs.sh" from the ${hadoop}/sbin directory. If the cluster starts without any exceptions, you can run "jps" on each machine to list the running Java processes; you should see something like the following:

[root@node1.hadoop.virtual ~]# jps
19922 DataNode
19757 NameNode
21569 Jps

[root@node2.hadoop.virtual hadoop]# jps
31627 Jps
31466 SecondaryNameNode
31324 DataNode

[root@node3.hadoop.virtual ~]# jps
15913 DataNode
23508 Jps
16530 NodeManager

    If the namenode, datanode, or other processes on some machine do not start as expected, check the log files for the cause (in this example the logs are under /data/hadoop/logs).
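
    Beyond jps, a simple write and read against HDFS confirms that the filesystem is actually usable (a sketch run from ${hadoop}/bin; the path /tmp/smoke.txt is just an example):

./hdfs dfsadmin -report               # all three datanodes should be listed as live
./hdfs dfs -mkdir -p /tmp
./hdfs dfs -put /etc/hosts /tmp/smoke.txt
./hdfs dfs -cat /tmp/smoke.txt
## the namenode web UI (port 50070 by default) shows the same cluster summary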

6. HBase Installation and Configuration

    Once the Hadoop environment above is ready, we can build the HBase cluster. Note, however, that HBase does not have to store its data on Hadoop: it can also run on the local filesystem or on any other remote filesystem HBase supports.

    Extract the hbase-0.98 archive to /opt/app, alongside the Hadoop directory, which keeps things easy to manage.

    1) hbase-site.xml: HBase core configuration

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<configuration>
    <property>
        <name>hbase.regionserver.port</name>
        <value>60020</value>
        <!-- server instance: service port -->
    </property>
    <property>
        <name>hbase.hregion.max.filesize</name>
        <value>2147483648</value>
        <!-- 2G; default 10G; max region file size before a region is split; better to keep the default! -->
    </property>
    <property>
        <name>hbase.hregion.memstore.flush.size</name>
        <value>134217728</value>
        <!-- 128M,default,the same as HDFS BlockSize -->
    </property>
    <property>
        <name>hbase.regionserver.handler.count</name>
        <value>32</value>
        <!-- worker threads for client RPC -->
        <!-- for online/production instances this can be set higher -->
        <!-- e.g. 64 or 100 -->
    </property>
    <property>
        <name>hbase.regionserver.lease.period</name>
        <value>120000</value>
        <!-- 2 min, for MapReduce scans -->
    </property>
    <property>
        <name>hbase.rootdir</name>
        <value>hdfs://node1.hadoop.virtual:8020/hbase</value>
        <!-- hdfs -->
    </property>
    <property>
        <name>hbase.cluster.distributed</name>
        <value>true</value>
    </property>
        <!--
    <property>
        <name>hbase.zookeeper.property.clientPort</name>
        <value>2181</value>
    </property>
    <property>
        <name>hbase.zookeeper.property.dataDir</name>
        <value>../zookeeper</value>
    </property>
        -->
    <property>
        <name>hbase.zookeeper.quorum</name>
        <value>node1.hadoop.virtual,node2.hadoop.virtual,node3.hadoop.virtual</value>
    </property>
	<!-- Developer-defined filters or coprocessors can be placed on HDFS, which is the recommended approach -->
	<!-- To avoid extra complexity, however, I keep the custom classes in a local file on each instance; see hbase-env.sh -->
	
    <property>
        <name>hbase.dynamic.jars.dir</name>
        <value>hdfs://node1.hadoop.virtual:8020/hbase-extlib</value>
    </property>

    <property>
        <name>zookeeper.session.timeout</name>
        <value>30000</value>
        <!-- tradeoff,you should find a proper value -->
    </property>
    <property>
        <name>zookeeper.znode.parent</name>
        <value>/online-hbase</value>
    </property>
</configuration>
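
    If you do choose the HDFS route for custom jars, the directory that hbase.dynamic.jars.dir points at has to exist on HDFS so the jars can be uploaded there. A sketch using the hadoop CLI once HDFS is up (the path matches the value above; the jar name is hypothetical):

${hadoop}/bin/hdfs dfs -mkdir -p /hbase-extlib
${hadoop}/bin/hdfs dfs -put /path/to/your-filter.jar /hbase-extlib/    # hypothetical jar name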

    Hadoop does not rely on zookeeper to store META information; in fact Hadoop itself has no master-election feature at all. HBase is different: it keeps META information in zookeeper and uses zookeeper to detect whether regionservers are alive, to elect the HMaster, and so on. So before setting up HBase, a ready zookeeper ensemble is required; how to set up zookeeper will not be repeated here.
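
    A quick way to confirm the zookeeper ensemble is reachable before starting HBase (a sketch assuming zookeeper listens on its default client port 2181 on the three nodes):

## "imok" in the reply means that member is alive; repeat for each quorum member
echo ruok | nc node1.hadoop.virtual 2181
echo ruok | nc node2.hadoop.virtual 2181
echo ruok | nc node3.hadoop.virtual 2181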

    2) regionservers file: lists the regionservers in the cluster; we will build the HBase cluster on the following three machines

node1.hadoop.virtual
node2.hadoop.virtual
node3.hadoop.virtual

    3) hbase-env.sh: the HBase environment script, which contains the JVM options. The full file is shown here for reference.

# Set environment variables here.

# This script sets variables multiple times over the course of starting an hbase process,
# so try to keep things idempotent unless you want to take an even deeper look
# into the startup scripts (bin/hbase, etc.)

# The java implementation to use.  Java 1.6 required.
export JAVA_HOME=/opt/app/jdk1.6.0_45

# Extra Java CLASSPATH elements.  Optional.
export HBASE_CLASSPATH="/opt/app/hbase-0.98.1/extlib/hadoop-ext-1.0.0-SNAPSHOT.jar"

# The maximum amount of heap to use, in MB. Default is 1000.
# export HBASE_HEAPSIZE=1000

# Extra Java runtime options.
# Below are what we set by default.  May only work with SUN JVM.
# For more on why as well as other possible settings,
# see http://wiki.apache.org/hadoop/PerformanceTuning
export HBASE_OPTS="-XX:+UseConcMarkSweepGC -XX:MaxPermSize=256M -XX:SurvivorRatio=6 -XX:MaxTenuringThreshold=5 -XX:CMSInitiatingOccupancyFraction=70 -XX:+UseCMSCompactAtFullCollection"

# Uncomment one of the below three options to enable java garbage collection logging for the server-side processes.

# This enables basic gc logging to the .out file.
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
export SERVER_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/data/hadoop/logs/hbase/server-gc.log.$(date +%Y%m%d%H%M) -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment one of the below three options to enable java garbage collection logging for the client processes.

# This enables basic gc logging to the .out file.
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps"

# This enables basic gc logging to its own file.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
# export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:<FILE-PATH>"

# This enables basic GC logging to its own file with automatic log rolling. Only applies to jdk 1.6.0_34+ and 1.7.0_2+.
# If FILE-PATH is not replaced, the log file(.gc) would still be generated in the HBASE_LOG_DIR .
export CLIENT_GC_OPTS="-verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps -Xloggc:/data/hadoop/logs/hbase/client-gc.log.$(date +%Y%m%d%H%M) -XX:+UseGCLogFileRotation -XX:NumberOfGCLogFiles=1 -XX:GCLogFileSize=512M"

# Uncomment below if you intend to use the EXPERIMENTAL off heap cache.
# export HBASE_OPTS="$HBASE_OPTS -XX:MaxDirectMemorySize="
# Set hbase.offheapcache.percentage in hbase-site.xml to a nonzero value.


# Uncomment and adjust to enable JMX exporting
# See jmxremote.password and jmxremote.access in $JRE_HOME/lib/management to configure remote password access.
# More details at: http://java.sun.com/javase/6/docs/technotes/guides/management/agent.html
#
export HBASE_JMX_BASE="-Dcom.sun.management.jmxremote.ssl=false -Dcom.sun.management.jmxremote.authenticate=false"
export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10101 -Xmx2048M -Xms2048M -XX:MaxNewSize=512M"
export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10102 -Xmx4096M -Xms4096M -XX:MaxNewSize=1024M"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10103"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10104"
# export HBASE_REST_OPTS="$HBASE_REST_OPTS $HBASE_JMX_BASE -Dcom.sun.management.jmxremote.port=10105"

# File naming hosts on which HRegionServers will run.  $HBASE_HOME/conf/regionservers by default.
# export HBASE_REGIONSERVERS=${HBASE_HOME}/conf/regionservers

# Uncomment and adjust to keep all the Region Server pages mapped to be memory resident
#HBASE_REGIONSERVER_MLOCK=true
#HBASE_REGIONSERVER_UID="hbase"

# File naming hosts on which backup HMaster will run.  $HBASE_HOME/conf/backup-masters by default.
# export HBASE_BACKUP_MASTERS=${HBASE_HOME}/conf/backup-masters

# Extra ssh options.  Empty by default.
# export HBASE_SSH_OPTS="-o ConnectTimeout=1 -o SendEnv=HBASE_CONF_DIR"

# Where log files are stored.  $HBASE_HOME/logs by default.
export HBASE_LOG_DIR=/data/hadoop/logs/hbase

# Enable remote JDWP debugging of major HBase processes. Meant for Core Developers 
# export HBASE_MASTER_OPTS="$HBASE_MASTER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8070"
# export HBASE_REGIONSERVER_OPTS="$HBASE_REGIONSERVER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8071"
# export HBASE_THRIFT_OPTS="$HBASE_THRIFT_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8072"
# export HBASE_ZOOKEEPER_OPTS="$HBASE_ZOOKEEPER_OPTS -Xdebug -Xrunjdwp:transport=dt_socket,server=y,suspend=n,address=8073"

# A string representing this instance of hbase. $USER by default.
# export HBASE_IDENT_STRING=$USER

# The scheduling priority for daemon processes.  See 'man nice'.
# export HBASE_NICENESS=10

# The directory where pid files are stored. /tmp by default.
export HBASE_PID_DIR=/var/hadoop/pids

# Seconds to sleep between slave commands.  Unset by default.  This
# can be useful in large clusters, where, e.g., slave rsyncs can
# otherwise arrive faster than the master can service them.
# export HBASE_SLAVE_SLEEP=0.1

# Tell HBase whether it should manage it's own instance of Zookeeper or not.
export HBASE_MANAGES_ZK=false

# The default log rolling policy is RFA, where the log file is rolled as per the size defined for the 
# RFA appender. Please refer to the log4j.properties file to see more details on this appender.
# In case one needs to do log rolling on a date change, one should set the environment property
# HBASE_ROOT_LOGGER to "<DESIRED_LOG LEVEL>,DRFA".
# For example:
# HBASE_ROOT_LOGGER=INFO,DRFA
# The reason for changing default to RFA is to avoid the boundary case of filling out disk space as 
# DRFA doesn't put any cap on the log size. Please refer to HBase-5655 for more context.

    At this point, deploy the adjusted HBase package on all three machines. We are using node3 as the HMaster, so run "start-hbase.sh" from the bin directory on node3; this will start the whole HBase cluster.

    You can use the jps command to check whether the nodes started correctly: node3 should show both the "HMaster" and "HRegionServer" processes, while the other two nodes show only an "HRegionServer" process.
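
    Finally, a short smoke test from the HBase shell verifies that tables can actually be created and served (a sketch; the table name "smoke_test" is arbitrary):

## run from the HBase bin directory on any node
./hbase shell <<'EOF'
create 'smoke_test', 'cf'
put 'smoke_test', 'row1', 'cf:msg', 'hello'
scan 'smoke_test'
disable 'smoke_test'
drop 'smoke_test'
EOF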
