I. Preface
Hi everyone! This article is the detailed set of notes and thoughts I wrote down while setting up HDFS HA (high availability); I hope it helps you! It is part of 初心's Big Data column.
Personal homepage: 初心
About me: Hi, I'm 初心. Let's keep improving together.
Motto: The flower of idealism will, in the end, bloom in the soil of romanticism!
Welcome: this is CSDN, where I record what I learn. If you like the article, please like, comment and follow; if you have questions, feel free to message me.
II. ZooKeeper Installation
- 1. Upload the ZooKeeper tarball to /opt/software on hadoop102
As before, we use Xshell + Xftp to upload the ZooKeeper archive.
- 2. Extract it to /opt/module/HA (create the directory first if it does not exist yet)
mkdir -p /opt/module/HA
tar -xzvf /opt/software/apache-zookeeper-3.5.7-bin.tar.gz -C /opt/module/HA/
- 3. Rename the ZooKeeper directory
mv /opt/module/HA/apache-zookeeper-3.5.7-bin/ /opt/module/HA/zookeeper
- 4. Rename ZooKeeper's zoo_sample.cfg to zoo.cfg
mv /opt/module/HA/zookeeper/conf/zoo_sample.cfg /opt/module/HA/zookeeper/conf/zoo.cfg
- 5. Edit the zoo.cfg file (see the note after the server list below)
vim /opt/module/HA/zookeeper/conf/zoo.cfg
Press G and then o to open a new line at the end of the file, and append the following, where hadoop102, hadoop103 and hadoop104 are the hostnames of the three nodes.
server.1=hadoop102:2888:3888
server.2=hadoop103:2888:3888
server.3=hadoop104:2888:3888
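Note: the stock zoo_sample.cfg sets dataDir to /tmp/zookeeper. dataDir should instead point at the zkData directory created in the next step, otherwise ZooKeeper will not find the myid files. With that change, the relevant part of zoo.cfg looks roughly like this:
dataDir=/opt/module/HA/zookeeper/zkData
server.1=hadoop102:2888:3888
server.2=hadoop103:2888:3888
server.3=hadoop104:2888:3888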
- 6. Create the myid file
Go into the zookeeper directory:
cd /opt/module/HA/zookeeper/
Create the zkData directory:
mkdir zkData
Go into zkData and create the myid file:
cd zkData
vim myid
In the myid file, put just the number 1 on hadoop102, 2 on hadoop103 and 3 on hadoop104, then save and exit.
- 7. Configure the environment variables
vim /etc/profile.d/my_env.sh
Insert the following:
# ZOOKEEPER_HOME
export ZOOKEEPER_HOME=/opt/module/HA/zookeeper
export PATH=$PATH:$ZOOKEEPER_HOME/bin
- 8. Refresh the environment variables (a quick check follows)
source /etc/profile.d/my_env.sh
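A quick sanity check that the new variables took effect (plain shell, nothing specific to this setup):
echo $ZOOKEEPER_HOME
which zkServer.sh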
- 9. Distribute
Three things need to be distributed: the ZooKeeper directory, the my_env.sh file, and the environment variables must then be refreshed everywhere. The first two commands only need to be run on hadoop102; the third must also be run once on hadoop103 and hadoop104. Note that syncing the ZooKeeper directory also copies hadoop102's myid to the other nodes, so double-check it afterwards (see the note after the commands).
xsync /opt/module/HA/zookeeper/
xsync /etc/profile.d/my_env.sh
source /etc/profile.d/my_env.sh
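The xsync command is a custom cluster-distribution script (not shown in this article). If you do not have one, a plain loop over rsync achieves the same effect; the sketch below is only an illustration and assumes passwordless ssh between the nodes and that your user may write to /etc/profile.d on the remote hosts, just as the xsync commands above do. It also resets myid on hadoop103 and hadoop104, since the sync overwrites them with hadoop102's value:
# distribute ZooKeeper and the env file without xsync (rough equivalent)
for host in hadoop103 hadoop104; do
  rsync -av /opt/module/HA/zookeeper/ $host:/opt/module/HA/zookeeper/
  rsync -av /etc/profile.d/my_env.sh $host:/etc/profile.d/my_env.sh
done
# restore the correct myid values on the other two nodes
ssh hadoop103 'echo 2 > /opt/module/HA/zookeeper/zkData/myid'
ssh hadoop104 'echo 3 > /opt/module/HA/zookeeper/zkData/myid'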
III. Hadoop Configuration
- 1. Keep the original Hadoop cluster
Why keep the original cluster?
Before setting up Hadoop HA, the cluster already ran HDFS, Yarn and the JobHistory server, and we will keep using them later, so I chose to keep it. In other words, **even if the HA setup fails, we can still go back to where we started.** The way to keep it is simply not to work in the existing Hadoop directory, but to make a copy of it.
- 2. Copy the Hadoop directory
cp -r /opt/module/hadoop-3.1.3/ /opt/module/HA/
- 3. Delete the data and logs directories
cd /opt/module/HA/hadoop-3.1.3/
rm -rf data
rm -rf logs
- 4. Create the directories used by the HA setup (they are referenced below by dfs.journalnode.edits.dir and hadoop.tmp.dir)
cd /opt/module/HA/
mkdir logs
mkdir tmp
- 5. Modify two configuration files
The two files are core-site.xml and hdfs-site.xml; they are the only files in the Hadoop directory that need to change. The places that need changing are explained in the comments below, so you do not have to edit your existing files by hand, just replace them with these.
hdfs-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- Logical name of the cluster; it appears several times below, so keep it consistent everywhere -->
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
<!-- List of NameNodes; these are logical names, not the hostnames the NameNodes run on -->
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
<!-- RPC addresses used between the NameNodes; the value is the host the NameNode actually runs on -->
<!-- Default port 8020; note that mycluster and nn1/nn2 must match the names configured above -->
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>hadoop102:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>hadoop103:8020</value>
</property>
<!-- Web UI addresses of the NameNodes, default port 9870 -->
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>hadoop102:9870</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>hadoop103:9870</value>
</property>
<!-- JournalNode hosts, at least three, default port 8485 -->
<!-- Format: qjournal://jn1:port;jn2:port;jn3:port/${nameservices} -->
<!-- a shared edits dir must not be specified if HA is not enabled -->
<!-- Remove this property in a pseudo-distributed (non-HA) setup -->
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://hadoop102:8485;hadoop103:8485;hadoop104:8485/mycluster</value>
</property>
<!-- Class that implements client failover between the NameNodes; copy as-is -->
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
<!-- Fencing method used during failover (when a NameNode switches between active and standby); here we use ssh (see the note after this file) -->
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<!-- Change this to the ssh private key of your own user -->
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/home/sky/.ssh/id_rsa</value>
</property>
<!-- Local directory where each JournalNode stores the shared edit log that the standby NameNode reads -->
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/opt/module/HA/logs/</value>
</property>
<!-- Enable automatic failover -->
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
<property>
<name>dfs.webhdfs.enabled</name>
<value>true</value>
</property>
<!-- Works around "DataXceiver error processing WRITE_BLOCK operation" errors -->
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
<description>
Specifies the maximum number of threads to use for transferring data
in and out of the DN.
</description>
</property>
</configuration>
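One gotcha with the sshfence method configured above: during a failover, the ZKFC logs into the other NameNode over ssh and kills the stale process with fuser, so passwordless ssh must work between hadoop102 and hadoop103 for the sky user, and the psmisc package (which provides fuser) must be installed on both NameNode hosts, otherwise fencing can hang. On a CentOS-style system that would be something like:
sudo yum install -y psmisc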
core-site.xml:
<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
<!-- Put site-specific property overrides in this file. -->
<configuration>
<!-- HDFS entry point; mycluster is only a logical name and can be changed freely, but it must match dfs.nameservices in hdfs-site.xml -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
<!-- The default hadoop.tmp.dir points at /tmp, which would keep all NameNode and DataNode data in a volatile directory, so change it here -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/HA/tmp/</value>
<!-- <value>/opt/bigdata/hadoopha</value> -->
</property>
<!-- Static web user; without this the web UI reports errors and cannot operate on files -->
<property>
<name>hadoop.http.staticuser.user</name>
<value>sky</value>
</property>
<!-- ZooKeeper quorum addresses; list every node in the ensemble, separated by commas -->
<property>
<name>ha.zookeeper.quorum</name>
<value>hadoop102:2181,hadoop103:2181,hadoop104:2181</value>
</property>
<!-- Proxy user settings: hadoop.proxyuser.{your username}.hosts -->
<property>
<name>hadoop.proxyuser.sky.hosts</name>
<value>*</value>
</property>
<property>
<name>hadoop.proxyuser.sky.groups</name>
<value>*</value>
</property>
<!-- Works around NameNode start-up failures when the JournalNodes cannot be reached in time -->
<!-- It may also be a network issue; see https://blog.csdn.net/tototuzuoquan/article/details/89644127 -->
<!-- We hit this "cannot connect to journalnode" problem in the dev environment, so these settings increase the retry count and interval -->
<property>
<name>ipc.client.connect.max.retries</name>
<value>100</value>
<description>Indicates the number of retries a client will make to establish a server connection.</description>
</property>
<property>
<name>ipc.client.connect.retry.interval</name>
<value>10000</value>
<description>Indicates the number of milliseconds a client will wait for before retrying to establish a server connection.</description>
</property>
</configuration>
- 6. Update the environment variables
vim /etc/profile.d/my_env.sh
Change HADOOP_HOME to the path of the new Hadoop directory:
# HADOOP_HOME
export HADOOP_HOME=/opt/module/HA/hadoop-3.1.3
- 7. Distribute the Hadoop directory and my_env.sh
xsync /opt/module/HA/hadoop-3.1.3/
xsync /etc/profile.d/my_env.sh
- 8. Refresh the environment variables
Refresh the environment variables on each of the three nodes; a quick verification follows below:
source /etc/profile.d/my_env.sh
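At this point it is worth confirming that the shell really picks up the new installation and the HA configuration, for example (ordinary Hadoop CLI calls):
echo $HADOOP_HOME                   # should print /opt/module/HA/hadoop-3.1.3
hadoop version                      # should report 3.1.3
hdfs getconf -confKey fs.defaultFS  # should print hdfs://mycluster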
IV. Hadoop HA Automatic Mode
- 1. Modify hadoop/etc/hadoop/hadoop-env.sh
vim /opt/module/HA/hadoop-3.1.3/etc/hadoop/hadoop-env.sh
Append the following at the end. sky is the username; I have never used the root account anywhere in this setup.
export HDFS_ZKFC_USER=sky
export HDFS_JOURNALNODE_USER=sky
- 2. Distribute
xsync /opt/module/HA/hadoop-3.1.3/
- 3. Start and initialize the cluster
Start ZooKeeper (run this on each of the three nodes):
zkServer.sh start
Refresh the environment variables:
source /etc/profile.d/my_env.sh
Start HDFS:
myhadoop start
myhadoop is my separate one-click start/stop script for the Hadoop cluster; for the script itself and how to use it, see my earlier article on the Hadoop cluster start/stop script (Hadoop集群?jiǎn)⑼D_本). If this is the very first time the HA cluster is being brought up, see the initialization note below.
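Note: myhadoop start only starts an already-initialized cluster. On a brand-new HA cluster a few one-off initialization steps are needed first; a typical sequence (standard Hadoop 3.x commands, run from hadoop102, with hostnames adjusted to your own cluster) is roughly:
# 1. start the JournalNodes on all three hosts
ssh hadoop102 '/opt/module/HA/hadoop-3.1.3/bin/hdfs --daemon start journalnode'
ssh hadoop103 '/opt/module/HA/hadoop-3.1.3/bin/hdfs --daemon start journalnode'
ssh hadoop104 '/opt/module/HA/hadoop-3.1.3/bin/hdfs --daemon start journalnode'
# 2. format and start the first NameNode (nn1 on hadoop102)
hdfs namenode -format
hdfs --daemon start namenode
# 3. copy its metadata to the second NameNode (nn2 on hadoop103)
ssh hadoop103 '/opt/module/HA/hadoop-3.1.3/bin/hdfs namenode -bootstrapStandby'
# 4. initialize the HA state in ZooKeeper
hdfs zkfc -formatZK
After that, myhadoop start (or start-dfs.sh) brings up both NameNodes, the DataNodes and the ZKFCs, and the ZKFCs elect one NameNode as active.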
- 4. Check the status of ZooKeeper and the NameNodes (see the note below for checking which NameNode is active)
zkServer.sh status
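zkServer.sh status only reports each ZooKeeper server's role (leader or follower). To see which NameNode is currently active, query the HA admin interface (nn1 and nn2 are the logical names from hdfs-site.xml), or open the web UIs at hadoop102:9870 and hadoop103:9870:
hdfs haadmin -getServiceState nn1   # e.g. active
hdfs haadmin -getServiceState nn2   # e.g. standby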
V. HA Script
The script is called myHA.sh. It starts and stops ZooKeeper and HDFS with a single command and can also show the ZooKeeper status. A usage example follows after the script.
#! /bin/bash
if [ $# -lt 1 ]
then
echo "No Args Input..."
exit;
fi
case $1 in
"start"){
echo "----------啟動(dòng)zookeeper----------"
for i in hadoop102 hadoop103 hadoop104
do
echo ---------- starting zookeeper on $i ------------
ssh $i "/opt/module/HA/zookeeper/bin/zkServer.sh start"
done
echo "---------- 啟動(dòng)hdfs------------"
ssh hadoop102 "/opt/module/HA/hadoop-3.1.3/sbin/start-dfs.sh"
echo "---------- hadoop HA啟動(dòng)成功------------"
};;
"stop"){
echo "----------關(guān)閉hdfs----------"
ssh hadoop102 "/opt/module/HA/hadoop-3.1.3/sbin/stop-dfs.sh"
echo "----------關(guān)閉zookeeper----------"
for i in hadoop102 hadoop103 hadoop104
do
echo ---------- stopping zookeeper on $i ------------
ssh $i "/opt/module/HA/zookeeper/bin/zkServer.sh stop"
done
echo "---------- hadoop HA停止成功------------"
};;
"status"){
for i in hadoop102 hadoop103 hadoop104
do
echo ---------- zookeeper status on $i ------------
ssh $i "/opt/module/HA/zookeeper/bin/zkServer.sh status"
done
};;
*)
echo "Input Args Error"
;;
esac
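A possible way to use it (assuming the script is saved as myHA.sh somewhere on your PATH, for example ~/bin):
chmod +x myHA.sh
myHA.sh start    # start ZooKeeper on all three nodes, then HDFS
myHA.sh status   # show each ZooKeeper server's role
myHA.sh stop     # stop HDFS, then ZooKeeper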
VI. Conclusion
This article covered how to set up HDFS high availability (HA). A follow-up adding Yarn HA is planned, so stay tuned!
That is everything I wanted to share with you today. See you next time!
Every meeting in this world is a reunion after a long farewell~