Overview
Hadoop: a distributed systems infrastructure

Problems it solves: storing massive data sets, and analyzing and computing over them

Official site: https://hadoop.apache.org/

HDFS (Hadoop Distributed File System): a distributed file system, used to store data

Defaults for core-site.xml (core-default.xml): https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-common/core-default.xml == properties and general parameters shared by the components of a Hadoop cluster, tuned for performance and reliability == packaged in <Hadoop directory>/share/hadoop/common/hadoop-common-3.3.6.jar

Defaults for hdfs-site.xml (hdfs-default.xml): https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-hdfs/hdfs-default.xml == parameters of the HDFS component, such as block size and heartbeat interval == packaged in <Hadoop directory>/share/hadoop/hdfs/hadoop-hdfs-3.3.6.jar

Defaults for mapred-site.xml (mapred-default.xml): https://hadoop.apache.org/docs/r3.3.6/hadoop-mapreduce-client/hadoop-mapreduce-client-core/mapred-default.xml == parameters for tuning and optimizing MapReduce job execution == packaged in <Hadoop directory>/share/hadoop/mapreduce/hadoop-mapreduce-client-core-3.3.6.jar

Defaults for yarn-site.xml (yarn-default.xml): https://hadoop.apache.org/docs/r3.3.6/hadoop-yarn/hadoop-yarn-common/yarn-default.xml == behavior of the YARN ResourceManager and NodeManager == packaged in <Hadoop directory>/share/hadoop/yarn/hadoop-yarn-common-3.3.6.jar
Basics
Hadoop components
Hadoop configuration files
Configuration file path: <Hadoop directory>/etc/hadoop
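Environment preparation
Configuration
# Change the hostname
# more /etc/sysconfig/network == contents shown below; give every machine a different HOSTNAME
NETWORKING=yes
HOSTNAME=hadoop107
# =======================
# Assign a static IP address == look up the steps for your own distribution
ifconfig
more /etc/sysconfig/network-scripts/ifcfg-ens33
# =======================
# Check the mapping between the custom hostname and its IP == more /etc/hosts
ping hadoop107   # substitute your own hostname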
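The hostname mapping can also be checked by looking the name up in the hosts file directly. The following is only a sketch: lookup_host is a hypothetical helper, not a Hadoop or system command, and pairing hadoop107 with 192.168.19.107 is an assumption based on the values used elsewhere in this article.

```shell
# lookup_host FILE NAME
# Print the IP address that a hosts-format FILE maps NAME to.
# Returns non-zero when NAME is not listed. (Hypothetical helper.)
lookup_host() {
  awk -v name="$2" '
    /^[[:space:]]*#/ { next }                      # skip comment lines
    {
      for (i = 2; i <= NF; i++) {
        if ($i ~ /^#/) break                       # ignore trailing comments
        if ($i == name) { print $1; found = 1; exit }
      }
    }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Example (assumed mapping): lookup_host /etc/hosts hadoop107
```

If the function prints nothing, the hostname has no entry yet and ping will not resolve it either.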
Hadoop configuration
Download
Official site: https://hadoop.apache.org/releases.html
Configure environment variables
# Extract the archive into the target directory
mkdir -p /opt/module/ && tar -zxvf hadoop-3.3.6.tar.gz -C /opt/module/
# Enter the extracted directory
cd /opt/module/hadoop-3.3.6
# Set the environment variables
vim /etc/profile
# Append the following four lines to the end of that file
## Hadoop
export HADOOP_HOME=/opt/module/hadoop-3.3.6
export PATH=$PATH:$HADOOP_HOME/bin
export PATH=$PATH:$HADOOP_HOME/sbin
# Apply the environment variables to the current shell
source /etc/profile
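After sourcing the profile it is worth confirming that both Hadoop directories really ended up on PATH. A minimal sketch, assuming a POSIX shell; path_has_dir is a hypothetical helper introduced here for illustration.

```shell
# path_has_dir DIR [PATHLIST]
# Succeed when DIR is one of the colon-separated entries of PATHLIST
# (PATHLIST defaults to the current $PATH). Hypothetical helper.
path_has_dir() {
  case ":${2-$PATH}:" in
    *":$1:"*) return 0 ;;
    *)        return 1 ;;
  esac
}

# Example, after `source /etc/profile`:
#   path_has_dir "$HADOOP_HOME/bin"  && echo "bin is on PATH"
#   path_has_dir "$HADOOP_HOME/sbin" && echo "sbin is on PATH"
#   hadoop version    # prints the installed Hadoop version when the setup works
```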
Hadoop run modes
Standalone Operation (local)
Reference: https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-common/SingleCluster.html#Standalone_Operation
Official demo
The official demo counts how often strings matching a regular expression occur in a set of files
# Hadoop directory
cd /opt/module/hadoop-3.3.6
# Create the input directory == holds the source files for the demo below
mkdir input
# Copy in some ordinary files
cp etc/hadoop/*.xml input
# Count the strings matching 'dfs[a-z.]+' in the files under input and write the result to ./output == do not create the output directory beforehand, or the job fails with an error
bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar grep input output 'dfs[a-z.]+'
# View the result
cat output/*
cat output/part-r-00000
# The result matches what grep itself finds
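To compare the job output with plain grep, the same pattern can be counted locally. This is a sketch using standard Unix tools rather than anything Hadoop ships; count_matches is a hypothetical name, and the order of entries with equal counts may differ from the Hadoop output.

```shell
# count_matches PATTERN FILE...
# Print "count<TAB>match" for every distinct string matching the extended
# regular expression PATTERN, highest count first. Hypothetical helper.
count_matches() {
  pattern=$1
  shift
  grep -ohE "$pattern" "$@" \
    | LC_ALL=C sort \
    | uniq -c \
    | LC_ALL=C sort -k1,1nr -k2 \
    | awk '{ print $1 "\t" $2 }'
}

# Example, run from /opt/module/hadoop-3.3.6:
#   count_matches 'dfs[a-z.]+' input/*.xml
```

The strings and counts it lists should agree with the contents of output/part-r-00000.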
WordCount demo
# Create the data directory
mkdir -p /opt/module/hadoop-3.3.6/input/wordCountData && cd /opt/module/hadoop-3.3.6/input/
# Create the test data for the demo
echo "cat apple banana" >> wordCountData/data1.txt
echo "dog" >> wordCountData/data1.txt
echo " elephant" >> wordCountData/data1.txt
echo "cat apple banana" >> wordCountData/data2.txt
echo "dog" >> wordCountData/data2.txt
echo " elephant queen" >> wordCountData/data2.txt
# View the data
more wordCountData/data1.txt
more wordCountData/data2.txt
# Count the words in the files under wordCountData (the relative output path is created in the current directory)
hadoop jar /opt/module/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /opt/module/hadoop-3.3.6/input/wordCountData wordCountDataoutput
# View the result
cd /opt/module/hadoop-3.3.6/input/wordCountDataoutput
cat ./*
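For a sanity check on the result, the same count can be reproduced with ordinary shell tools. A minimal sketch, assuming words are separated by whitespace as in the test data above; local_wordcount is a hypothetical helper, not part of Hadoop.

```shell
# local_wordcount FILE...
# Print "word<TAB>count" for every whitespace-separated word in the files,
# sorted by word. Hypothetical helper for comparison with the job output.
local_wordcount() {
  cat "$@" \
    | tr -s '[:space:]' '\n' \
    | grep -v '^$' \
    | LC_ALL=C sort \
    | uniq -c \
    | awk '{ print $2 "\t" $1 }'
}

# Example, run from /opt/module/hadoop-3.3.6/input:
#   local_wordcount wordCountData/data1.txt wordCountData/data2.txt
# For the six echo lines above this prints:
#   apple     2
#   banana    2
#   cat       2
#   dog       2
#   elephant  2
#   queen     1
```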
Pseudo-Distributed Operation
Reference: https://hadoop.apache.org/docs/r3.3.6/hadoop-project-dist/hadoop-common/SingleCluster.html#Pseudo-Distributed_Operation

Overview: a distributed system running on a single node (for testing)
Configuration changes

Edit the core configuration file: vim /opt/module/hadoop-3.3.6/etc/hadoop/core-site.xml
<configuration>
<!-- Default file system; the built-in default is the local file scheme, file:/// -->
<property>
<name>fs.defaultFS</name>
<value>hdfs://192.168.19.107:9000</value>
</property>
<!-- Base for temporary directories; default is /tmp/hadoop-${user.name} -->
<property>
<name>hadoop.tmp.dir</name>
<value>/opt/module/hadoop-3.3.6/tmp</value>
</property>
</configuration>

Edit the HDFS configuration file: vim /opt/module/hadoop-3.3.6/etc/hadoop/hdfs-site.xml
<configuration>
<!-- Block replication factor: 1 for a single node (default is 3) -->
<property>
<name>dfs.replication</name>
<value>1</value>
</property>
</configuration>
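Before starting any daemon it helps to read back what the site files actually contain. The sketch below is a deliberately simple text scan, not a real XML parser: it assumes each <name> and <value> element sits on a single line, as in the files above, and get_property is a hypothetical helper. On a working installation, hdfs getconf -confKey fs.defaultFS reports the effective value instead.

```shell
# get_property FILE KEY
# Print the <value> that follows <name>KEY</name> in a Hadoop *-site.xml file.
# Returns non-zero when KEY is not set in FILE. Hypothetical helper.
get_property() {
  awk -v key="$2" '
    /<name>/ {
      n = $0
      sub(/.*<name>[[:space:]]*/, "", n)
      sub(/[[:space:]]*<\/name>.*/, "", n)
    }
    /<value>/ {
      v = $0
      sub(/.*<value>[[:space:]]*/, "", v)
      sub(/[[:space:]]*<\/value>.*/, "", v)
      if (n == key) { print v; found = 1; exit }
    }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Example:
#   get_property /opt/module/hadoop-3.3.6/etc/hadoop/core-site.xml fs.defaultFS
#   get_property /opt/module/hadoop-3.3.6/etc/hadoop/hdfs-site.xml dfs.replication
```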
Start DFS [port 9870]
Format HDFS (this wipes any existing HDFS data)
hdfs namenode -format
Start the DFS daemons
Note: startup can fail because of problems related to a non-root user or because JAVA_HOME cannot be found; refer to the fixes for those problems below
# Java processes running before Hadoop is started
jps
# Start the HDFS daemons (run the script directly; it is a bash script)
/opt/module/hadoop-3.3.6/sbin/start-dfs.sh
# See which Java processes Hadoop has added
jps
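After start-dfs.sh on a single node, the second jps listing should include NameNode, DataNode and SecondaryNameNode. A small sketch for checking that automatically; missing_daemons is a hypothetical helper that only inspects the text jps prints.

```shell
# missing_daemons NAME...
# Read `jps` output on stdin and print every NAME that is not running.
# Prints nothing when all of them are present. Hypothetical helper.
missing_daemons() {
  jps_output=$(cat)
  for daemon in "$@"; do
    # jps prints "<pid> <class name>"; compare the second column exactly
    if ! printf '%s\n' "$jps_output" \
        | awk -v d="$daemon" '$2 == d { ok = 1 } END { exit !ok }'; then
      printf '%s\n' "$daemon"
    fi
  done
}

# Example:
#   jps | missing_daemons NameNode DataNode SecondaryNameNode
```

Any name it prints points at a daemon whose log under the Hadoop logs directory is worth reading.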
Open the DFS web UI (the NameNode port differs between Hadoop versions: 9870 in 3.x, 50070 in 2.x)
NameNode web UI in the browser: http://192.168.19.107:9870/
Using the dfs command (mainly for file operations)
Help: hdfs dfs -help
Copy files from the local machine into HDFS
hdfs dfs -mkdir /test
hdfs dfs -put /opt/module/hadoop-3.3.6/input /test
List files and read file contents
hdfs dfs -ls -R /
hdfs dfs -cat /test/input/core-site.xml
Create directories and files
hdfs dfs -mkdir -p /test/linrc
hdfs dfs -touch /test/linrc/1.txt
Run MapReduce over the contents of a directory in HDFS
hdfs dfs -ls /test/input
# Count the words in a directory stored in HDFS
hadoop jar /opt/module/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /test/input/wordCountData /test/input/wordCountDataoutput2
hdfs dfs -ls /test/input
# View the result
hdfs dfs -cat /test/input/wordCountDataoutput2/*
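Because a MapReduce job refuses to write into an output directory that already exists, every rerun needs a new output path. One way to avoid renaming it by hand is a timestamp suffix. This is only a sketch; unique_output_path is a hypothetical helper, and deleting the old directory with hdfs dfs -rm -r is an equally valid alternative.

```shell
# unique_output_path BASE
# Print BASE with a -YYYYMMDD-HHMMSS suffix so each run gets a fresh
# output directory. Hypothetical helper.
unique_output_path() {
  printf '%s-%s\n' "$1" "$(date +%Y%m%d-%H%M%S)"
}

# Example:
#   out=$(unique_output_path /test/input/wordCountDataoutput)
#   hadoop jar /opt/module/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar \
#     wordcount /test/input/wordCountData "$out"
#   hdfs dfs -cat "$out"/*
```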
Start the YARN component [port 8088]
Configuration changes
Set JAVA_HOME explicitly for YARN: /opt/module/hadoop-3.3.6/etc/hadoop/yarn-env.sh
export JAVA_HOME=/www/server/jdk8/jdk1.8.0_202

Add the following properties to yarn-site.xml: /opt/module/hadoop-3.3.6/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties == https://hadoop.apache.org/docs/r3.3.6/hadoop-yarn/hadoop-yarn-common/yarn-default.xml -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.19.107</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<!-- When viewing task logs, prevents a redirect to localhost that would make the page fail to open -->
<property>
<name>yarn.timeline-service.hostname</name>
<value>192.168.19.107</value>
</property>
</configuration>
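Start
# Important: the author could only start YARN from inside the Hadoop directory; the reason was not determined
cd /opt/module/hadoop-3.3.6
# Run the script directly rather than through sh, which reported errors for the author; a likely cause is that the script is written for bash
./sbin/start-yarn.sh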
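One plausible explanation for the sh failure, offered here as an assumption rather than something the original author confirmed, is that the start scripts declare bash as their interpreter and use bash-only syntax. Invoking a script through sh ignores its shebang line, while running it directly honors it. The sketch below shows how to read that line; script_interpreter is a hypothetical helper.

```shell
# script_interpreter FILE
# Print the interpreter named on FILE's shebang line, or nothing if the
# first line is not a shebang. Hypothetical helper.
script_interpreter() {
  head -n 1 "$1" | sed -n 's/^#![[:space:]]*//p'
}

# Example:
#   script_interpreter /opt/module/hadoop-3.3.6/sbin/start-yarn.sh
```

If the result names bash, starting the script with ./sbin/start-yarn.sh or bash sbin/start-yarn.sh keeps it on the interpreter it was written for.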
Open the YARN web UI
Browser: http://ip:8088

The YARN web UI port is set by [yarn.resourcemanager.webapp.address] in https://hadoop.apache.org/docs/r3.3.6/hadoop-yarn/hadoop-yarn-common/yarn-default.xml
Run a word count over all files in an HDFS directory; the YARN page shows a record of the run (jobs are submitted to YARN only when mapreduce.framework.name is yarn, which the mapred-site.xml section below sets; the default is local)
# Start the word count
hadoop jar /opt/module/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /test/input/wordCountData /test/input/wordCountDataoutput3
Start the MapReduce component
Configuration changes
Set JAVA_HOME explicitly for mapred: /opt/module/hadoop-3.3.6/etc/hadoop/mapred-env.sh
export JAVA_HOME=/www/server/jdk8/jdk1.8.0_202

Add the following properties to mapred-site.xml: /opt/module/hadoop-3.3.6/etc/hadoop/mapred-site.xml
<configuration>
<!-- The runtime framework for executing MapReduce jobs. Can be one of local, classic or yarn -->
<property>
<name>mapreduce.framework.name</name>
<value>yarn</value>
</property>
<property>
<name>mapreduce.application.classpath</name>
<value>$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/*:$HADOOP_MAPRED_HOME/share/hadoop/mapreduce/lib/*</value>
</property>
<!-- MapReduce job log collection (JobHistory Server) settings -->
<property>
<name>mapreduce.jobhistory.address</name>
<value>192.168.19.107:10020</value>
</property>
<property>
<name>mapreduce.jobhistory.webapp.address</name>
<value>192.168.19.107:19888</value>
</property>
</configuration>
Start the log collection system (JobHistory Server)
mapred --daemon start historyserver
View task logs
Enable log aggregation (uploads the detailed logs of each task to HDFS)
Before enabling
Enabling
Note: if YARN is already running, it must be restarted for changes to its configuration to take effect
# Stop the JobHistory Server
mapred --daemon stop historyserver
# Stop YARN
cd /opt/module/hadoop-3.3.6
./sbin/stop-yarn.sh
Update yarn-site.xml so that it contains the following (the last two properties are new): /opt/module/hadoop-3.3.6/etc/hadoop/yarn-site.xml
<configuration>
<!-- Site specific YARN configuration properties == https://hadoop.apache.org/docs/r3.3.6/hadoop-yarn/hadoop-yarn-common/yarn-default.xml -->
<property>
<name>yarn.nodemanager.aux-services</name>
<value>mapreduce_shuffle</value>
</property>
<property>
<name>yarn.resourcemanager.hostname</name>
<value>192.168.19.107</value>
</property>
<property>
<name>yarn.nodemanager.env-whitelist</name>
<value>JAVA_HOME,HADOOP_COMMON_HOME,HADOOP_HDFS_HOME,HADOOP_CONF_DIR,CLASSPATH_PREPEND_DISTCACHE,HADOOP_YARN_HOME,HADOOP_HOME,PATH,LANG,TZ,HADOOP_MAPRED_HOME</value>
</property>
<property>
<name>yarn.timeline-service.hostname</name>
<value>192.168.19.107</value>
</property>
<!-- Enable log aggregation -->
<property>
<name>yarn.log-aggregation-enable</name>
<value>true</value>
</property>
<!-- How long aggregated logs are retained, in seconds -->
<property>
<name>yarn.log-aggregation.retain-seconds</name>
<value>2592000</value>
</property>
</configuration>
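The retention value is easier to reason about in days. A minimal sketch of the arithmetic; days_to_seconds is a hypothetical helper, and 7 days is just an example alternative, not a recommendation from the original article.

```shell
# days_to_seconds DAYS
# Convert a retention period in days to the seconds expected by
# yarn.log-aggregation.retain-seconds. Hypothetical helper.
days_to_seconds() {
  echo $(( $1 * 24 * 60 * 60 ))
}

# days_to_seconds 30   # prints 2592000, the value used above
# days_to_seconds 7    # prints 604800
```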

# Start YARN
cd /opt/module/hadoop-3.3.6
./sbin/start-yarn.sh
# Start the JobHistory Server
mapred --daemon start historyserver

# Run a new job
hadoop jar /opt/module/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /test/input/wordCountData /test/input/wordCountDataoutput5