I only skimmed this chapter and did not dig deeply into most of it, so this write-up is probably of limited use; refer to it as you see fit.
(132) Common YARN commands
Viewing applications
List all applications: yarn application -list
Filter applications by state, e.g. list only finished applications: yarn application -list -appStates FINISHED
Valid application states are: ALL, NEW, NEW_SAVING, SUBMITTED, ACCEPTED, RUNNING, FINISHED, FAILED, KILLED.
Kill an application: yarn application -kill <ApplicationId>, where the application ID is a string of the form application_1612577921195_0001.
List all attempts of an application: yarn applicationattempt -list <ApplicationId>
Print the status of an application attempt: yarn applicationattempt -status <ApplicationAttemptId>
Viewing logs
This part is very important.
Query the logs of an application: yarn logs -applicationId <ApplicationId>
Query the logs of a specific container: yarn logs -applicationId <ApplicationId> -containerId <ContainerId>
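Putting these together, a typical troubleshooting flow looks roughly like this (just a sketch; the application ID below is the example ID from above, substitute your own):
yarn application -list -appStates RUNNING                        # find the application ID
yarn logs -applicationId application_1612577921195_0001 | less   # inspect its aggregated logs
yarn application -kill application_1612577921195_0001            # kill it if it is stuck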
Viewing containers
List all containers of an application attempt: yarn container -list <ApplicationAttemptId>
Print the status of a container: yarn container -status <ContainerId>
Container status can only be viewed while the application is running.
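For example, to drill down from a running application to its containers (the IDs below are hypothetical, following the ID formats shown above):
yarn applicationattempt -list application_1612577921195_0001      # get the ApplicationAttemptId
yarn container -list appattempt_1612577921195_0001_000001         # list that attempt's containers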
Viewing node status
List all nodes: yarn node -list -all
This prints the running state, address, and other information of every NodeManager in the cluster.
Refreshing configuration with rmadmin
Reload the queue configuration: yarn rmadmin -refreshQueues
This allows the queue configuration to be changed dynamically, with no downtime.
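For example, after adjusting queue capacities in capacity-scheduler.xml, the new settings can be loaded on the fly (a sketch):
vim $HADOOP_HOME/etc/hadoop/capacity-scheduler.xml   # edit the queue definitions
yarn rmadmin -refreshQueues                          # reload them without restarting YARN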
Viewing queues
Print queue information: yarn queue -status <QueueName>
For example, yarn queue -status default prints the default queue, showing its state, current capacity, and so on.
(133) Core configuration parameters for production
Again this is only for general awareness, so I just took screenshots from the tutorial. The main point to remember: the ResourceManager handles scheduler requests with 50 threads by default.
There is a notion of "virtual cores" (vcores) here that deserves a brief explanation.
First, every NodeManager in the cluster has its own set of configuration parameters; there is no strict requirement that all NodeManagers be configured identically.
This mainly accounts for large performance differences between nodes. Say the single-core CPU performance of node 1 is twice that of node 2; treating the two as equals when allocating tasks would then be problematic. In that case you can enable virtual cores on node 1 and count each physical core as two virtual cores, so that a single (virtual) core on node 1 and on node 2 now perform roughly the same, which makes it easier for the ResourceManager to allocate tasks.
In other words, how many vcores one physical core counts as can differ from NodeManager to NodeManager. This avoids the difficulty of managing CPUs uniformly when the nodes' CPU performance differs.
So if the cluster mixes CPU generations, say some nodes run i5s and others i7s, enabling virtual cores is genuinely useful (see yarn.nodemanager.resource.count-logical-processors-as-cores and yarn.nodemanager.resource.pcores-vcores-multiplier in the configuration below).
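A quick way to gauge how different the nodes really are before picking a multiplier is to compare their CPUs directly; a minimal sketch (run on each node):
lscpu | grep -E 'Model name|^CPU\(s\)'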
The "physical memory check" mechanism exists to keep a node from crashing when its memory is exceeded; it is enabled by default.
(135) Production core parameter configuration example
Requirement: count the occurrences of each word in 1 GB of data. There are 3 servers, each with 4 GB of memory, 4 CPU cores, and 4 threads.
With the default block size of 128 MB, 1 GB / 128 MB = 8, so the job needs 8 MapTasks, 1 ReduceTask, and 1 MrAppMaster.
On average each node runs (8 + 1 + 1) / 3 ≈ 3 tasks; assume a 4 + 3 + 3 distribution across the three nodes.
Given these requirements and this hardware, the reasoning goes roughly as follows:
1 GB is not much data, so the Capacity Scheduler is sufficient;
The ResourceManager's default of 50 scheduler threads is more than needed here and can be cut down to 8;
The nodes have identical CPU performance, so virtual cores do not need to be enabled;
I will not go into the other settings here.
I'll just paste the tutorial's yarn-site.xml configuration below for easy reference later.
<!-- Choose the scheduler; the default is the Capacity Scheduler -->
<property>
<description>The class to use as the resource scheduler.</description>
<name>yarn.resourcemanager.scheduler.class</name>
<value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.capacity.CapacityScheduler</value>
</property>
<!-- Number of threads for the ResourceManager to handle scheduler requests; default 50. If more than 50 jobs are submitted this can be increased, but not beyond 3 nodes * 4 threads = 12 threads (in practice no more than 8 once other processes are accounted for) -->
<property>
<description>Number of threads to handle scheduler interface.</description>
<name>yarn.resourcemanager.scheduler.client.thread-count</name>
<value>8</value>
</property>
<!-- Whether YARN should auto-detect the hardware to configure itself; default false. If the node runs many other applications, configure this manually; if not, auto-detection is fine -->
<property>
<description>Enable auto-detection of node capabilities such as
memory and CPU.
</description>
<name>yarn.nodemanager.resource.detect-hardware-capabilities</name>
<value>false</value>
</property>
<!-- Whether to count logical processors (hyperthreads) as CPU cores; default false, i.e. use the physical core count -->
<property>
<description>Flag to determine if logical processors(such as
hyperthreads) should be counted as cores. Only applicable on Linux
when yarn.nodemanager.resource.cpu-vcores is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true.
</description>
<name>yarn.nodemanager.resource.count-logical-processors-as-cores</name>
<value>false</value>
</property>
<!-- Multiplier from physical cores to vcores; default 1.0 -->
<property>
<description>Multiplier to determine how to convert physical cores to
vcores. This value is used if yarn.nodemanager.resource.cpu-vcores
is set to -1(which implies auto-calculate vcores) and
yarn.nodemanager.resource.detect-hardware-capabilities is set to true. The number of vcores will be calculated as number of CPUs * multiplier.
</description>
<name>yarn.nodemanager.resource.pcores-vcores-multiplier</name>
<value>1.0</value>
</property>
<!-- Memory available to the NodeManager; default 8 GB, changed here to 4 GB -->
<property>
<description>Amount of physical memory, in MB, that can be allocated
for containers. If set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically calculated(in case of Windows and Linux).
In other cases, the default is 8192MB.
</description>
<name>yarn.nodemanager.resource.memory-mb</name>
<value>4096</value>
</property>
<!-- Number of CPU cores (vcores) for the NodeManager; default 8 when not auto-detected from hardware, changed here to 4 -->
<property>
<description>Number of vcores that can be allocated
for containers. This is used by the RM scheduler when allocating
resources for containers. This is not used to limit the number of
CPUs used by YARN containers. If it is set to -1 and
yarn.nodemanager.resource.detect-hardware-capabilities is true, it is
automatically determined from the hardware in case of Windows and Linux.
In other cases, number of vcores is 8 by default.</description>
<name>yarn.nodemanager.resource.cpu-vcores</name>
<value>4</value>
</property>
<!-- Minimum container memory; default 1 GB -->
<property>
<description>The minimum allocation for every container request at the RM in MBs. Memory requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have less memory than this value will be shut down by the resource manager.
</description>
<name>yarn.scheduler.minimum-allocation-mb</name>
<value>1024</value>
</property>
<!-- Maximum container memory; default 8 GB, changed here to 2 GB -->
<property>
<description>The maximum allocation for every container request at the RM in MBs. Memory requests higher than this will throw an InvalidResourceRequestException.
</description>
<name>yarn.scheduler.maximum-allocation-mb</name>
<value>2048</value>
</property>
<!-- Minimum container vcores; default 1 -->
<property>
<description>The minimum allocation for every container request at the RM in terms of virtual CPU cores. Requests lower than this will be set to the value of this property. Additionally, a node manager that is configured to have fewer virtual cores than this value will be shut down by the resource manager.
</description>
<name>yarn.scheduler.minimum-allocation-vcores</name>
<value>1</value>
</property>
<!-- Maximum container vcores; default 4, changed here to 2 -->
<property>
<description>The maximum allocation for every container request at the RM in terms of virtual CPU cores. Requests higher than this will throw an
InvalidResourceRequestException.</description>
<name>yarn.scheduler.maximum-allocation-vcores</name>
<value>2</value>
</property>
<!-- Virtual memory check; enabled by default, disabled here -->
<property>
<description>Whether virtual memory limits will be enforced for
containers.</description>
<name>yarn.nodemanager.vmem-check-enabled</name>
<value>false</value>
</property>
<!-- Ratio of virtual memory to physical memory; default 2.1 -->
<property>
<description>Ratio between virtual memory to physical memory when setting memory limits for containers. Container allocations are expressed in terms of physical memory, and virtual memory usage is allowed to exceed this allocation by this ratio.
</description>
<name>yarn.nodemanager.vmem-pmem-ratio</name>
<value>2.1</value>
</property>
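A practical note of my own rather than from the tutorial: yarn-site.xml has to be distributed to every node and YARN restarted before these settings take effect. A minimal sketch, assuming all nodes share the same install path and that the other workers are named hadoop103 and hadoop104 (only hadoop102 actually appears later in this post):
scp $HADOOP_HOME/etc/hadoop/yarn-site.xml hadoop103:$HADOOP_HOME/etc/hadoop/
scp $HADOOP_HOME/etc/hadoop/yarn-site.xml hadoop104:$HADOOP_HOME/etc/hadoop/
$HADOOP_HOME/sbin/stop-yarn.sh && $HADOOP_HOME/sbin/start-yarn.sh   # run on the ResourceManager node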
(140/141) Tool interface example
A fairly useful feature in production. Again I only skimmed this section and basically just copied the code without going deeper.
Through the Tool interface, the parameters of our own program can be modified dynamically at submission time.
The following uses a self-written WordCount as the example.
When writing the code, include the following in pom.xml:
<dependencies>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-client</artifactId>
<version>3.1.3</version>
</dependency>
</dependencies>
Create a class WordCount and implement the Tool interface:
package com.atguigu.yarn;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import java.io.IOException;
public class WordCount implements Tool {

    private Configuration conf;

    // Core driver logic: configure and submit the MapReduce job
    @Override
    public int run(String[] args) throws Exception {
        Job job = Job.getInstance(conf);
        job.setJarByClass(WordCountDriver.class);
        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.setInputPaths(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    @Override
    public void setConf(Configuration conf) {
        this.conf = conf;
    }

    @Override
    public Configuration getConf() {
        return conf;
    }

    // Mapper: emit (word, 1) for every word in the line
    public static class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private Text outK = new Text();
        private IntWritable outV = new IntWritable(1);

        @Override
        protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String line = value.toString();
            String[] words = line.split(" ");
            for (String word : words) {
                outK.set(word);
                context.write(outK, outV);
            }
        }
    }

    // Reducer: sum the counts for each word
    public static class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable outV = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable value : values) {
                sum += value.get();
            }
            outV.set(sum);
            context.write(key, outV);
        }
    }
}
Create a new class WordCountDriver:
package com.atguigu.yarn;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import java.util.Arrays;
public class WordCountDriver {

    private static Tool tool;

    public static void main(String[] args) throws Exception {
        // 1. Create the configuration
        Configuration conf = new Configuration();

        // 2. Decide which Tool implementation to run based on the first argument
        switch (args[0]) {
            case "wordcount":
                tool = new WordCount();
                break;
            default:
                throw new RuntimeException("No such tool: " + args[0]);
        }

        // 3. Run the program through the Tool interface
        // Arrays.copyOfRange copies part of the old array into a new one,
        // i.e. it passes along everything from index 1 to the last argument
        int run = ToolRunner.run(conf, tool, Arrays.copyOfRange(args, 1, args.length));
        System.exit(run);
    }
}
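Before running the job, the project has to be packaged into a jar and copied to the cluster. A minimal sketch (the built jar's name depends on your pom, and the /opt/module/hadoop-3.1.3 target path is an assumption; rename the jar to match the YarnDemo.jar used below):
mvn clean package
scp target/<your-artifact>.jar atguigu@hadoop102:/opt/module/hadoop-3.1.3/YarnDemo.jar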
Then run:
[atguigu@hadoop102 hadoop-3.1.3]$ yarn jar YarnDemo.jar com.atguigu.yarn.WordCountDriver wordcount /input /output
Note that three arguments are passed here: the first selects which Tool to instantiate, and the second and third are the input and output directories. If we want to pass extra configuration at submit time, we can add parameters right after wordcount; ToolRunner (via GenericOptionsParser) consumes generic options such as -D before the remaining arguments reach run(). For example:
[atguigu@hadoop102 hadoop-3.1.3]$ yarn jar YarnDemo.jar com.atguigu.yarn.WordCountDriver wordcount -Dmapreduce.job.queuename=root.test /input /output1
References
- 尚硅谷大數(shù)據(jù)Hadoop教程 (Hadoop 3.x: from setup to cluster tuning)