Let's start with a quick introduction to the partitioner and the combiner.
The Partitioner class
- Partitions the map output by key
- The default is HashPartitioner, which:
  - takes the key's hash code
  - takes that hash modulo the number of reduce tasks
  - and thereby decides which reducer each record is sent to (a sketch follows this list)
- To write a custom Partitioner:
  - extend the abstract class Partitioner and override getPartition
  - register it with job.setPartitionerClass(MyPartitioner.class)
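For reference, here is a minimal sketch of what the default hash partitioning does (it mirrors the logic of Hadoop's built-in HashPartitioner; the class name is made up for illustration):

import org.apache.hadoop.mapreduce.Partitioner;

// Minimal sketch of the default hash partitioning logic.
public class SketchHashPartitioner<K, V> extends Partitioner<K, V> {
    @Override
    public int getPartition(K key, V value, int numReduceTasks) {
        // Mask off the sign bit so the modulo result is always in [0, numReduceTasks).
        return (key.hashCode() & Integer.MAX_VALUE) % numReduceTasks;
    }
}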
The Combiner class
- A Combiner is essentially a map-local reduce operation
- It aggregates locally, before the shuffle
- It is optional and exists purely as a performance optimization
- Its input and output key/value types must be identical
- A Reducer can be reused as the Combiner only when:
  - the operation is commutative and associative (a demo of this rule follows the list)
- To plug one in:
  - job.setCombinerClass(WCReducer.class)
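A quick, hypothetical demo (plain Java, not Hadoop code) of why that condition matters: summing survives being pre-aggregated in stages, averaging does not.

public class CombinerRuleDemo {
    public static void main(String[] args) {
        // SUM is associative: pre-summing a "spill" first changes nothing.
        int direct = 1 + 2 + 3;        // 6
        int staged = (1 + 2) + 3;      // a combiner pre-sums the first two -> still 6
        System.out.println(direct == staged);  // true: a summing Reducer can be a Combiner

        // AVERAGE is not: averaging a partial average gives a different answer.
        double directAvg = (1 + 2 + 3) / 3.0;          // 2.0
        double stagedAvg = ((1 + 2) / 2.0 + 3) / 2.0;  // 2.25 -> wrong
        System.out.println(directAvg == stagedAvg);    // false: an averaging Reducer cannot
    }
}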
Let's walk through a case study that exercises both.
1. Case requirements
Given a text file of phone numbers, we want to split them three ways — numbers starting with 136/137, numbers starting with 138/139, and everything else — writing each group to its own output file and counting how many numbers fall under each three-digit prefix.
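To make the requirement concrete, suppose phone.csv holds one number per line (these numbers are made up for illustration):

13612345678
13798765432
13855554444
13966667777
15011112222
13612340000

We would then expect three output groups: 136 → 2 and 137 → 1 in the first, 138 → 1 and 139 → 1 in the second, and 150 → 1 in the third.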

2. The PhoneMapper class
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

import java.io.IOException;

public class PhoneMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Each input line is one phone number; emit (phoneNumber, 1).
        String phone = value.toString();
        context.write(new Text(phone), new IntWritable(1));
    }
}
3. The PhoneReducer class
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class PhoneReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Sum the per-key counts produced upstream by the mapper/combiner.
        int count = 0;
        for (IntWritable intWritable : values) {
            count += intWritable.get();
        }
        context.write(key, new IntWritable(count));
    }
}
4. The PhonePartitioner class
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

public class PhonePartitioner extends Partitioner<Text, IntWritable> {
    @Override
    public int getPartition(Text text, IntWritable intWritable, int numReduceTasks) {
        // Group 136/137 together, 138/139 together, and everything else in a third bucket.
        String prefix = text.toString().substring(0, 3);
        if ("136".equals(prefix) || "137".equals(prefix)) {
            return 0;
        } else if ("138".equals(prefix) || "139".equals(prefix)) {
            return 1;
        } else {
            return 2;
        }
    }
}
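One thing to keep in mind: getPartition must return a value in [0, numReduceTasks). This partitioner produces partitions 0–2, so the driver below sets job.setNumReduceTasks(3) to match; with fewer reduce tasks the job would fail with an "Illegal partition" error (except in the single-reducer case, where Hadoop bypasses the custom partitioner entirely).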
5. The PhoneCombiner class
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

import java.io.IOException;

public class PhoneCombiner extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context) throws IOException, InterruptedException {
        // Pre-sum the counts for each phone number on the map side...
        int count = 0;
        for (IntWritable intWritable : values) {
            count += intWritable.get();
        }
        // ...and truncate the key to its three-digit prefix before the shuffle.
        context.write(new Text(key.toString().substring(0, 3)), new IntWritable(count));
    }
}
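A caveat worth flagging: the framework treats the combiner as an optional optimization and may run it zero, one, or several times, so a job should not depend on it for correctness. Here the truncation to the three-digit prefix happens only in the combiner; if the combiner were ever skipped, the reducer would receive full phone numbers as keys and emit per-number rather than per-prefix counts. A more defensive design would emit the prefix from the mapper and leave the combiner as pure summation. The version above works for this demo: in practice the combiner runs on each spill, and because truncated and full keys share the same first three characters, PhonePartitioner routes them to the same partition.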
6. The PhoneDriver class
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

import java.io.IOException;

public class PhoneDriver {
    public static void main(String[] args) throws IOException, ClassNotFoundException, InterruptedException {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);
        job.setJarByClass(PhoneDriver.class);

        // Wire up the map side.
        job.setMapperClass(PhoneMapper.class);
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(IntWritable.class);

        // Register the combiner and the custom partitioner; the number of
        // reduce tasks must match the three partitions the partitioner produces.
        job.setCombinerClass(PhoneCombiner.class);
        job.setPartitionerClass(PhonePartitioner.class);
        job.setNumReduceTasks(3);

        // Wire up the reduce side.
        job.setReducerClass(PhoneReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        Path inPath = new Path("in/demo4/phone.csv");
        FileInputFormat.setInputPaths(job, inPath);

        // Delete any previous output so the job does not fail on an existing directory.
        Path outPath = new Path("out/out6");
        FileSystem fs = FileSystem.get(outPath.toUri(), conf);
        if (fs.exists(outPath)) {
            fs.delete(outPath, true);
        }
        FileOutputFormat.setOutputPath(job, outPath);

        job.waitForCompletion(true);
    }
}
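If the job succeeds, the out/out6 directory should contain one file per reducer, following Hadoop's standard part-file naming: part-r-00000 with the 136/137 counts, part-r-00001 with 138/139, and part-r-00002 with everything else.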
7. Summary
What this case adds over the earlier examples is partitioning (partition) and local aggregation (combine).
The flow of this job is:
driver → mapper → partitioner → combiner → reducer
For each record the map emits, the partitioner assigns a partition number and the record is written into the circular in-memory buffer; the map task then moves on to the next record, repeating until all input has been processed.
The combine step merges the spill files that overflow from the circular buffer, sorting and pre-aggregating along the way (each spill file is combined before the final merge), and ultimately hands one large merged file to the reducer, which greatly reduces the reducer's workload.
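To experiment with the spill behavior, the buffer can be tuned from the driver. A sketch assuming Hadoop 2.x property names (verify the exact keys against your Hadoop version):

// Hypothetical tuning, set on conf before Job.getInstance(conf):
conf.set("mapreduce.task.io.sort.mb", "100");          // circular buffer size in MB (default 100)
conf.set("mapreduce.map.sort.spill.percent", "0.80");  // fill ratio that triggers a spill (default 0.80)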