
Lab 5 MapReduce Introductory Programming Practice (1): Implementing File Merging and Deduplication


I. Experiment Objectives

Master basic MapReduce programming through hands-on practice;

Learn to use MapReduce to solve common data-processing problems, including data deduplication, data sorting, and data mining.

II. Experiment Platform

Operating system: Linux (Ubuntu 16.04 or Ubuntu 18.04 recommended)
Hadoop version: 3.1.3

III. Experiment Content

Implement file merging and deduplication

For two input files, A and B, write a MapReduce program that merges them and removes duplicate content, producing a new output file C. Samples of the input and output files are given below for reference.

Sample of input file A:

Sample of input file B:

Sample of output file C, obtained by merging input files A and B:
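The idea behind the job can be simulated without a cluster. Below is a minimal plain-Java sketch (the class DedupSimulation and its sample lines are hypothetical illustrations, not part of the lab or of Hadoop): each line is emitted as a key with an empty value, a TreeMap stands in for Hadoop's shuffle-and-sort phase by grouping equal keys, and the "reducer" writes each key once. For ASCII input the TreeMap ordering matches the byte order Hadoop uses to sort Text keys.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Plain-Java simulation of the merge-and-dedup job; no Hadoop required. */
final class DedupSimulation {

    static List<String> mergeAndDedup(List<List<String>> inputFiles) {
        // Shuffle/sort stand-in: key -> values emitted for that key, keys kept sorted
        Map<String, List<String>> groups = new TreeMap<>();
        for (List<String> file : inputFiles) {
            for (String line : file) {
                // map(): the whole line becomes the key, the value is empty
                groups.computeIfAbsent(line, k -> new ArrayList<>()).add("");
            }
        }
        // reduce(): ignore the grouped values and write each key exactly once
        return new ArrayList<>(groups.keySet());
    }

    public static void main(String[] args) {
        List<String> fileA = Arrays.asList("line 2", "line 1", "line 2");
        List<String> fileB = Arrays.asList("line 1", "line 3");
        System.out.println(mergeAndDedup(Arrays.asList(fileA, fileB)));
    }
}
```

Running main prints [line 1, line 2, line 3]: duplicates collapse and the result comes out sorted, which is also why the real job's output (with the default single reducer) is ordered by key.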

IV. Experiment Steps

Enter the Hadoop installation directory and start HDFS:

cd /usr/local/hadoop
sbin/start-dfs.sh

Create a new directory, then create files A and B in it:

sudo mkdir MapReduce && cd MapReduce
sudo vim A
sudo vim B

Write a Java file that implements the MapReduce job:

sudo vim Merge.java

The Java code is as follows:

import java.io.IOException;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.GenericOptionsParser;

public class Merge {
	/**
	 * Merges files A and B and removes duplicate lines, producing a new output file C.
	 */

	// Override map(): copy each input line (the value) directly to the output key.
	public static class Map extends Mapper<Object, Text, Text, Text> {
		private final Text empty = new Text("");

		@Override
		public void map(Object key, Text value, Context context) throws IOException, InterruptedException {
			// The byte offset in key is ignored; the whole line becomes the key.
			context.write(value, empty);
		}
	}

	// Override reduce(): identical keys arrive grouped together, so writing
	// each key once (and ignoring its values) removes the duplicates.
	public static class Reduce extends Reducer<Text, Text, Text, Text> {
		private final Text empty = new Text("");

		@Override
		public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
			context.write(key, empty);
		}
	}

	public static void main(String[] args) throws Exception {
		Configuration conf = new Configuration();
		// fs.default.name is deprecated in Hadoop 3; fs.defaultFS is the current key.
		conf.set("fs.defaultFS", "hdfs://localhost:9000");
		// Take <in> <out> from the command line; fall back to "input" and "output"
		// (relative to the HDFS home directory) when no arguments are given.
		String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
		if (otherArgs.length == 0) {
			otherArgs = new String[]{"input", "output"};
		}
		if (otherArgs.length != 2) {
			System.err.println("Usage: Merge <in> <out>");
			System.exit(2);
		}
		Job job = Job.getInstance(conf, "Merge and duplicate removal");
		job.setJarByClass(Merge.class);
		job.setMapperClass(Map.class);
		// Reduce doubles as the combiner: its input and output types both match the map output (Text, Text).
		job.setCombinerClass(Reduce.class);
		job.setReducerClass(Reduce.class);
		job.setOutputKeyClass(Text.class);
		job.setOutputValueClass(Text.class);
		FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
		FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
		System.exit(job.waitForCompletion(true) ? 0 : 1);
	}
}
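The driver registers Reduce as the combiner as well as the reducer. This works because deduplication is idempotent, and because Reduce's input and output types (Text, Text) both match the map output types, which a combiner requires. The plain-Java sketch below (CombinerCheck is a hypothetical name, with no Hadoop dependency) checks that deduplicating each map task's output first and then the merged result gives the same answer as deduplicating everything once.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.TreeSet;

/** Shows that running the dedup logic as a combiner does not change the final result. */
final class CombinerCheck {

    // The reduce logic in essence: keep each distinct key once, in sorted order.
    static List<String> dedup(List<String> keys) {
        return new ArrayList<>(new TreeSet<>(keys));
    }

    // Combiner on each map task's output, then the reducer on the combined outputs.
    static List<String> withCombiner(List<List<String>> splits) {
        List<String> combined = new ArrayList<>();
        for (List<String> split : splits) {
            combined.addAll(dedup(split)); // map-side combine
        }
        return dedup(combined); // reduce side
    }

    // No combiner: everything goes straight to the reducer.
    static List<String> withoutCombiner(List<List<String>> splits) {
        List<String> all = new ArrayList<>();
        for (List<String> split : splits) {
            all.addAll(split);
        }
        return dedup(all);
    }

    public static void main(String[] args) {
        List<List<String>> splits = Arrays.asList(
                Arrays.asList("k2", "k1", "k2"),
                Arrays.asList("k1", "k3", "k3"));
        System.out.println(withCombiner(splits).equals(withoutCombiner(splits))); // prints true
    }
}
```

Hadoop treats a combiner as an optimization it may run zero or more times, so reusing the reducer this way is only safe for operations like this one, where applying them repeatedly does not change the result.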

Give the hadoop user ownership of the Hadoop directory (the files above were created with sudo, so they belong to root):

sudo chown -R hadoop /usr/local/hadoop

Add the Hadoop jars needed for compilation to the CLASSPATH:

vim ~/.bashrc

Add the following lines to the end of the file:

export HADOOP_HOME=/usr/local/hadoop
export CLASSPATH=$($HADOOP_HOME/bin/hadoop classpath):$CLASSPATH
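To confirm the CLASSPATH line took effect, a small check like the one below can try to load a few Hadoop classes. This is a sketch that assumes CLASSPATH was exported as above; the class name ClasspathCheck is hypothetical.

```java
/** Reports whether a few Hadoop classes used by Merge.java can be found on the classpath. */
final class ClasspathCheck {

    static boolean isLoadable(String className) {
        try {
            // initialize = false: only locate the class, do not run its static initializers
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String[] required = {
            "org.apache.hadoop.conf.Configuration",
            "org.apache.hadoop.mapreduce.Job",
            "org.apache.hadoop.mapreduce.lib.input.FileInputFormat"
        };
        for (String name : required) {
            System.out.println((isLoadable(name) ? "found   " : "MISSING ") + name);
        }
    }
}
```

Compile it with javac ClasspathCheck.java and run it with java -cp ".:$CLASSPATH" ClasspathCheck; every line should start with found. Passing . explicitly makes sure the current directory, where ClasspathCheck.class sits, is searched too.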

Make the changes take effect immediately:

source ~/.bashrc

Compile Merge.java:

javac Merge.java

Package the generated class files into a jar:

jar -cvf Merge.jar *.class
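Because Map and Reduce are nested classes, javac compiles them into separate files, Merge$Map.class and Merge$Reduce.class, which is why the command packages *.class rather than just Merge.class. To verify the jar, the sketch below lists its class entries with java.util.jar (JarContents is a hypothetical helper, not part of the lab).

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

/** Lists the .class entries in a jar, sorted by name. */
final class JarContents {

    static List<String> classEntries(String jarPath) throws IOException {
        List<String> names = new ArrayList<>();
        try (JarFile jar = new JarFile(jarPath)) {
            for (JarEntry entry : Collections.list(jar.entries())) {
                if (entry.getName().endsWith(".class")) {
                    names.add(entry.getName());
                }
            }
        }
        Collections.sort(names);
        return names;
    }

    public static void main(String[] args) throws IOException {
        String path = args.length > 0 ? args[0] : "Merge.jar";
        for (String name : classEntries(path)) {
            System.out.println(name);
        }
    }
}
```

Run it as java -cp . JarContents Merge.jar; assuming the jar was built as above, the listing should include Merge.class, Merge$Map.class and Merge$Reduce.class.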

Create the HDFS home directory /user/hadoop, then create an input directory inside it:

/usr/local/hadoop/bin/hdfs dfs -mkdir -p /user/hadoop
/usr/local/hadoop/bin/hdfs dfs -mkdir input
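Relative HDFS paths such as input are resolved against the current user's home directory, /user/<username> by default, which is why /user/hadoop has to exist before input can be created and why the job's paths input and output end up under it. The hypothetical helper below models that rule in plain Java; it is a simplification that ignores full URIs such as hdfs://host:port/path and any custom working directory.

```java
/** Simplified model of how HDFS resolves a path given to hdfs dfs or to a job. */
final class HdfsPaths {

    static String resolve(String userName, String path) {
        if (path.startsWith("/")) {
            return path; // absolute path: used as-is
        }
        return "/user/" + userName + "/" + path; // relative path: under the home directory
    }

    public static void main(String[] args) {
        System.out.println(resolve("hadoop", "input"));  // prints /user/hadoop/input
        System.out.println(resolve("hadoop", "output")); // prints /user/hadoop/output
    }
}
```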

If input already exists, delete the files already in it:

/usr/local/hadoop/bin/hdfs dfs -rm input/*

Upload files A and B to the input directory:

/usr/local/hadoop/bin/hdfs dfs -put ./A input
/usr/local/hadoop/bin/hdfs dfs -put ./B input

Before running, make sure the output directory does not exist (the job fails if its output directory already exists):

/usr/local/hadoop/bin/hdfs dfs -rm -r output

Run the Merge.jar we just built:

/usr/local/hadoop/bin/hadoop jar Merge.jar Merge

View the output:

/usr/local/hadoop/bin/hdfs dfs -cat output/*

The output is as follows:

hadoop@fzqs-Laptop:/usr/local/hadoop$ hdfs dfs -cat output/*
20170101 x	
20170101 y	
20170102 y	
20170103 x	
20170104 y	
20170104 z	
20170105 y	
20170105 z	
20170106 x
hadoop@fzqs-Laptop:/usr/local/hadoop$
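Each output line ends in a tab because the reducer writes an empty Text as the value, and TextOutputFormat writes key, separator, value for every record. The simplified model below (OutputLineModel is a hypothetical name, and null stands in for NullWritable) reproduces that rule. It suggests that emitting NullWritable values instead, together with the matching job configuration changes, would be one way to drop the trailing tab; that is a possible variation, not what the lab code does.

```java
/** Simplified model of how TextOutputFormat formats one record as a line. */
final class OutputLineModel {

    // null models NullWritable: that side is skipped together with the separator.
    static String formatRecord(String key, String value) {
        StringBuilder line = new StringBuilder();
        boolean hasKey = key != null;
        boolean hasValue = value != null;
        if (hasKey) {
            line.append(key);
        }
        if (hasKey && hasValue) {
            line.append('\t'); // default of mapreduce.output.textoutputformat.separator
        }
        if (hasValue) {
            line.append(value);
        }
        return line.toString();
    }

    public static void main(String[] args) {
        // Empty Text value, as in Merge: the tab is still written.
        System.out.println(formatRecord("20170101 x", "").endsWith("\t"));   // prints true
        // NullWritable-style value: no separator, so no trailing tab.
        System.out.println(formatRecord("20170101 x", null).endsWith("\t")); // prints false
    }
}
```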

If you would rather write this in Python, see my other post: Lab 5 MapReduce Introductory Programming Practice (Python implementation). Source of this article: http://www.zghlxwxcb.cn/news/detail-414185.html
