Level 1: Data cleaning -- filter out records with too few fields and convert the birth date:
package com.yy

import org.apache.spark.rdd.RDD
import org.apache.spark.sql.{DataFrame, Dataset, SparkSession}
object edu{
    /**********Begin**********/
    // Relevant code can be filled in here
    case class Person(id:String,Name:String,CtfTp:String,CtfId:String,Gender:String,Birthday:String,Address:String,Zip:String,Duty:String,Mobile:String,Tel:String,Fax:String,EMail:String,Nation:String,Taste:String,Education:String,Company:String,Family:String,Version:String,Hotel:String,Grade:String,Duration:String,City:String)
    /**********End**********/
    def main(args: Array[String]): Unit = {
        val spark = SparkSession
        .builder()
        .appName("Spark SQL")
        .master("local")
        .config("spark.some.config.option", "some-value")
        .getOrCreate()
        val rdd = spark.sparkContext.textFile("file:///root/files/part-00000-4ead9570-10e5-44dc-80ad-860cb072a9ff-c000.csv")
        /**********Begin**********/
        // Clean dirty data (records with fewer than 23 fields are treated as dirty)
        val rdd1: RDD[String] = rdd.filter(x=>{
        val e = x.split(",", -1)
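
The listing breaks off inside the filter, so the rest of the predicate is not shown. As a minimal sketch of the rule stated in the comment (records with fewer than 23 fields are dirty), the check could look like the following. The names `CleanSketch`, `ExpectedFields`, `fields`, and `isClean` are hypothetical and do not come from the original. The sample runs on a local `Seq` so it works without Spark, and it does not attempt the birth date conversion, which the source does not show.

```scala
object CleanSketch {
  // Assumption: 23 matches the number of columns in the Person case class.
  val ExpectedFields = 23

  // A limit of -1 keeps trailing empty strings, so a row whose last
  // columns are empty still counts all of its fields.
  def fields(line: String): Array[String] = line.split(",", -1)

  // Keep a line only when it has at least ExpectedFields columns.
  def isClean(line: String): Boolean = fields(line).length >= ExpectedFields

  def main(args: Array[String]): Unit = {
    val full = Seq.fill(ExpectedFields)("x").mkString(",") // 23 non-empty fields
    val short = "1,Alice,ID"                               // only 3 fields
    val trailingEmpty = "a" + "," * 22                     // 23 fields, 22 of them empty
    val kept = Seq(full, short, trailingEmpty).filter(isClean)
    println(kept.length) // prints 2
  }
}
```

Under the same assumptions, the predicate could be passed to the RDD directly, for example `rdd.filter(CleanSketch.isClean)`. Without the `-1` limit, `split` would drop the trailing empty fields of a row like `trailingEmpty` and the row would be wrongly rejected.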