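Motivation: I have waded through too many bad articles on this topic, written by bloggers who copied from each other without understanding the material, so I wrote this one. I only started writing after running the whole procedure three times from a freshly reinstalled system without problems. This is a hand-holding, step-by-step guide.
~~~~~~~~~~~~~~~~~~~~~~~~~ Writing this took effort; please credit the source when reposting ~~~~~~~~~~~~~~~~~~~~~~~~~
Versions used in this article (Hadoop is not deployed; the environment is Linux, and all files and programs belong to the root user and group)
Operating system: CentOS 7.4
sbt (packaging tool): 1.7.1      Official site: sbt - The interactive build tool
Spark version: 3.3.0             Official download: Index of /dist/spark
JDK version: 1.8                 (installation omitted)
Scala version: 2.12.15           Official download: All Available Versions | The Scala Programming Language
First, upload the packages to /opt: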
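1. Installing Spark and Scala and configuring the environment
a. Installing Scala
Extract the Scala package into /software, change its ownership to the root user and group, and configure the environment variables.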
[root@spark01 opt]# tar -zxvf scala-2.12.15.tgz -C /software/
[root@spark01 opt]# cd /software/
[root@spark01 software]# ll
total 0
drwxrwxr-x. 6 2000 2000  79 Sep 15 2021 scala-2.12.15
[root@spark01 software]# chown -R root.root /software/
[root@spark01 software]# vim /etc/profile     # add the Scala install path
SCALA_HOME=/software/scala-2.12.15
PATH=$PATH:$SCALA_HOME/bin
[root@spark01 software]# source /etc/profile
[root@spark01 software]# scala -version
Scala code runner version 2.12.15 -- Copyright 2002-2021, LAMP/EPFL and Lightbend, Inc.
[root@spark01 software]# scala
Welcome to Scala 2.12.15 (OpenJDK 64-Bit Server VM, Java 1.8.0_131).
Type in expressions for evaluation. Or try :help.
scala> :quit
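The profile edits above can also be scripted. The following is a minimal sketch, not part of the original setup: the helper name append_once and the PROFILE_FILE variable are made up for illustration, and the file defaults to a temporary file so the sketch can be tried without touching /etc/profile. It appends each line only when that exact line is not already present, so running it twice does not duplicate entries.

```shell
#!/bin/bash
# Append a line to a profile file only if that exact line is not there yet,
# so re-running the setup does not pile up duplicate entries.
append_once() {
  local line="$1" file="$2"
  grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Hypothetical target: defaults to a scratch file; the article edits /etc/profile.
PROFILE_FILE="${PROFILE_FILE:-$(mktemp)}"

append_once 'export SCALA_HOME=/software/scala-2.12.15' "$PROFILE_FILE"
append_once 'export PATH=$PATH:$SCALA_HOME/bin' "$PROFILE_FILE"
# This second call is a no-op because the line already exists.
append_once 'export SCALA_HOME=/software/scala-2.12.15' "$PROFILE_FILE"

cat "$PROFILE_FILE"
```

To apply it for real, run it with PROFILE_FILE=/etc/profile and then source /etc/profile as in the transcript above.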
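b. Installing Spark
One open question concerns the Scala version that Spark uses. I went through many articles: some of them configure Spark's env file to point at the Scala version the author installed, yet spark-shell still starts with its default version. I have not figured this part out, so this article does not configure Spark in detail. Note that the package used below is the scala2.13 build, and spark-shell reports Scala 2.13.8 rather than the 2.12.15 installed above.
Extract the Spark package into /software, change its ownership to the root user and group, rename the directory to spark-3.3.0, and configure the environment variables.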
[root@spark01 opt]# tar -xvf spark-3.3.0-bin-hadoop3-scala2.13.tgz -C /software/
[root@spark01 software]# chown -R root.root /software/
[root@spark01 software]# mv spark-3.3.0-bin-hadoop3-scala2.13/ spark-3.3.0
[root@spark01 software]# vim /etc/profile          # add the Spark install directory
SPARK_HOME=/software/spark-3.3.0
PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin
[root@spark01 software]# source /etc/profile
If the environment is configured correctly, Spark can be started from any directory.
[root@spark01 software]# spark-shell
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
Welcome to
      ____              __
     / __/__  ___ _____/ /__
    _\ \/ _ \/ _ `/ __/  '_/
   /___/ .__/\_,_/_/ /_/\_\   version 3.3.0
      /_/

Using Scala version 2.13.8 (OpenJDK 64-Bit Server VM, Java 1.8.0_131)
Type in expressions to have them evaluated.
Type :help for more information.
22/09/19 00:56:10 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Spark context Web UI available at http://spark01:4040
Spark context available as 'sc' (master = local[*], app id = local-1663520171943).
Spark session available as 'spark'.
scala>
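The banner above reports Scala 2.13.8 even though Scala 2.12.15 was installed separately. Prebuilt Spark packages carry their own Scala library jar under jars/, so one way to see which Scala a given Spark install runs on is to look at that jar's file name. The sketch below assumes the jar follows the scala-library-<version>.jar naming; the function name bundled_scala_version is made up for illustration.

```shell
#!/bin/bash
# Print the Scala version bundled with a Spark distribution by reading the
# file name of the scala-library jar under its jars/ directory.
bundled_scala_version() {
  local spark_home="$1" jar
  for jar in "$spark_home"/jars/scala-library-*.jar; do
    [ -e "$jar" ] || continue            # glob did not match anything
    jar="${jar##*/scala-library-}"       # strip directory and prefix
    printf '%s\n' "${jar%.jar}"          # strip the .jar extension
    return 0
  done
  return 1
}

# SPARK_HOME as set in /etc/profile above; falls back to the article's path.
bundled_scala_version "${SPARK_HOME:-/software/spark-3.3.0}" \
  || echo "no scala-library jar found under jars/" >&2
```

Whatever version this prints is the one that scalaVersion in build.sbt should agree with, at least in its major.minor part.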
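2. Installing sbt and configuring the environment
After downloading the package and uploading it to /opt, extract it into /software and rename the directory to sbt1.7.1. Everything in this article is run as the root user.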
[root@spark01 opt]# mkdir -p /software ; tar -xf sbt-1.7.1.tgz -C /software ; mv /software/sbt/ /software/sbt1.7.1
[root@spark01 opt]# cd /software/
[root@spark01 software]# chown -R root.root sbt1.7.1/
[root@spark01 software]# ll
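total 0
drwxr-xr-x. 4 root root 58 Jul 12 11:49 sbt1.7.1
Copy sbt-launch.jar from sbt1.7.1/bin into its parent directory, then create an sbt script with vim and make it executable. The first run of ./sbt sbtVersion downloads the base dependencies; it took about 5 minutes for me, so be patient the first time.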
[root@spark01 software]# cd sbt1.7.1/
[root@spark01 sbt1.7.1]# cp bin/sbt-launch.jar ./
[root@spark01 sbt1.7.1]# vim sbt
[root@spark01 sbt1.7.1]# cat sbt
#!/bin/bash
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M -XX:+CMSClassUnloadingEnabled -XX:MaxPermSize=256M"
java $SBT_OPTS -jar `dirname $0`/sbt-launch.jar "$@"
[root@spark01 sbt1.7.1]# chmod +x sbt
[root@spark01 sbt1.7.1]# ./sbt sbtVersion
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] [launcher] getting org.scala-sbt sbt 1.7.1 ?(this may take some time)...
[info] [launcher] getting Scala 2.12.16 (for sbt)...
[warn] No sbt.version set in project/build.properties, base directory: /software/sbt1.7.1
[info] welcome to sbt 1.7.1 (Oracle Corporation Java 1.8.0_131)
[info] set current project to sbt1-7-1 (in build file:/software/sbt1.7.1/)
[info] 1.7.1
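The warning in the output above comes from -XX:MaxPermSize, which Java 8 ignores. A minimal sketch of the same wrapper without that flag follows. It is a variant, not the script used in this article: it keeps only the heap and stack sizes, and SBT_DIR is a made-up variable that defaults to a scratch directory so the sketch can be tried safely.

```shell
#!/bin/bash
# Write a wrapper like the one above, minus the flags Java 8 no longer needs.
# SBT_DIR is the directory holding sbt-launch.jar (hypothetical variable;
# set SBT_DIR=/software/sbt1.7.1 to write into the real install).
SBT_DIR="${SBT_DIR:-$(mktemp -d)}"

cat > "$SBT_DIR/sbt" <<'EOF'
#!/bin/bash
SBT_OPTS="-Xms512M -Xmx1536M -Xss1M"
# Quote the directory so paths containing spaces still work.
exec java $SBT_OPTS -jar "$(dirname "$0")/sbt-launch.jar" "$@"
EOF
chmod +x "$SBT_DIR/sbt"

echo "wrote $SBT_DIR/sbt"
```

With this variant, ./sbt sbtVersion should start without the MaxPermSize warning; the rest of the output is unaffected.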
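Configure the environment variables for sbt. PATH must point at /software/sbt1.7.1, because that is where the script you just created lives; the sbt under /software/sbt1.7.1/bin is the official launcher script.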
[root@spark01 sbt1.7.1]# vim /etc/profile
SBT_HOME=/software/
PATH=$PATH:$SCALA_HOME/bin:$SPARK_HOME/bin:$SBT_HOME/sbt1.7.1
[root@spark01 sbt1.7.1]# source /etc/profile
[root@spark01 sbt1.7.1]# sbt
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[warn] No sbt.version set in project/build.properties, base directory: /software/sbt1.7.1
[info] welcome to sbt 1.7.1 (Oracle Corporation Java 1.8.0_131)
[info] set current project to sbt1-7-1 (in build file:/software/sbt1.7.1/)
[info] sbt server started at local:///root/.sbt/1.0/server/ffba2d1aa13a1e5b3cdb/sock
[info] started sbt server
sbt:sbt1-7-1>
[info] shutting down sbt server
After installation, every file and directory under /software, parent and children alike, should be owned by root.
3. The classic example: HelloWorld
My working path is /demo; the Spark applications go under /demo.
[root@spark01 /]# mkdir /demo
[root@spark01 demo]# mkdir SparkFristApp
[root@spark01 demo]# cd SparkFristApp/
[root@spark01 SparkFristApp]# mkdir -p src/main/scala/com
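[root@spark01 SparkFristApp]# cd src/main/scala/com/
[root@spark01 com]# vim HelloWorld.scala
[root@spark01 com]# cat HelloWorld.scala
package main.scala.com

object HelloWorld {
  def main(args: Array[String]): Unit = {
    println("HelloWorld !!!")
  }
}
[Figure: diagram of the path layout above, which corresponds to a Maven project in IDEA]
The basic code is written, so now write build.sbt for packaging. This must be done in the root directory of your application, i.e. SparkFristApp, and build.sbt must list the dependency libraries your application needs. A HelloWorld that only prints one line will not reveal dependency problems; the later examples cover this in detail.
Note: this is a Linux environment. Do not copy and paste the build.sbt file. Do not copy and paste. Do not copy and paste. Type it by hand (section 6 explains why).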
[root@spark01 com]# cd /demo/SparkFristApp/
[root@spark01 SparkFristApp]# vim build.sbt
[root@spark01 SparkFristApp]# cat build.sbt
name := "sparkfirstapp"
version := "1.0"
scalaVersion := "2.13.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.3.0",
  "org.apache.spark" %% "spark-sql" % "3.3.0"
)
[Figure: annotated build.sbt file]
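One concrete way a pasted build.sbt can go wrong is Windows (CRLF) line endings, which section 6 lists as a cause of baffling errors. The following is a small sketch for checking a file and stripping the carriage returns; the function names has_crlf and strip_crlf are made up for illustration, and the file defaults to build.sbt in the current directory.

```shell
#!/bin/bash
# Report whether a file contains Windows (CRLF) line endings and, if so,
# strip the carriage returns in place.
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}

strip_crlf() {
  local tmp
  tmp="$(mktemp)" && tr -d '\r' < "$1" > "$tmp" && cat "$tmp" > "$1" && rm -f "$tmp"
}

FILE="${1:-build.sbt}"
if [ -f "$FILE" ] && has_crlf "$FILE"; then
  echo "$FILE has CRLF line endings; stripping them"
  strip_crlf "$FILE"
else
  echo "$FILE: no CRLF line endings found (or file missing)"
fi
```

Running cat -A build.sbt is another quick check: lines that end in ^M$ instead of $ carry a carriage return.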
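Start packaging. The subdirectories should look as shown below, and packaging must be run from SparkFristApp. No dependencies were loaded here, so packaging was fast and took 7 s. Another possible reason, which I have not verified, is that sbt 1.7.1 optimized this step.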
[root@spark01 SparkFristApp]# tree src/
src/
└── main
    └── scala
        └── com
            └── HelloWorld.scala

3 directories, 1 file
[root@spark01 SparkFristApp]# sbt package
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] Updated file /demo/SparkFristApp/project/build.properties: set sbt.version to 1.7.1
[info] welcome to sbt 1.7.1 (Oracle Corporation Java 1.8.0_131)
[info] loading project definition from /demo/SparkFristApp/project
[info] loading settings for project sparkfristapp from build.sbt ...
[info] set current project to sparkfirstapp (in build file:/demo/SparkFristApp/)
[info] compiling 1 Scala source to /demo/SparkFristApp/target/scala-2.13/classes ...
[success] Total time: 7 s, completed 2022-9-19 4:03:55
Once packaging is done, the jar can be handed to Spark and run.
[root@spark01 SparkFristApp]# spark-submit --class main.scala.com.HelloWorld ./target/scala-2.13/sparkfirstapp_2.13-1.0.jar
HelloWorld !!!
22/09/19 04:05:04 INFO ShutdownHookManager: Shutdown hook called
22/09/19 04:05:04 INFO ShutdownHookManager: Deleting directory /tmp/spark-69de37d4-1065-4c71-a6fe-2f0949cbb373
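The jar path passed to spark-submit above is not arbitrary: it follows from the name, version and scalaVersion settings in build.sbt. The sketch below reproduces that mapping for a Scala 2.x project; the function name jar_path is made up for illustration, and it assumes the name is already lowercase with no spaces, as it is here.

```shell
#!/bin/bash
# Derive the path of the jar that `sbt package` writes from three
# build.sbt settings: name, version and scalaVersion (Scala 2.x assumed).
jar_path() {
  local name="$1" version="$2" scala_version="$3"
  # The Scala binary version is major.minor, e.g. 2.13.8 -> 2.13
  local scala_bin="${scala_version%.*}"
  printf 'target/scala-%s/%s_%s-%s.jar\n' "$scala_bin" "$name" "$scala_bin" "$version"
}

jar_path sparkfirstapp 1.0 2.13.8
# prints target/scala-2.13/sparkfirstapp_2.13-1.0.jar
```

If any of the three settings changes, the path given to spark-submit has to change with it.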
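4. The official Spark 3.3.0 self-contained application
HelloWorld has run, so now try the simple example from the official documentation. It counts how many lines of a file contain the letter a and how many contain the letter b.
Official link: Quick Start - Spark 3.3.0 Documentation
Create the Scala source with vim: copy the official example, add the package declaration, and set the path to the README.md file in your environment.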
[root@spark01 SparkFristApp]# cd src/main/scala/com/
[root@spark01 com]# vim SimpleApp.scala
[root@spark01 com]# cat SimpleApp.scala
/* SimpleApp.scala */
package main.scala.com

import org.apache.spark.sql.SparkSession

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "/software/spark-3.3.0/README.md" // Should be some file on your system
    val spark = SparkSession.builder.appName("Simple Application").getOrCreate()
    val logData = spark.read.textFile(logFile).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println(s"Lines with a: $numAs, Lines with b: $numBs")
    spark.stop()
  }
}
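Then package with sbt. Different programs do not necessarily use the same dependencies, so the build.sbt written for each one may differ as well. The build.sbt above already lists the dependencies needed here, so it does not have to be rewritten.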
[root@spark01 com]# cd -
/demo/SparkFristApp
[root@spark01 SparkFristApp]# cat build.sbt
name := "sparkfirstapp"
version := "1.0"
scalaVersion := "2.13.8"
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core" % "3.3.0",
  "org.apache.spark" %% "spark-sql"  % "3.3.0"
)
Before packaging with sbt, configure a repository mirror to speed up the downloads: edit the file ~/.sbt/repositories and switch it to the Aliyun mirror. This repository source is unpredictable: sometimes the downloads work and sometimes they do not. The problems I ran into are recorded at the end.
[root@spark01 SparkFristApp]# vim ~/.sbt/repositories
[root@spark01 SparkFristApp]# cat ~/.sbt/repositories
[repositories]
  local
  maven-central: https://maven.aliyun.com/repository/central
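The same repositories file can be written without opening an editor. This is a minimal sketch: the file content is exactly the three lines shown above, while the SBT_CONF_DIR variable and the .bak backup name are made up for illustration.

```shell
#!/bin/bash
# Write the sbt repositories file shown above from a here-document.
# SBT_CONF_DIR is a hypothetical variable; it defaults to ~/.sbt.
SBT_CONF_DIR="${SBT_CONF_DIR:-$HOME/.sbt}"
mkdir -p "$SBT_CONF_DIR"

# Keep a copy of any existing file before overwriting it.
if [ -f "$SBT_CONF_DIR/repositories" ]; then
  cp "$SBT_CONF_DIR/repositories" "$SBT_CONF_DIR/repositories.bak"
fi

cat > "$SBT_CONF_DIR/repositories" <<'EOF'
[repositories]
  local
  maven-central: https://maven.aliyun.com/repository/central
EOF

cat "$SBT_CONF_DIR/repositories"
```

Because the here-document is written on the Linux machine itself, it also avoids the line-ending problems that pasted files can bring.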
Then package with sbt:
[root@spark01 SparkFristApp]# ll
total 4
-rw-r--r--. 1 root root 194 Sep 19 03:51 build.sbt
drwxr-xr-x. 3 root root  44 Sep 19 04:03 project
drwxr-xr-x. 3 root root  18 Sep 19 01:21 src
drwxr-xr-x. 6 root root  88 Sep 19 04:03 target
[root@spark01 SparkFristApp]# sbt package
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] welcome to sbt 1.7.1 (Oracle Corporation Java 1.8.0_131)
[info] loading project definition from /demo/SparkFristApp/project
[info] loading settings for project sparkfristapp from build.sbt ...
[info] set current project to sparkfirstapp (in build file:/demo/SparkFristApp/)
[info] compiling 2 Scala sources to /demo/SparkFristApp/target/scala-2.13/classes ...
[warn] 1 deprecation (since 2.13.0)
[warn] 1 deprecation (since 2.13.3)
[warn] 2 deprecations in total; re-run with -deprecation for details
[warn] three warnings found
[warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
[success] Total time: 8 s, completed 2022-9-19 4:13:08
The last line above, the [success] line, reports how long packaging took and when it completed.
Below is the current directory structure:
[root@spark01 SparkFristApp]# ll
total 4
-rw-r--r--. 1 root root 137 Sep 19 03:19 build.sbt
drwxr-xr-x. 3 root root  44 Sep 19 03:19 project
drwxr-xr-x. 3 root root  18 Sep 19 01:21 src
drwxr-xr-x. 6 root root  88 Sep 19 03:19 target
Now run the official example. The log output is very long, so it is not reproduced in full here.
Note: controlling the INFO and ERR output requires configuring log4j, which this article does not cover.
[root@spark01 SparkFristApp]# spark-submit --class main.scala.com.SimpleApp ./target/scala-2.13/sparkfirstapp_2.13-1.0.jar
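Since the useful result is buried in the log output, it helps to know what numbers to expect. The sketch below computes the same two counts with grep, assuming the same case-sensitive substring matching that line.contains performs; the function name count_lines_with and the LOG_FILE variable are made up for illustration.

```shell
#!/bin/bash
# Count the lines containing a given substring, mirroring what SimpleApp
# computes with filter(line => line.contains(...)).count().
count_lines_with() {
  grep -c -F -- "$1" "$2"
}

# Same file that SimpleApp reads; override LOG_FILE to check another file.
LOG_FILE="${LOG_FILE:-/software/spark-3.3.0/README.md}"
if [ -r "$LOG_FILE" ]; then
  echo "Lines with a: $(count_lines_with a "$LOG_FILE"), Lines with b: $(count_lines_with b "$LOG_FILE")"
fi
```

The line it prints has the same shape as the one SimpleApp prints, so the two can be compared directly; piping the spark-submit output through grep "Lines with" isolates the latter.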
5. A practical example
[root@spark01 SparkFristApp]# cd src/main/scala/com/
[root@spark01 com]# cat MnMcount.scala
package main.scala.com

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

/**
 * Usage: MnMcount <mnm_file_dataset>
 */
object MnMcount {
  def main(args: Array[String]) {
    val spark = SparkSession
      .builder
      .appName("MnMCount")
      .getOrCreate()

    if (args.length < 1) {
      print("Usage: MnMcount <mnm_file_dataset>")
      sys.exit(1)
    }
    // Get the file name
    val mnmFile = args(0)
    // Read the data into a Spark DataFrame
    val mnmDF = spark.read.format("csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .load(mnmFile)
    mnmDF.show(5, false)

    // Group by State and Color, sum the counts of each color, then sort in descending order
    val countMnMDF = mnmDF.select("State", "Color", "Count")
      .groupBy("State", "Color")
      .sum("Count")
      .orderBy(desc("sum(Count)"))

    // Show the aggregated result for each State and Color
    countMnMDF.show(60)
    println(s"Total Rows = ${countMnMDF.count()}")
    println()

    // Filter down to CA first, then aggregate
    val caCountMnNDF = mnmDF.select("*")
      .where(col("State") === "CA")
      .groupBy("State", "Color")
      .sum("Count")
      .orderBy(desc("sum(Count)"))

    // Show the aggregated result
    caCountMnNDF.show(10)
  }
}
With the code written, it is time to package it.
[root@spark01 com]# cd -
/demo/SparkFristApp
[root@spark01 SparkFristApp]# sbt package
OpenJDK 64-Bit Server VM warning: ignoring option MaxPermSize=256M; support was removed in 8.0
[info] welcome to sbt 1.7.1 (Oracle Corporation Java 1.8.0_131)
[info] loading project definition from /demo/SparkFristApp/project
[info] loading settings for project sparkfristapp from build.sbt ...
[info] set current project to sparkfirstapp (in build file:/demo/SparkFristApp/)
[info] compiling 1 Scala source to /demo/SparkFristApp/target/scala-2.13/classes ...
[warn] 1 deprecation (since 2.13.0)
[warn] 1 deprecation (since 2.13.3)
[warn] 2 deprecations in total; re-run with -deprecation for details
[warn] three warnings found
[warn] multiple main classes detected: run 'show discoveredMainClasses' to see the list
[success] Total time: 5 s, completed 2022-9-19 4:34:26
Packaging succeeded. Now put the data file under /data (mnm_dataset.csv is a test data file and lives under /data).
[root@spark01 com]# mkdir /data
[root@spark01 SparkFristApp]# more /data/mnm_dataset.csv
State,Color,Count
TX,Red,20
NV,Blue,66
CO,Blue,79
OR,Blue,71
WA,Yellow,93
WY,Blue,16
CA,Yellow,53
WA,Green,60
OR,Green,71
TX,Green,68
NV,Green,59
AZ,Brown,95
WA,Yellow,20
AZ,Blue,75
OR,Brown,72
NV,Red,98
WY,Orange,45
CO,Blue,52
TX,Brown,94
CO,Red,82
CO,Red,12
CO,Red,17
OR,Green,16
AZ,Green,46
NV,Red,43
NM,Yellow,15
WA,Red,12
OR,Green,13
CO,Blue,95
WY,Red,63
TX,Orange,63
WY,Yellow,48
OR,Green,95
WA,Red,75
CO,Orange,93
NV,Orange,10
WY,Green,15
WA,Green,99
CO,Blue,98
CA,Green,86
UT,Red,92
......................
[root@spark01 data]# wc -l mnm_dataset.csv
100000 mnm_dataset.csv
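Before running the Spark job on this file, the expected totals can be cross-checked without Spark. The sketch below sums Count per State and Color with awk, which is roughly what groupBy("State", "Color").sum("Count") followed by a descending sort produces. It assumes a plain comma-separated file with a single header row and no quoted fields; the function name sum_by_state_color and the DATA_FILE variable are made up for illustration.

```shell
#!/bin/bash
# Sum Count per (State, Color) from a CSV with one header row, then sort
# by the total in descending order (ties broken by State and Color).
sum_by_state_color() {
  awk -F, 'NR > 1 { total[$1 "," $2] += $3 }
           END    { for (k in total) print k "," total[k] }' "$1" |
    sort -t, -k3,3nr -k1,2
}

# Same file the article uses; override DATA_FILE to check another file.
DATA_FILE="${DATA_FILE:-/data/mnm_dataset.csv}"
if [ -r "$DATA_FILE" ]; then
  sum_by_state_color "$DATA_FILE" | head -n 10
fi
```

The top rows printed here should match the top rows of the first aggregated table the Spark job shows, apart from the table formatting.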
Run the program:
[root@spark01 SparkFristApp]# spark-submit --class main.scala.com.MnMcount ./target/scala-2.13/sparkfirstapp_2.13-1.0.jar /data/mnm_dataset.csv
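At this point, this one sparkfirstapp_2.13-1.0.jar runs the beginner HelloWorld, the slightly-past-beginner SimpleApp, and the a-bit-further-past-beginner MnMcount.
6. Troubleshooting notes (the "mysterious" problems)
Cause 1: the build.sbt file is written incorrectly, for example wrong versions or wrong line breaks. Linux and Windows use different line endings, so when creating build.sbt with vim on Linux, type it character by character instead of pasting; otherwise you will run into baffling errors. Feel free to test this yourself.
Cause 2: the repository source is faulty. Check the official one: Central Repository: org/apache/spark
I recommend Aliyun's central mirror (the one configured in ~/.sbt/repositories above): https://maven.aliyun.com/repository/central
Aliyun repository: Repository Service
Of course, use whichever one works best for you.
Cause 3: your code is wrong. This is the easiest one to test: paste the code straight into spark-shell, and if it cannot be loaded, the errors show up immediately.
At this point, does packaging still fail? Your program should now run on Spark. Good luck, and may you see no ERR.
The journey of learning "Spark" has only just begun. Keep going!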