1. Error Message
Core error message:
- WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException:
- java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
Running a PySpark computation task from PyCharm produces the following output:
D:\001_Develop\022_Python\Python39\python.exe D:/002_Project/011_Python/HelloPython/Client.py
23/08/01 11:25:24 WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException: java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset. -see https://wiki.apache.org/hadoop/WindowsProblems
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
23/08/01 11:25:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
PySpark version: 3.4.1
File contents: ['Tom Jerry', 'Tom Jerry Tom', 'Jack Jerry']
Flattened file contents: ['Tom', 'Jerry', 'Tom', 'Jerry', 'Tom', 'Jack', 'Jerry']
Mapped to pairs: [('Tom', 1), ('Jerry', 1), ('Tom', 1), ('Jerry', 1), ('Tom', 1), ('Jack', 1), ('Jerry', 1)]
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
D:\001_Develop\022_Python\Python39\Lib\site-packages\pyspark\python\lib\pyspark.zip\pyspark\shuffle.py:65: UserWarning: Please install psutil to have better support with spilling
Final word count: [('Tom', 3), ('Jack', 1), ('Jerry', 3)]
Process finished with exit code 0
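For context, the word-count pipeline implied by the log above (flatten the lines, map each word to a ('word', 1) pair, then reduce by key) can be mirrored in plain Python. This is a hedged sketch of the logic only, not the author's actual PySpark script:

```python
from collections import defaultdict

# Input lines, matching the "File contents" shown in the log
lines = ['Tom Jerry', 'Tom Jerry Tom', 'Jack Jerry']

# flatMap: split each line into words and flatten the result
words = [word for line in lines for word in line.split()]

# map: turn each word into a ('word', 1) pair
pairs = [(word, 1) for word in words]

# reduceByKey: sum the counts per word
counts = defaultdict(int)
for word, n in pairs:
    counts[word] += n

print(sorted(counts.items()))  # [('Jack', 1), ('Jerry', 3), ('Tom', 3)]
```

In PySpark the same steps would be `rdd.flatMap(...)`, `rdd.map(...)`, and `rdd.reduceByKey(...)`; the computation itself succeeds here, so the winutils warning is about the missing Hadoop runtime, not the job logic.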
2. Solution (Install the Hadoop Runtime)
Core error message:
- WARN Shell: Did not find winutils.exe: java.io.FileNotFoundException:
- java.io.FileNotFoundException: HADOOP_HOME and hadoop.home.dir are unset.
PySpark normally runs alongside a Hadoop environment; if no Hadoop runtime is installed on Windows, the error above is reported.
Hadoop releases can be downloaded from the https://hadoop.apache.org/releases.html page.
The latest version at the time of writing is 3.3.6; clicking the binary (checksum signature) link under Binary download
leads to the Hadoop 3.3.6 download page.
The download URL is:
https://dlcdn.apache.org/hadoop/common/hadoop-3.3.6/hadoop-3.3.6.tar.gz
The official mirror can be slow to download from.
Alternatively, a Hadoop 3.3.4 + winutils bundle is available as a free (0-point) CSDN download.
After downloading, extract Hadoop; in this walkthrough the install path is D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4 .
In the system environment variables, set
HADOOP_HOME = D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4
Then append the following entries to the Path environment variable:
%HADOOP_HOME%\bin
%HADOOP_HOME%\sbin
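As an alternative to system-wide environment variables, the same variables can be set from the Python script itself, as long as this happens before the SparkContext is created. A minimal sketch, assuming the Hadoop install path used above:

```python
import os

# Install path from this walkthrough; adjust to your actual Hadoop directory
hadoop_home = r"D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4"

# Must be set before the SparkContext is created, or the winutils warning persists
os.environ["HADOOP_HOME"] = hadoop_home
os.environ["PATH"] = (
    os.path.join(hadoop_home, "bin") + os.pathsep + os.environ.get("PATH", "")
)

# Then create the context as usual, e.g.:
# from pyspark import SparkConf, SparkContext
# sc = SparkContext(conf=SparkConf().setMaster("local[*]").setAppName("WordCount"))
```

Setting the variables in code only affects that one script; the system-wide setup above is the more durable fix.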
In the D:\001_Develop\052_Hadoop\hadoop-3.3.4\hadoop-3.3.4\etc\hadoop\hadoop-env.cmd script, set JAVA_HOME to the real JDK path, changing
set JAVA_HOME=%JAVA_HOME%
to
set JAVA_HOME=C:\Program Files\Java\jdk1.8.0_91
Note that paths containing spaces (such as Program Files) can break Hadoop's batch scripts; if that happens, use the 8.3 short path (e.g. C:\PROGRA~1\Java\...) instead.
Copy the hadoop.dll and winutils.exe files from winutils-master\hadoop-3.3.0\bin into the C:\Windows\System32 directory.
Restart the computer; the restart is required for the new environment variables to take effect.
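Before rebooting, the setup can be sanity-checked with a small script. This is an illustrative helper (check_hadoop_env is a name made up here, not a Hadoop or PySpark API) that verifies HADOOP_HOME is set and winutils.exe is present under its bin directory:

```python
import os


def check_hadoop_env(env):
    """Return a list of problems found in the given environment mapping."""
    problems = []
    hadoop_home = env.get("HADOOP_HOME")
    if not hadoop_home:
        problems.append("HADOOP_HOME is unset")
    elif not os.path.isfile(os.path.join(hadoop_home, "bin", "winutils.exe")):
        problems.append("winutils.exe not found under HADOOP_HOME\\bin")
    return problems


if __name__ == "__main__":
    for message in check_hadoop_env(os.environ) or ["Hadoop environment looks OK"]:
        print(message)
```

An empty result means both checks passed; any remaining message points at the step above that still needs fixing.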
Then, in a command prompt, run
hadoop version
to verify that Hadoop is installed correctly. (Note the subcommand is `version` without a dash, unlike `java -version`.)