The Spark SQL CLI is a convenient interactive command tool to run the Hive metastore service and execute SQL queries input from the command line. Note that the Spark SQL CLI cannot talk to the Thrift JDBC server.
1. Spark SQL Command Line Options
Run ./bin/spark-sql --help to see all of the available options.
CLI options:
-d,--define <key=value> Variable substitution to apply to Hive commands, e.g. -d A=B or --define A=B
--database <databasename> Specify the database to use
-e <quoted-query-string> SQL from the command line; multiple statements may be separated by semicolons, e.g. spark-sql -e "show databases; select 1,'a'"
-f <filename> SQL from a file; the file may contain multiple semicolon-separated statements, e.g. spark-sql -f spark-test.sql
-H,--help Print help information
--hiveconf <property=value> Use value for given property
--hivevar <key=value> Variable substitution to apply to Hive commands. e.g. --hivevar A=B
-i <filename> Initialization SQL file
-S,--silent Silent mode in interactive shell
-v,--verbose Verbose mode (echo executed SQL to the console)
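For instance, here is a small sketch combining several of these options; the database name mydb, the variable tbl, and the table name my_table are hypothetical:
./bin/spark-sql --database mydb --define tbl=my_table -e 'SELECT COUNT(*) FROM ${tbl}'
Here --define sets a substitution variable that the query references as ${tbl}, just as in example 4 below.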
2. The .hiverc File
1. Without the -i option
When invoked without the -i, the Spark SQL CLI will attempt to load $HIVE_HOME/bin/.hiverc and $HOME/.hiverc as initialization files.
A brief introduction to .hiverc follows.
2. About .hiverc
In Apache Hive, the .hiverc file is an initialization file for the Hive command-line client. It is not an executable shell script but a plain list of HiveQL/CLI statements that run automatically every time the client starts, which makes it a good place for custom setup such as session configuration, loading custom functions, or defining substitution variables.
The file is typically located at $HIVE_HOME/bin/.hiverc or $HOME/.hiverc, matching the two paths the Spark SQL CLI probes, as noted above.
Typical things to put in a .hiverc file:
- Session configuration: SET statements that should apply to every session, e.g. SET hive.cli.print.header=true;.
- Custom functions: ADD JAR to put a UDF jar on the classpath, followed by CREATE TEMPORARY FUNCTION to register it.
- Substitution variables: SET hivevar:name=value; defines a variable that later queries can reference as ${name}, which can stand in for frequently typed snippets.
- Other initialization: any other setup statements your workflow needs.
For example (paths, the class name, and the table name below are placeholders):
-- .hiverc: these statements run at client startup
-- register a custom UDF
ADD JAR /path/to/custom_functions.jar;
CREATE TEMPORARY FUNCTION my_udf AS 'com.example.udf.MyUDF';
-- print column headers in query results
SET hive.cli.print.header=true;
-- define a substitution variable usable later as ${mytable}
SET hivevar:mytable=default.my_table;
Note that the statements in .hiverc run every time the Hive client starts, so it is a convenient place to automate frequently used setup.
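A minimal sketch of putting this to use with the Spark SQL CLI, assuming the $HOME/.hiverc location described above (my-init.hiverc is a hypothetical local file):
cp my-init.hiverc $HOME/.hiverc
./bin/spark-sql   # the statements in ~/.hiverc run before the first prompt appears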
3. Supported path schemes
The Spark SQL CLI supports running SQL from an initialization script file (-i) or a normal SQL file (-f). If the path URL doesn't have a scheme component, the path is handled as a local file. For example, /path/to/spark-sql-cli.sql is equivalent to file:///path/to/spark-sql-cli.sql.
Users can also use Hadoop-supported filesystems, such as s3://<mys3bucket>/path/to/spark-sql-cli.sql or hdfs://<namenode>:<port>/path/to/spark-sql-cli.sql.
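For example, a script can be run straight off HDFS; the namenode, port, and path below are placeholders:
./bin/spark-sql -f hdfs://<namenode>:<port>/path/to/spark-sql-cli.sql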
4. Supported comment types
Spark SQL supports ANSI-style comments both in scripts and in the interactive shell: simple comments that start with -- and run to the end of the line, and bracketed comments enclosed in /* ... */ (see the notes on semicolons inside bracketed comments in the next section).
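A quick sketch of both styles:
-- a simple comment runs to the end of the line
SELECT 1;
/* a bracketed comment
   can span multiple lines */
SELECT 2;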
5. Spark SQL CLI interactive commands
When spark-sql is run without -e or -f (i.e., with neither an inline SQL string nor a SQL file), the command enters interactive mode, where statements are terminated with a semicolon.
Notes:
- The CLI uses a semicolon to terminate a command only when it appears at the end of a line and is not escaped as \;.
- The semicolon is the only way to terminate a command. If the user types SELECT 1 and presses Enter, the console simply waits for further input.
- If the user enters multiple commands on one line, e.g. SELECT 1; SELECT 2;, then SELECT 1 and SELECT 2 are executed separately, as sketched below.
- A semicolon that appears inside a SQL statement (rather than at the end of a line) has no special meaning; parsing of the statement simply continues.
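For instance, entering two statements on one line yields two separate result sets (transcript abridged; the Time taken lines are omitted):
spark-sql> SELECT 1; SELECT 2;
1
2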
Consider another example:
/* This is a comment contains ;
*/ SELECT 1;
Because the semicolon here appears at the end of a line, it terminates the statement. The example above is therefore split into two fragments, /* This is a comment contains
and */ SELECT 1,
which are executed as two separate statements, and execution fails with a parse error.
Related interactive commands: besides SQL statements, the shell accepts commands such as quit; or exit; to leave the session.
6. Examples
1. running a query from the command line
./bin/spark-sql -e 'SELECT COL FROM TBL'
2. setting Hive configuration variables
./bin/spark-sql -e 'SELECT COL FROM TBL' --hiveconf hive.exec.scratchdir=/home/my/hive_scratch
3. setting Hive configuration variables and using it in the SQL query
./bin/spark-sql -e 'SELECT ${hiveconf:aaa}' --hiveconf aaa=bbb --hiveconf hive.exec.scratchdir=/home/my/hive_scratch
spark-sql> SELECT ${aaa};
bbb
4. setting Hive variables substitution
./bin/spark-sql --hivevar aaa=bbb --define ccc=ddd
spark-sql> SELECT ${aaa}, ${ccc};
bbb ddd
5. dumping data out from a query into a file using silent mode
./bin/spark-sql -S -e 'SELECT COL FROM TBL' > result.txt
6. running a script non-interactively:
[user_excute@poc11v ~/clients]$ spark-sql -f spark-test.sql
Java HotSpot(TM) 64-Bit Server VM warning: Using the ParNew young collector with the Serial old collector is deprecated and will likely be removed in a future release
Warning: Master yarn-client is deprecated since 2.0. Please use master "yarn" with specified deploy mode instead.
23/10/26 15:02:23 WARN SparkConf: The configuration key 'spark.scheduler.executorTaskBlacklistTime' has been deprecated as of Spark 2.1.0 and may be removed in the future. Please use the new blacklisting options, spark.blacklist.*
23/10/26 15:02:23 WARN SparkConf: The configuration key 'spark.akka.frameSize' has been deprecated as of Spark 1.6 and may be removed in the future. Please use the new key 'spark.rpc.message.maxSize' instead.
23/10/26 15:02:24 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
23/10/26 15:02:30 INFO Client: Requesting a new application from cluster with 1 NodeManagers
23/10/26 15:02:30 INFO Client: Verifying our application has not requested more than the maximum memory capability of the cluster (8192 MB per container)
23/10/26 15:02:30 INFO Client: Will allocate AM container, with 2432 MB memory including 384 MB overhead
23/10/26 15:02:30 INFO Client: Setting up container launch context for our AM
23/10/26 15:02:30 INFO Client: Setting up the launch environment for our AM container
23/10/26 15:02:30 INFO Client: Preparing resources for our AM container
23/10/26 15:02:31 WARN Client: Neither spark.yarn.jars nor spark.yarn.archive is set, falling back to uploading libraries under SPARK_HOME.
23/10/26 15:02:35 INFO Client: Uploading resource file:/tmp/spark-abdc8f1b-5f01-42c5-820e-9213f4895e69/__spark_libs__927845140246214662.zip -> hdfs://xmanhdfs3/home/user_excute/spark/cache/.sparkStaging/application_1698291202798_0002/__spark_libs__927845140246214662.zip
23/10/26 15:02:38 INFO Client: Uploading resource file:/tmp/spark-abdc8f1b-5f01-42c5-820e-9213f4895e69/__spark_conf__3928501941049491781.zip -> hdfs://xmanhdfs3/home/user_excute/spark/cache/.sparkStaging/application_1698291202798_0002/__spark_conf__.zip
23/10/26 15:02:40 INFO Client: Submitting application application_1698291202798_0002 to ResourceManager
23/10/26 15:02:41 INFO Client: Application report for application_1698291202798_0002 (state: ACCEPTED)
23/10/26 15:02:41 INFO Client:
client token: N/A
diagnostics: AM container is launched, waiting for AM container to Register with RM
ApplicationMaster host: N/A
ApplicationMaster RPC port: -1
queue: root.user_excute
start time: 1698303760575
final status: UNDEFINED
tracking URL: http://xxx:8888/proxy/application_1698291202798_0002/
user: user_excute
23/10/26 15:02:42 INFO Client: Application report for application_1698291202798_0002 (state: ACCEPTED)
23/10/26 15:02:43 INFO Client: Application report for application_1698291202798_0002 (state: ACCEPTED)
23/10/26 15:02:44 INFO Client: Application report for application_1698291202798_0002 (state: ACCEPTED)
23/10/26 15:02:45 INFO Client: Application report for application_1698291202798_0002 (state: ACCEPTED)
23/10/26 15:02:46 INFO Client: Application report for application_1698291202798_0002 (state: RUNNING)
23/10/26 15:02:46 INFO Client:
client token: N/A
diagnostics: N/A
ApplicationMaster host: xxx.xxx.xxx.xxx
ApplicationMaster RPC port: -1
queue: root.user_excute
start time: 1698303760575
final status: UNDEFINED
tracking URL: http://xxx:8888/proxy/application_1698291202798_0002/
user: user_excute
default
Time taken: 1.274 seconds, Fetched 1 row(s)
1
Time taken: 9.996 seconds, Fetched 1 row(s)
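Judging from the two result sets above (the database name default, then the literal 1), the spark-test.sql used here plausibly looked like the following; its exact contents are an assumption:
-- spark-test.sql (hypothetical reconstruction)
show databases;
select 1;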
7. running an initialization script before entering interactive mode
./bin/spark-sql -i /path/to/spark-sql-init.sql
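As a sketch, such an initialization file could hold session setup much like a .hiverc; every name below is a placeholder:
-- /path/to/spark-sql-init.sql (hypothetical)
SET hivevar:run_date=2023-10-26;
ADD JAR /path/to/custom_functions.jar;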
8. entering interactive mode
./bin/spark-sql
spark-sql> SELECT 1;
1
spark-sql> -- This is a simple comment.
spark-sql> SELECT 1;
1
9. entering interactive mode with escape ; in comment:
./bin/spark-sql
spark-sql>/* This is a comment contains \\;
> It won't be terminated by \\; */
> SELECT 1;
1
7. Summary of non-interactive usage
Command syntax:
spark-sql -f file_path
- -f: the file option
- file_path: either a local path or an HDFS path, e.g. hdfs://[namenode]:[port]/path/to/spark-sql-cli.sql
Format of the SQL file contents:
- separate individual statements with semicolons;
- comment syntax: see above.
References:
https://spark.apache.org/docs/latest/sql-distributed-sql-engine-spark-sql-cli.html