Problem source: the metric computation module in Task B of Set 1 of the 2023 National Vocational College Skills Competition.
Subtask 3: Metric Computation
Write Scala code and use Spark to compute the required metrics.
Note: for metric computation, ignore the value of the order_status field in the order info table and treat every order as valid. When computing an order amount or a total order amount, use only the final_total_amount field. Also note that every dwd dimension table must be read from its latest partition.
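Because every dwd dimension table has to be read from its latest partition, the queries below all filter on max(etl_date). A minimal self-contained sketch of that pattern (assuming, as in the code below, that the partition column is named etl_date):

// Minimal sketch: read only the latest partition of a dwd dimension table.
// Assumes the partition column is etl_date; adjust table and column names to your schema.
import org.apache.spark.sql.SparkSession

object LatestPartitionDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]")
      .appName("LatestPartitionDemo").enableHiveSupport().getOrCreate()
    // Keep only the rows belonging to the newest etl_date partition of dwd.dim_province
    spark.sql(
      """
        |select id, name, region_id
        |from dwd.dim_province
        |where etl_date = (select max(etl_date) from dwd.dim_province)
        |""".stripMargin).show()
    spark.close()
  }
}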
Question 1
Using the dwd-layer tables, compute, for each province, each region, and each month, the number of orders placed and the total order amount, and write the result into the provinceeverymonth table of the MySQL database shtd_result (table structure below). Then, in the MySQL command line on Linux, sort by total order count, total order amount, and province primary key, all in descending order, and query the top 5 rows. Paste the SQL statement and a screenshot of its result under the corresponding task number in 【Release\任務(wù)B提交結(jié)果.docx】 on the client desktop.
| Field | Type | Meaning | Notes |
| --- | --- | --- | --- |
| provinceid | int | Province table primary key | |
| provincename | text | Province name | |
| regionid | int | Region table primary key | |
| regionname | text | Region name | |
| totalconsumption | double | Total order amount | Total order amount for the month |
| totalorder | int | Total order count | Total order count for the month |
| year | int | Year | Year the order was placed |
| month | int | Month | Month the order was placed |
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SaveMode, SparkSession}

object Compute01 {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "atguigu")
    // TODO Create the Spark session (with Hive support)
    val conf = new SparkConf().setMaster("local[*]").setAppName("Compute01")
    val spark: SparkSession = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
    // Enable non-strict dynamic partitioning
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
    // Silence log output
    spark.sparkContext.setLogLevel("OFF")
    // TODO Run the core query: order count and total order amount per province / region / month
    val result = spark.sql(
      """
        |select
        |  province.id provinceid,
        |  province.name provincename,
        |  region.id regionid,
        |  region.region_name regionname,
        |  sum(od.final_total_amount) totalconsumption,
        |  count(od.id) totalorder,
        |  year(od.create_time) year,
        |  month(od.create_time) month
        |from (
        |  select
        |    id,
        |    province_id,
        |    final_total_amount,
        |    create_time
        |  from dwd.act_order_info
        |) od
        |left join (
        |  select
        |    id,
        |    name,
        |    region_id
        |  from dwd.dim_province
        |  where etl_date = (
        |    select max(etl_date) from dwd.dim_province  -- latest partition of the dimension table
        |  )
        |) province on od.province_id = province.id
        |left join (
        |  select
        |    id,
        |    region_name
        |  from dwd.dim_region
        |  where etl_date = (
        |    select max(etl_date) from dwd.dim_region    -- latest partition of the dimension table
        |  )
        |) region on province.region_id = region.id
        |group by province.id, province.name, region.id, region.region_name,
        |         year(od.create_time), month(od.create_time)
        |""".stripMargin)
    // Inspect the result
    result.show()
    // TODO Save the result into the target MySQL table
    result.write
      .format("jdbc")                                   // write to MySQL over JDBC
      .mode(SaveMode.Append)                            // append rows
      .option("driver", "com.mysql.jdbc.Driver")
      .option("url", "jdbc:mysql://<mysql-host>:3306/shtd_result") // replace <mysql-host> with the MySQL host of your environment
      .option("user", "root")
      .option("password", "000000")
      .option("dbtable", "provinceeverymonth")          // shtd_result.provinceeverymonth
      .save()
    // TODO Stop the Spark session
    spark.close()
  }
}
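One practical note on the JDBC write above: the driver class com.mysql.jdbc.Driver has to be on the classpath when the job runs, so the project needs the MySQL Connector/J dependency. A hedged build.sbt sketch follows; the versions are only illustrative and should be matched to your cluster, and note that Connector/J 8.x renames the driver class to com.mysql.cj.jdbc.Driver.

// build.sbt -- illustrative dependencies; pin versions to your environment
libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-sql"            % "3.1.1" % "provided",
  "org.apache.spark" %% "spark-hive"           % "3.1.1" % "provided",
  "mysql"            %  "mysql-connector-java" % "5.1.47"   // provides com.mysql.jdbc.Driver
)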
Result query SQL
-- Total order count
select *
from shtd_result.provinceeverymonth
order by totalorder desc
limit 5;
-- Total order amount
select *
from shtd_result.provinceeverymonth
order by totalconsumption desc
limit 5;
-- Province table primary key
select *
from shtd_result.provinceeverymonth
order by provinceid desc
limit 5;
Question 2
Using the dwd-layer tables, compute, for April 2020, each province's average order amount and the result of comparing it with the average order amount across all provinces ("高" / "低" / "相同", i.e. higher / lower / same). Store the result in the provinceavgcmp table of the MySQL database shtd_result (table structure below). Then, in the MySQL command line on Linux, sort by province primary key and by the province's average order amount, both in descending order, and query the top 5 rows. Paste the SQL statement and a screenshot of its result under the corresponding task number in 【Release\任務(wù)B提交結(jié)果.docx】 on the client desktop.
| Field | Type | Meaning | Notes |
| --- | --- | --- | --- |
| provinceid | int | Province table primary key | |
| provincename | text | Province name | |
| provinceavgconsumption | double | Average order amount of the province | |
| allprovinceavgconsumption | double | Average order amount across all provinces | |
| comparison | text | Comparison result | Result of comparing the province's average order amount with the all-province average; the value is 高 / 低 / 相同 (higher / lower / same) |
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SaveMode, SparkSession}

object Compute02 {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "atguigu")
    val conf = new SparkConf().setMaster("local[*]").setAppName("Compute02")
    val spark: SparkSession = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
    spark.sparkContext.setLogLevel("OFF")
    val result = spark.sql(
      """
        |select
        |  province.id provinceid,
        |  province.name provincename,
        |  od.provinceavgconsumption,
        |  avgorder.allprovinceavgconsumption,
        |  case
        |    when od.provinceavgconsumption > avgorder.allprovinceavgconsumption then '高'
        |    when od.provinceavgconsumption < avgorder.allprovinceavgconsumption then '低'
        |    else '相同'
        |  end comparison                                       -- comparison result
        |from (
        |  select
        |    province_id,
        |    avg(final_total_amount) provinceavgconsumption     -- average order amount of each province in April 2020
        |  from dwd.act_order_info
        |  where date_format(create_time, 'yyyy-MM') = '2020-04'
        |  group by province_id
        |) od
        |left join (
        |  select
        |    id,
        |    name
        |  from dwd.dim_province
        |  where etl_date = (
        |    select max(etl_date) from dwd.dim_province         -- latest partition of the dimension table
        |  )
        |) province on od.province_id = province.id
        |cross join (
        |  select
        |    avg(final_total_amount) allprovinceavgconsumption  -- average order amount over all provinces in April 2020
        |  from dwd.act_order_info
        |  where date_format(create_time, 'yyyy-MM') = '2020-04'
        |) avgorder
        |""".stripMargin)
    result.show()
    result
      .write
      .format("jdbc")
      .mode(SaveMode.Append)
      .option("driver", "com.mysql.jdbc.Driver")
      .option("url", "jdbc:mysql://<mysql-host>:3306/shtd_result") // replace <mysql-host> with the MySQL host of your environment
      .option("user", "root")
      .option("password", "000000")
      .option("dbtable", "provinceavgcmp")
      .save()
    spark.close()
  }
}
Result query SQL
-- Province table primary key
select *
from shtd_result.provinceavgcmp
order by provinceid desc
limit 5;
-- Average order amount of the province
select *
from shtd_result.provinceavgcmp
order by provinceavgconsumption desc
limit 5;
Question 3
Using the dwd-layer tables, find the users who placed orders on two consecutive days with the order amount increasing, and store the result in the usercontinueorder table of the MySQL database shtd_result (table structure below). Then, in the MySQL command line on Linux, sort by total order count, total order amount, and customer primary key, all in descending order, and query the top 5 rows. Paste the SQL statement and a screenshot of its result under the corresponding task number in 【Release\任務(wù)B提交結(jié)果.docx】 on the client desktop.
| Field | Type | Meaning | Notes |
| --- | --- | --- | --- |
| userid | int | Customer primary key | |
| username | text | Customer name | |
| day | text | Day | The two order dates, formatted as yyyyMMdd_yyyyMMdd, e.g. 20220101_20220102 |
| totalconsumption | double | Total order amount | Total order amount of the two consecutive days |
| totalorder | int | Total order count | Total order count of the two consecutive days |
import org.apache.spark.SparkConf
import org.apache.spark.sql.{SaveMode, SparkSession}

object Compute03 {
  def main(args: Array[String]): Unit = {
    System.setProperty("HADOOP_USER_NAME", "atguigu")
    val conf = new SparkConf().setMaster("local[*]").setAppName("Compute03")
    val spark: SparkSession = SparkSession.builder().config(conf).enableHiveSupport().getOrCreate()
    spark.conf.set("hive.exec.dynamic.partition.mode", "nonstrict")
    spark.sparkContext.setLogLevel("OFF")
    val result = spark.sql(
      """
        |select
        |  userid,
        |  username,
        |  concat(date_format(buy_date_first, 'yyyyMMdd'), '_',
        |         date_format(buy_date_second, 'yyyyMMdd')) day,    -- e.g. 20220101_20220102
        |  first_amount + second_amount totalconsumption,           -- total amount of the two days
        |  first_count + second_count totalorder                    -- total order count of the two days
        |from (
        |  select
        |    user_id userid,
        |    username,
        |    buy_date buy_date_first,
        |    lead(buy_date)   over (partition by user_id order by buy_date) buy_date_second, -- the user's next order day
        |    day_amount first_amount,
        |    lead(day_amount) over (partition by user_id order by buy_date) second_amount,
        |    day_count first_count,
        |    lead(day_count)  over (partition by user_id order by buy_date) second_count
        |  from (
        |    -- one row per user per day: daily order amount and daily order count
        |    select
        |      user_id,
        |      max(consignee) username,
        |      date_format(create_time, 'yyyy-MM-dd') buy_date,
        |      sum(final_total_amount) day_amount,
        |      count(id) day_count
        |    from dwd.act_order_info
        |    group by user_id, date_format(create_time, 'yyyy-MM-dd')
        |  ) daily
        |) t
        |where datediff(buy_date_second, buy_date_first) = 1        -- orders on two consecutive days
        |  and second_amount > first_amount                         -- and the daily amount keeps growing
        |""".stripMargin)
    result.show()
    result
      .write
      .format("jdbc")
      .mode(SaveMode.Append)
      .option("driver", "com.mysql.jdbc.Driver")
      .option("url", "jdbc:mysql://<mysql-host>:3306/shtd_result") // replace <mysql-host> with the MySQL host of your environment
      .option("user", "root")
      .option("password", "000000")
      .option("dbtable", "usercontinueorder")                      // shtd_result.usercontinueorder
      .save()
    spark.close()
  }
}
Result query SQL
-- Total order count
select *
from shtd_result.usercontinueorder
order by totalorder desc
limit 5;
-- Total order amount
select *
from shtd_result.usercontinueorder
order by totalconsumption desc
limit 5;
-- Customer primary key
select *
from shtd_result.usercontinueorder
order by userid desc
limit 5;
The hard part of the metric computation module is the multi-table queries and the proper use of window functions, so being fluent with the advanced functions of Hive SQL matters a great deal; without that, this part is difficult to complete.
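As a reference for the window-function part, here is a tiny self-contained Spark example of lead(), the function the third question relies on; the sample data and column names are invented purely for illustration:

// Toy illustration of lead(): for each user, look at the next day's order amount.
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{lead, datediff, col}

object LeadDemo {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().master("local[*]").appName("LeadDemo").getOrCreate()
    import spark.implicits._
    // (user_id, order date, daily order amount) -- invented sample data
    val daily = Seq((1, "2020-04-01", 100.0), (1, "2020-04-02", 150.0), (2, "2020-04-01", 80.0))
      .toDF("user_id", "buy_date", "day_amount")
    val w = Window.partitionBy("user_id").orderBy("buy_date")
    daily
      .withColumn("next_date", lead(col("buy_date"), 1).over(w))
      .withColumn("next_amount", lead(col("day_amount"), 1).over(w))
      // keep users who ordered on two consecutive days with a growing amount
      .where(datediff(col("next_date"), col("buy_date")) === 1 && col("next_amount") > col("day_amount"))
      .show()
    spark.close()
  }
}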