數(shù)據(jù)集直鏈下載(100MB): https://www.123pan.com/s/T1n0Vv-mTc3d
1.0數(shù)據(jù)說明
2.0建表
1.創(chuàng)建數(shù)據(jù)庫
create database db_exercise;
2.切換到數(shù)據(jù)庫
use db_exercise;
3.創(chuàng)建原始數(shù)據(jù)表:gulivideo,gulivideo_user
create table gulivideo_ori(
videoid string,
uploader string,
age int,
category array<string>,
length int,
views int,
rate float,
ratings int,
comments int,
relatedId array<string>)
row format delimited fields terminated by "\t"
collection items terminated by "&";
create table gulivideo_user(
uploader string,
videos int,
friends int)
row format delimited
fields terminated by "\t";
4.上傳數(shù)據(jù)到hdfs
hdfs dfs -put data/user/user.txt /user/hive/warehouse/db_exercise.db/gulivideo_ori
hdfs dfs -put data/video/* /user/hive/warehouse/db_exercise.db/gulivideo_ori
3.0實戰(zhàn)練習(xí)
3.1統(tǒng)計視頻觀看數(shù) Top10
select videoid,views
from gulivideo_ori
order by views desc
limit 10;
3.2統(tǒng)計視頻類別熱度 Top10
(1)即統(tǒng)計每個類別有多少個視頻,顯示出包含視頻最多的前 10 個類別。
(2)我們需要按照類別 group by 聚合,然后 count 組內(nèi)的 videoId 個數(shù)即可。
(3)因為當(dāng)前表結(jié)構(gòu)為:一個視頻對應(yīng)一個或多個類別。所以如果要 group by 類別,需要先將類別進行列轉(zhuǎn)行(展開),然后再進行 count 即可。
(4)最后按照熱度排序,顯示前 10 條。
select t1.category_name,count(t1.videoId) hot
from(
select videoid,category_name
from gulivideo_ori lateral view explode(category) gulivideo_orc_tmp AS category_name
)t1
group by t1.category_name
order by hot desc
limit 10;
3.3統(tǒng)計出視頻觀看數(shù)最高的 20 個視頻的所屬類別以及類別包含Top20 視頻的個數(shù)
(1)先找到觀看數(shù)最高的 20 個視頻所屬條目的所有信息,降序排列
(2)把這 20 條信息中的 category 分裂出來(列轉(zhuǎn)行)
(3)最后查詢視頻分類名稱和該分類下有多少個 Top20 的視頻
select
t2.category_name,
COUNT(t2.videoid) video_sum
from(
select
t1.videoid,
category_name
from (
select videoid,views,category
from gulivideo_ori
order by views desc
limit 20
)t1
lateral VIEW explode(t1.category) t1_tmp AS category_name
)t2
GROUP BY t2.category_name;
3.4統(tǒng)計每個類別中的視頻熱度 Top10,以 Music 為例
(1)要想統(tǒng)計 Music 類別中的視頻熱度 Top10,需要先找到 Music 類別,那么就需要將category 展開,所以可以創(chuàng)建一張表用于存放 categoryId 展開的數(shù)據(jù)。
(2)向 category 展開的表中插入數(shù)據(jù)。
(3)統(tǒng)計對應(yīng)類別(Music)中的視頻熱度。文章來源:http://www.zghlxwxcb.cn/news/detail-605216.html
SELECT
t1.videoid,
t1.views,
t1.category_name
FROM
(
SELECT
videoId,
views,
category_name
FROM gulivideo_ori
lateral VIEW explode(category) gulivideo_orc_tmp AS category_name
)t1
WHERE
t1.category_name = "Music"
ORDER BY
t1.views
DESC
LIMIT 10;
3.5統(tǒng)計每個類別視頻觀看數(shù) Top10
SELECT
t2.videoid,
t2.views,
t2.category_name,
t2.rk
FROM
(
SELECT
t1.videoid,
t1.views,
t1.category_name,
rank() over(PARTITION BY t1.category_name ORDER BY t1.views DESC ) rk
FROM
(
SELECT
videoid,
views,
category_name
FROM gulivideo_ori
lateral VIEW explode(category) gulivideo_orc_tmp AS category_name
)t1
)t2
WHERE t2.rk <=10;
3.6統(tǒng)計上傳視頻最多的用戶 Top10以及他們上傳的視頻觀看次數(shù)在前 20 的視頻
SELECT
t2.videoid,
t2.views,
t2.uploader
FROM
(
SELECT
uploader,
videos
FROM gulivideo_user
ORDER BY videos DESC
LIMIT 10
) t1
JOIN gulivideo_ori t2
ON t1.uploader = t2.uploader
ORDER BY t2.views DESC
LIMIT 20;
文章來源地址http://www.zghlxwxcb.cn/news/detail-605216.html
到了這里,關(guān)于Hive實戰(zhàn)練習(xí)(包含數(shù)據(jù)集)的文章就介紹完了。如果您還想了解更多內(nèi)容,請在右上角搜索TOY模板網(wǎng)以前的文章或繼續(xù)瀏覽下面的相關(guān)文章,希望大家以后多多支持TOY模板網(wǎng)!