1. Problem Background
When running INSERT OVERWRITE against a partitioned table in spark-sql, two troublesome problems showed up:
- Selecting from the target table and then INSERT OVERWRITE into that same table fails with: Error in query: Cannot overwrite a path that is also being read from.
- Selecting from another table and then INSERT OVERWRITE into the target table deletes all the other partitions.
2. Problem Description
2.1 Sample Code
```sql
-- Create two partitioned test tables with the (problematic) datasource syntax: using parquet
drop table if exists pt_table_test1;
create table pt_table_test1 (
  id int,
  region string,
  dt string
) using parquet
partitioned by (region, dt);

drop table if exists pt_table_test2;
create table pt_table_test2 (
  id int,
  region string,
  dt string
) using parquet
partitioned by (region, dt);

-- Enable dynamic partitioning
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

truncate table pt_table_test1;
insert into table pt_table_test1 values
  (1, 'id', '2022-10-01'), (2, 'id', '2022-10-02'), (3, 'ph', '2022-10-03'),
  (1, 'sg', '2022-10-01'), (2, 'sg', '2022-10-02'), (3, 'ph', '2022-10-03');
select * from pt_table_test1;

-- Problem 1: overwrite the table from a query that reads the same table
insert overwrite table pt_table_test1 select * from pt_table_test1 where dt = '2022-10-01';
select * from pt_table_test1;

truncate table pt_table_test2;
insert into table pt_table_test2 values
  (2, 'id', '2022-10-01'), (2, 'id', '2022-10-02'),
  (2, 'sg', '2022-10-01'), (2, 'sg', '2022-10-02');

-- Problem 2: overwrite the table from another table; untouched partitions get dropped
insert overwrite table pt_table_test1 select * from pt_table_test2 where id = 2;
select * from pt_table_test1;
```
2.2 Error Demonstration
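With the tables created via using parquet as above, the two overwrites misbehave exactly as described in section 1. The sketch below paraphrases the observed behavior (output reconstructed from the error message in section 1, not captured verbatim):

```sql
-- Problem 1: the self-overwrite is rejected at analysis time
insert overwrite table pt_table_test1 select * from pt_table_test1 where dt = '2022-10-01';
-- Error in query: Cannot overwrite a path that is also being read from.

-- Problem 2: overwriting from pt_table_test2 replaces the whole table
insert overwrite table pt_table_test1 select * from pt_table_test2 where id = 2;
select * from pt_table_test1;
-- Only the four rows with id = 2 remain; every other partition,
-- including region=ph/dt=2022-10-03, has been deleted.
```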
3. 解決方法
印象中這兩個(gè)問題也出現(xiàn)過,但憑經(jīng)驗(yàn)和感覺,應(yīng)該可以解決。找到以前正常運(yùn)行的表,對(duì)比分析了下,發(fā)現(xiàn)是建表方式不一致問題:
- 錯(cuò)誤建表,指定表的文件格式:using parquet
- 正確姿勢(shì),指定表的文件格式:stored as parquet
3.1 Sample Code
```sql
-- Same script, but the tables are created with the Hive syntax: stored as parquet
drop table if exists pt_table_test1;
create table pt_table_test1 (
  id int,
  region string,
  dt string
) stored as parquet
partitioned by (region, dt);

drop table if exists pt_table_test2;
create table pt_table_test2 (
  id int,
  region string,
  dt string
) stored as parquet
partitioned by (region, dt);

-- Enable dynamic partitioning
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

truncate table pt_table_test1;
insert into table pt_table_test1 values
  (1, 'id', '2022-10-01'), (1, 'id', '2022-10-02'), (1, 'ph', '2022-10-03'),
  (1, 'sg', '2022-10-01'), (1, 'sg', '2022-10-02'), (1, 'ph', '2022-10-03');
select * from pt_table_test1;

-- Reading from and overwriting the same table now succeeds
insert overwrite table pt_table_test1 select * from pt_table_test1 where dt = '2022-10-01';
select * from pt_table_test1;

truncate table pt_table_test2;
insert into table pt_table_test2 values
  (2, 'id', '2022-10-01'), (2, 'id', '2022-10-02'),
  (2, 'sg', '2022-10-01'), (2, 'sg', '2022-10-02');

-- Only the partitions present in the query result are overwritten
insert overwrite table pt_table_test1 select * from pt_table_test2 where id = 2;
select * from pt_table_test1;
```
3.2 Correct Demonstration
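With the stored as parquet tables, both statements behave as intended. The sketch below shows the expected outcome, assuming the usual Hive dynamic-partition overwrite semantics (only partitions that actually receive data are replaced) and that Hive-format writes go through a staging location rather than deleting the target path up front:

```sql
-- Self-overwrite succeeds: the query no longer conflicts with the write path
insert overwrite table pt_table_test1 select * from pt_table_test1 where dt = '2022-10-01';

-- Only the partitions present in the result of the query, i.e.
-- (id, sg) x (2022-10-01, 2022-10-02), are replaced; the
-- region=ph/dt=2022-10-03 partition is left intact.
insert overwrite table pt_table_test1 select * from pt_table_test2 where id = 2;
select * from pt_table_test1;
```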
4. using parquet vs. stored as parquet
Comparing the two CREATE TABLE variants:
- Whether the table was created with using parquet or stored as parquet, SHOW CREATE TABLE displays: USING parquet.
- For the stored as parquet table, SHOW CREATE TABLE additionally includes a TBLPROPERTIES clause.
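The difference can be seen directly with SHOW CREATE TABLE. The output below is an illustrative sketch (the exact property names vary by Spark/Hive version, so treat them as an assumption, not captured output):

```sql
SHOW CREATE TABLE pt_table_test1;
-- Created with `using parquet`:
--   CREATE TABLE pt_table_test1 (id INT, region STRING, dt STRING)
--   USING parquet
--   PARTITIONED BY (region, dt)
--
-- Created with `stored as parquet`, the statement still reads USING parquet,
-- but a TBLPROPERTIES clause is appended, e.g. (illustrative):
--   CREATE TABLE pt_table_test1 (id INT, region STRING, dt STRING)
--   USING parquet
--   PARTITIONED BY (region, dt)
--   TBLPROPERTIES ('transient_lastDdlTime' = '...')
```

The extra TBLPROPERTIES is the visible hint that the table is a Hive-format table, which is what changes the INSERT OVERWRITE behavior described above.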