1. Problem Background
When running INSERT OVERWRITE against a partitioned table in spark-sql, two troublesome problems showed up:
- Selecting from the target table and then INSERT OVERWRITE into that same table fails with: Error in query: Cannot overwrite a path that is also being read from.
- Selecting from another table and then INSERT OVERWRITE into the target table deletes all the other partitions.
2. Problem Description
2.1 Sample Code
```sql
-- Create two partitioned test tables with the (problematic) datasource syntax: using parquet
drop table if exists pt_table_test1;
create table pt_table_test1 (
  id int,
  region string,
  dt string
) using parquet
partitioned by (region, dt);

drop table if exists pt_table_test2;
create table pt_table_test2 (
  id int,
  region string,
  dt string
) using parquet
partitioned by (region, dt);

-- Enable dynamic partitioning
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

truncate table pt_table_test1;
insert into table pt_table_test1 values
  (1, 'id', '2022-10-01'), (2, 'id', '2022-10-02'), (3, 'ph', '2022-10-03'),
  (1, 'sg', '2022-10-01'), (2, 'sg', '2022-10-02'), (3, 'ph', '2022-10-03');
select * from pt_table_test1;

-- Problem 1: overwrite the table from a query that reads the same table
insert overwrite table pt_table_test1 select * from pt_table_test1 where dt = '2022-10-01';
select * from pt_table_test1;

truncate table pt_table_test2;
insert into table pt_table_test2 values
  (2, 'id', '2022-10-01'), (2, 'id', '2022-10-02'),
  (2, 'sg', '2022-10-01'), (2, 'sg', '2022-10-02');

-- Problem 2: overwrite the table from another table; untouched partitions get dropped
insert overwrite table pt_table_test1 select * from pt_table_test2 where id = 2;
select * from pt_table_test1;
```
2.2 Error Demonstration
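With the tables created via using parquet as above, the two overwrites misbehave exactly as described in section 1. The sketch below paraphrases the observed behavior (output reconstructed from the error message in section 1, not captured verbatim):

```sql
-- Problem 1: the self-overwrite is rejected at analysis time
insert overwrite table pt_table_test1 select * from pt_table_test1 where dt = '2022-10-01';
-- Error in query: Cannot overwrite a path that is also being read from.

-- Problem 2: overwriting from pt_table_test2 replaces the whole table
insert overwrite table pt_table_test1 select * from pt_table_test2 where id = 2;
select * from pt_table_test1;
-- Only the four rows with id = 2 remain; every other partition,
-- including region=ph/dt=2022-10-03, has been deleted.
```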
3. 解決方法
印象中這兩個(gè)問題也出現(xiàn)過,但憑經(jīng)驗(yàn)和感覺,應(yīng)該可以解決。找到以前正常運(yùn)行的表,對(duì)比分析了下,發(fā)現(xiàn)是建表方式不一致問題:
- 錯(cuò)誤建表,指定表的文件格式:using parquet
- 正確姿勢(shì),指定表的文件格式:stored as parquet
3.1 Sample Code
```sql
-- Same script, but the tables are created with the Hive syntax: stored as parquet
drop table if exists pt_table_test1;
create table pt_table_test1 (
  id int,
  region string,
  dt string
) stored as parquet
partitioned by (region, dt);

drop table if exists pt_table_test2;
create table pt_table_test2 (
  id int,
  region string,
  dt string
) stored as parquet
partitioned by (region, dt);

-- Enable dynamic partitioning
set hive.exec.dynamic.partition=true;
set hive.exec.dynamic.partition.mode=nonstrict;

truncate table pt_table_test1;
insert into table pt_table_test1 values
  (1, 'id', '2022-10-01'), (1, 'id', '2022-10-02'), (1, 'ph', '2022-10-03'),
  (1, 'sg', '2022-10-01'), (1, 'sg', '2022-10-02'), (1, 'ph', '2022-10-03');
select * from pt_table_test1;

-- Reading from and overwriting the same table now succeeds
insert overwrite table pt_table_test1 select * from pt_table_test1 where dt = '2022-10-01';
select * from pt_table_test1;

truncate table pt_table_test2;
insert into table pt_table_test2 values
  (2, 'id', '2022-10-01'), (2, 'id', '2022-10-02'),
  (2, 'sg', '2022-10-01'), (2, 'sg', '2022-10-02');

-- Only the partitions present in the query result are overwritten
insert overwrite table pt_table_test1 select * from pt_table_test2 where id = 2;
select * from pt_table_test1;
```
3.2 Correct Demonstration
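With the stored as parquet tables, both statements behave as intended. The sketch below shows the expected outcome, assuming the usual Hive dynamic-partition overwrite semantics (only partitions that actually receive data are replaced) and that Hive-format writes go through a staging location rather than deleting the target path up front:

```sql
-- Self-overwrite succeeds: the query no longer conflicts with the write path
insert overwrite table pt_table_test1 select * from pt_table_test1 where dt = '2022-10-01';

-- Only the partitions present in the result of the query, i.e.
-- (id, sg) x (2022-10-01, 2022-10-02), are replaced; the
-- region=ph/dt=2022-10-03 partition is left intact.
insert overwrite table pt_table_test1 select * from pt_table_test2 where id = 2;
select * from pt_table_test1;
```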
4. using parquet vs. stored as parquet
Comparing the two CREATE TABLE variants:
- Whether the table was created with using parquet or stored as parquet, SHOW CREATE TABLE displays: USING parquet.
- For the stored as parquet table, SHOW CREATE TABLE additionally includes a TBLPROPERTIES clause.
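The difference can be seen directly with SHOW CREATE TABLE. The output below is an illustrative sketch (the exact property names vary by Spark/Hive version, so treat them as an assumption, not captured output):

```sql
SHOW CREATE TABLE pt_table_test1;
-- Created with `using parquet`:
--   CREATE TABLE pt_table_test1 (id INT, region STRING, dt STRING)
--   USING parquet
--   PARTITIONED BY (region, dt)
--
-- Created with `stored as parquet`, the statement still reads USING parquet,
-- but a TBLPROPERTIES clause is appended, e.g. (illustrative):
--   CREATE TABLE pt_table_test1 (id INT, region STRING, dt STRING)
--   USING parquet
--   PARTITIONED BY (region, dt)
--   TBLPROPERTIES ('transient_lastDdlTime' = '...')
```

The extra TBLPROPERTIES is the visible hint that the table is a Hive-format table, which is what changes the INSERT OVERWRITE behavior described above.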