使用like操作可以進(jìn)行字符串比較,全文檢索等,性能相對(duì)比較差,有些情況下可以通過建立索引來(lái)提升性能。下面我們通過使用TPCH orders表作為例子,來(lái)進(jìn)行說明。但是請(qǐng)注意, not like是不能用任何索引的,BTREE不支持!=操作,只能進(jìn)行=和范圍查找。
TPCH orders表的定義如下,其中o_comment列是varchar類型的字符串:
tpch10=# \d orders
Table "public.orders"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
o_orderkey | integer | | not null |
o_custkey | integer | | not null |
o_orderstatus | character(1) | | not null |
o_totalprice | numeric(15,2) | | not null |
o_orderdate | date | | not null |
o_orderpriority | character(15) | | not null |
o_clerk | character(15) | | not null |
o_shippriority | integer | | not null |
o_comment | character varying(79) | | not null |
Indexes:
"orders_pkey" PRIMARY KEY, btree (o_orderkey)
Foreign-key constraints:
"orders_o_custkey_fkey" FOREIGN KEY (o_custkey) REFERENCES customer(c_custkey)
Referenced by:
TABLE "lineitem" CONSTRAINT "lineitem_l_orderkey_fkey" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey)
1.?前匹配 LIKE ‘prefix%’范圍查詢
前匹配可以使用BTREE索引進(jìn)行范圍查詢,在沒有任何索引的情況下,PG默認(rèn)使用seqscan順序掃描:
tpch10=# explain analyze select * from orders where o_comment like 'request%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on orders (cost=0.00..448412.00 rows=75000 width=234) (actual time=0.662..5098.904 rows=56870 loops=1)
Filter: ((o_comment)::text ~~ 'request%'::text)
Rows Removed by Filter: 14943130
Planning Time: 0.268 ms
Execution Time: 5103.280 ms
(5 rows)
我們?cè)趏_comment列加一下BTREE索引,然后再次嘗試:
tpch10=# create index on orders(o_comment);
CREATE INDEX
tpch10=# analyze orders;
ANALYZE
tpch10=# explain analyze select * from orders where o_comment like 'request%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------
Seq Scan on orders (cost=0.00..448412.00 rows=75000 width=234) (actual time=0.662..5098.904 rows=56870 loops=1)
Filter: ((o_comment)::text ~~ 'request%'::text)
Rows Removed by Filter: 14943130
Planning Time: 0.268 ms
Execution Time: 5103.280 ms
(5 rows)
tpch10=# explain analyze select * from orders where o_comment like 'request';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------
Index Scan using orders_o_comment_idx on orders (cost=0.56..8.58 rows=1 width=107) (actual time=0.188..0.188 rows=0 loops=1)
Index Cond: ((o_comment)::text = 'request'::text)
Filter: ((o_comment)::text ~~ 'request'::text)
Planning Time: 0.408 ms
Execution Time: 0.250 ms
(5 rows)
發(fā)現(xiàn)PG在like ‘prefix%’并沒有使用BTREE索引,而精確匹配可以使用索引,問題在于默認(rèn)的collate是en_US.UTF-8,而不是C,將o_comment列改為C collate再進(jìn)行嘗試。使用alter table命令將o_comment列改成collate C,可以查看到o_comment列的Collation屬性是C。
tpch10=# alter table orders alter column o_comment TYPE VARCHAR(79) collate "C";
ALTER TABLE
tpch10=# \d orders
Table "public.orders"
Column | Type | Collation | Nullable | Default
-----------------+-----------------------+-----------+----------+---------
o_orderkey | integer | | not null |
o_custkey | integer | | not null |
o_orderstatus | character(1) | | not null |
o_totalprice | numeric(15,2) | | not null |
o_orderdate | date | | not null |
o_orderpriority | character(15) | | not null |
o_clerk | character(15) | | not null |
o_shippriority | integer | | not null |
o_comment | character varying(79) | C | not null |
Indexes:
"orders_pkey" PRIMARY KEY, btree (o_orderkey)
"orders_o_comment_idx" btree (o_comment)
Foreign-key constraints:
"orders_o_custkey_fkey" FOREIGN KEY (o_custkey) REFERENCES customer(c_custkey)
Referenced by:
TABLE "lineitem" CONSTRAINT "lineitem_l_orderkey_fkey" FOREIGN KEY (l_orderkey) REFERENCES orders(o_orderkey)
tpch10=#
再執(zhí)行一次之前的語(yǔ)句,發(fā)現(xiàn)使用了創(chuàng)建的BTREE索引,時(shí)間從5103ms提升到232ms,可以看到性能得到極大提升。
tpch10=# explain analyze select * from orders where o_comment like 'request%';
QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on orders (cost=3477.31..168092.77 rows=75000 width=234) (actual time=50.869..228.043 rows=56870 loops=1)
Filter: ((o_comment)::text ~~ 'request%'::text)
Heap Blocks: exact=51160
-> Bitmap Index Scan on orders_o_comment_idx (cost=0.00..3458.56 rows=75000 width=0) (actual time=33.213..33.213 rows=56870 loops=1)
Index Cond: (((o_comment)::text >= 'request'::text) AND ((o_comment)::text < 'requesu'::text))
Planning Time: 0.528 ms
Execution Time: 232.989 ms
(7 rows)
tpch10=#
2. 后匹配 LIKE ‘%prefix’范圍查詢
后匹配的方式不能使用BTREE索引,原因在于BTREE索引只能做大于、大于等于、等于、小于、小于等于等操作決定的。因此這時(shí)候使用reverse index來(lái)處理后匹配的文本檢索。
比如我們使用相同的例子,只是like匹配的是‘%request’,PG依舊使用順序掃描,而沒有使用之前建的索引。
tpch10=# explain analyze select * from orders where o_comment like '%request';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------
Seq Scan on orders (cost=0.00..448412.00 rows=151515 width=107) (actual time=0.043..8640.280 rows=57305 loops=1)
Filter: ((o_comment)::text ~~ '%request'::text)
Rows Removed by Filter: 14942695
Planning Time: 1.065 ms
Execution Time: 8645.223 ms
(5 rows)
我們給o_comment列建一個(gè)reverse的索引,會(huì)看到執(zhí)行語(yǔ)句會(huì)選用reverse索引,時(shí)間由8645ms降到162ms,提升效率非常高。因此在這種情況下可以考慮使用reverse索引來(lái)提高后匹配的檢索效率。
tpch10=# create index on orders(reverse(o_comment));
CREATE INDEX
tpch10=# analyze orders;
ANALYZE
tpch10=# explain analyze select * from orders where reverse(o_comment) like reverse('%request');
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------
Index Scan using orders_reverse_idx on orders (cost=0.56..8.59 rows=1500 width=107) (actual time=0.142..157.412 rows=57305 loops=1)
Index Cond: ((reverse((o_comment)::text) >= 'tseuqer'::text) AND (reverse((o_comment)::text) < 'tseuqes'::text))
Filter: (reverse((o_comment)::text) ~~ 'tseuqer%'::text)
Planning Time: 0.620 ms
Execution Time: 162.048 ms
(5 rows)
tpch10=#
3. 中間匹配'%prefix%'范圍查詢
對(duì)于中間匹配的全文檢索,以上兩種索引都不能解決,PG提供了GIN索引,可以使用PG提供的pg_trgm插件來(lái)加速查詢。GIN索引可以應(yīng)用于前匹配、后匹配和中間匹配,但是因?yàn)镚IN本身比較昂貴,對(duì)于前匹配和后匹配,可以使用上面提到的方式,中間匹配采用GIN索引,但是也需要考慮索引維護(hù)本身的代價(jià)。
在沒有使用GIN索引時(shí)候,PG依舊使用順序掃描:
tpch10=# explain analyze select * from orders where o_comment like '%request%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
Seq Scan on orders (cost=0.00..448442.18 rows=2273093 width=107) (actual time=0.108..8773.358 rows=2318296 loops=1)
Filter: ((o_comment)::text ~~ '%request%'::text)
Rows Removed by Filter: 12681704
Planning Time: 0.836 ms
Execution Time: 8901.078 ms
(5 rows)
tpch10=# explain analyze select * from orders where reverse(o_comment) like reverse('%request%');
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------
Seq Scan on orders (cost=0.00..485948.21 rows=3030791 width=107) (actual time=0.069..18095.022 rows=2318296 loops=1)
Filter: (reverse((o_comment)::text) ~~ '%tseuqer%'::text)
Rows Removed by Filter: 12681704
Planning Time: 4.783 ms
Execution Time: 18218.893 ms
(5 rows)
但是GIN索引的創(chuàng)建和維護(hù)非常昂貴,因?yàn)镚IN索引對(duì)每組三個(gè)連續(xù)字符(三元組)進(jìn)行索引,以便能夠查找任何匹配模式。而且GIN索引也可以應(yīng)用于前后匹配的組合使用,這種情況下效率也很高。文章來(lái)源:http://www.zghlxwxcb.cn/news/detail-755320.html
加載pg_trgm插件后,在o_comment列創(chuàng)建GIN索引,重新進(jìn)行查詢,時(shí)間由8901ms提升到3939ms,效率得到了提升,同時(shí)我們也可以看到GIN索引也可以進(jìn)行‘%prefix%prefix%’的查詢,效率也是很明顯的提升。文章來(lái)源地址http://www.zghlxwxcb.cn/news/detail-755320.html
tpch10=# create extension if not exists pg_trgm;
CREATE EXTENSION
tpch10=# create index on orders using gin (o_comment gin_trgm_ops);
CREATE INDEX
tpch10=# analyze orders;
ANALYZE
tpch10=# explain analyze select * from orders where o_comment like '%request%';
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on orders (cost=29361.97..330044.89 rows=3181674 width=107) (actual time=1727.012..3828.371 rows=2318296 loops=1)
Recheck Cond: ((o_comment)::text ~~ '%request%'::text)
Rows Removed by Index Recheck: 2088
Heap Blocks: exact=260892
-> Bitmap Index Scan on orders_o_comment_idx1 (cost=0.00..28566.55 rows=3181674 width=0) (actual time=1604.355..1604.355 rows=2320384 loops=1)
Index Cond: ((o_comment)::text ~~ '%request%'::text)
Planning Time: 0.473 ms
Execution Time: 3939.557 ms
(8 rows)
tpch10=# explain analyze select * from orders where o_comment like '%special%request%';
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------
Bitmap Heap Scan on orders (cost=1478.19..236736.24 rows=151508 width=107) (actual time=1794.711..2266.371 rows=168338 loops=1)
Recheck Cond: ((o_comment)::text ~~ '%special%request%'::text)
Rows Removed by Index Recheck: 80603
Heap Blocks: exact=161167
-> Bitmap Index Scan on orders_o_comment_idx1 (cost=0.00..1440.31 rows=151508 width=0) (actual time=1731.915..1731.915 rows=248941 loops=1)
Index Cond: ((o_comment)::text ~~ '%special%request%'::text)
Planning Time: 0.501 ms
Execution Time: 2278.128 ms
(8 rows)
到了這里,關(guān)于POSTGRESQL(PG) 性能優(yōu)化之like全文檢索優(yōu)化的文章就介紹完了。如果您還想了解更多內(nèi)容,請(qǐng)?jiān)谟疑辖撬阉鱐OY模板網(wǎng)以前的文章或繼續(xù)瀏覽下面的相關(guān)文章,希望大家以后多多支持TOY模板網(wǎng)!