B-Tree Index Code Flow Analysis
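Overview
The most commonly used index type in PostgreSQL is the btree, which supports both range and equality queries.
This article covers the entry point of the btree code and its interface definition, focusing on index lookup, insertion, deletion, and the cleanup of dead data.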
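Preface
An index exists to locate rows in the underlying table faster. Because index keys are small, a large number of index entries can be read from disk at once.
Some indexes hold one entry per row, in one-to-one correspondence with the data; these are dense indexes. Others do not store per-row values and instead record only the minimum and maximum values within a range; these are sparse indexes. With a dense index, such as a primary key, the location of the matching row (or the indexed column values themselves) can be obtained directly, and the btree algorithm supports this kind of index.
With a sparse index, finding the index entry is not enough: the table, or a secondary index, must then be scanned to reach the target row.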
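The difference between the two kinds of index can be sketched with a toy in-memory table. This is only an illustration under assumptions: the names dense_lookup, sparse_lookup and ROWS_PER_BLOCK are hypothetical, rows are plain sorted integers, and nothing here is PostgreSQL code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy table: rows sorted by key, grouped into fixed-size blocks. */
#define ROWS_PER_BLOCK 4

/* Dense index: one entry per row, so a binary search over the entries
 * yields the row position directly. Returns -1 when the key is absent. */
int dense_lookup(const int *keys, int nrows, int key)
{
    int lo = 0, hi = nrows - 1;

    assert(keys != NULL && nrows >= 0);
    while (lo <= hi) {
        int mid = lo + (hi - lo) / 2;

        if (keys[mid] == key)
            return mid;
        if (keys[mid] < key)
            lo = mid + 1;
        else
            hi = mid - 1;
    }
    return -1;
}

/* Sparse index: one entry per block (the block's smallest key). The index
 * only narrows the search to a block; the rows of that block must still
 * be scanned to find the target. Returns -1 when the key is absent. */
int sparse_lookup(const int *block_min, int nblocks,
                  const int *rows, int nrows, int key)
{
    int b = -1;

    assert(block_min != NULL && rows != NULL);
    for (int i = 0; i < nblocks; i++)
        if (block_min[i] <= key)
            b = i;                      /* last block whose minimum <= key */
    if (b < 0)
        return -1;

    int start = b * ROWS_PER_BLOCK;
    int end = start + ROWS_PER_BLOCK;

    if (end > nrows)
        end = nrows;
    for (int r = start; r < end; r++)   /* second step: scan the block */
        if (rows[r] == key)
            return r;
    return -1;
}
```

The dense lookup ends at the row itself, while the sparse lookup always pays for the extra scan inside the selected block.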
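Code entry point
To keep the code decoupled, PostgreSQL defines a struct for index operations whose members are a uniform set of operations and capability flags.
The btree definition is shown below; the names of the btree index operations can all be found here. In practice callers only invoke the members of this struct, which are function pointers.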
/*
 * Btree handler function: return IndexAmRoutine with access method parameters
 * and callbacks.
 */
Datum
bthandler(PG_FUNCTION_ARGS)
{
    IndexAmRoutine *amroutine = makeNode(IndexAmRoutine);

    amroutine->amstrategies = BTMaxStrategyNumber;
    amroutine->amsupport = BTNProcs;
    amroutine->amoptsprocnum = BTOPTIONS_PROC;
    amroutine->amcanorder = true;
    amroutine->amcanorderbyop = false;
    amroutine->amcanbackward = true;
    amroutine->amcanunique = true;
    amroutine->amcanmulticol = true;
    amroutine->amoptionalkey = true;
    amroutine->amsearcharray = true;
    amroutine->amsearchnulls = true;
    amroutine->amstorage = false;
    amroutine->amclusterable = true;
    amroutine->ampredlocks = true;
    amroutine->amcanparallel = true;
    amroutine->amcaninclude = true;
    amroutine->amusemaintenanceworkmem = false;
    amroutine->amsummarizing = false;
    amroutine->amparallelvacuumoptions =
        VACUUM_OPTION_PARALLEL_BULKDEL | VACUUM_OPTION_PARALLEL_COND_CLEANUP;
    amroutine->amkeytype = InvalidOid;

    amroutine->ambuild = btbuild;
    amroutine->ambuildempty = btbuildempty;
    amroutine->aminsert = btinsert;
    amroutine->ambulkdelete = btbulkdelete;
    amroutine->amvacuumcleanup = btvacuumcleanup;
    amroutine->amcanreturn = btcanreturn;
    amroutine->amcostestimate = btcostestimate;
    amroutine->amoptions = btoptions;
    amroutine->amproperty = btproperty;
    amroutine->ambuildphasename = btbuildphasename;
    amroutine->amvalidate = btvalidate;
    amroutine->amadjustmembers = btadjustmembers;
    amroutine->ambeginscan = btbeginscan;
    amroutine->amrescan = btrescan;
    amroutine->amgettuple = btgettuple;
    amroutine->amgetbitmap = btgetbitmap;
    amroutine->amendscan = btendscan;
    amroutine->ammarkpos = btmarkpos;
    amroutine->amrestrpos = btrestrpos;
    amroutine->amestimateparallelscan = btestimateparallelscan;
    amroutine->aminitparallelscan = btinitparallelscan;
    amroutine->amparallelrescan = btparallelrescan;

    PG_RETURN_POINTER(amroutine);
}
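The decoupling that this handler provides can be shown in miniature. The sketch below is a heavily reduced stand-in, not the real IndexAmRoutine: ToyAmRoutine, toy_bthandler, toy_index_insert and the other toy_ names are hypothetical, and the "index" is a single slot, just enough to show a generic layer calling through function pointers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for an access method routine table. */
typedef struct ToyAmRoutine {
    bool amcanorder;                        /* capability flag */
    bool (*aminsert)(int key);              /* insert callback */
    bool (*amgettuple)(int key, int *tid);  /* lookup callback */
} ToyAmRoutine;

/* Toy "btree" storage: a single slot, enough to demonstrate dispatch. */
static int toy_key = -1;
static int toy_tid = -1;

bool toy_btinsert(int key)
{
    toy_key = key;
    toy_tid = key * 10;     /* made-up TID derived from the key */
    return true;
}

bool toy_btgettuple(int key, int *tid)
{
    if (toy_key != key)
        return false;
    *tid = toy_tid;
    return true;
}

/* Plays the role of the handler: fills in flags and callbacks. */
void toy_bthandler(ToyAmRoutine *am)
{
    assert(am != NULL);
    am->amcanorder = true;
    am->aminsert = toy_btinsert;
    am->amgettuple = toy_btgettuple;
}

/* Generic layer: it knows only the table of function pointers,
 * never the concrete access method behind it. */
bool toy_index_insert(const ToyAmRoutine *am, int key)
{
    assert(am != NULL && am->aminsert != NULL);
    return am->aminsert(key);
}

bool toy_index_gettuple(const ToyAmRoutine *am, int key, int *tid)
{
    assert(am != NULL && am->amgettuple != NULL);
    return am->amgettuple(key, tid);
}
```

Adding another access method under this scheme would mean writing a second handler that fills the same struct; the generic functions would not change.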
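We start with the basic index operations: lookup (btgettuple), insertion (btinsert), and deletion.
Index lookup
Call stack of an index lookup
- ExecIndexScan
The execution plan contains an index scan node, executed by ExecIndexScan, which starts the index lookup and uses the index to find the table tuple.
- -> IndexNext
Returns the table tuple. For a sparse (lossy) index a second check happens here: when the access method sets xs_recheck, the scan conditions are re-evaluated against the fetched tuple.
- -> index_getnext_slot
Returns the table tuple. The TID found through the index is used to fetch the tuple from the table and check its visibility; if it is not visible, the search moves on to the next entry.
- -> index_getnext_tid
Returns the TID recorded in the matching index entry.
- -> btgettuple
Searches within the index, walking and comparing entries until it hits the index entry that matches the search key.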
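The layering between fetching a TID from the index and fetching a visible tuple from the table can be sketched as follows. This is a toy model under assumptions: a TID is just an array position, visibility is a precomputed flag, and ToyHeapTuple, ToyIndexScan, toy_getnext_tid and toy_getnext_slot are hypothetical names that only mirror the roles of index_getnext_tid and index_getnext_slot.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy heap tuple: a value plus a precomputed visibility flag. */
typedef struct ToyHeapTuple {
    int  value;
    bool visible;
} ToyHeapTuple;

/* Toy index scan: the matching TIDs (heap array positions) in index order. */
typedef struct ToyIndexScan {
    const int *tids;
    int        ntids;
    int        next;    /* position of the next TID to hand out */
} ToyIndexScan;

/* Lower layer: return the next TID from the index, or -1 when exhausted. */
int toy_getnext_tid(ToyIndexScan *scan)
{
    assert(scan != NULL);
    if (scan->next >= scan->ntids)
        return -1;
    return scan->tids[scan->next++];
}

/* Upper layer: fetch the heap tuple for each TID and check visibility;
 * invisible tuples are skipped and the next TID is tried. */
const ToyHeapTuple *toy_getnext_slot(ToyIndexScan *scan,
                                     const ToyHeapTuple *heap)
{
    int tid;

    assert(heap != NULL);
    while ((tid = toy_getnext_tid(scan)) >= 0) {
        if (heap[tid].visible)
            return &heap[tid];
    }
    return NULL;    /* no more visible tuples */
}
```

The caller of the upper layer never sees the invisible tuples, which is why a dead tuple still costs an index entry visit and a heap fetch.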
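Basic flow of an index lookup
An index lookup has roughly two steps:
- Find the starting point, that is, the position of the search key
- Scan from the starting point, returning the index entries that satisfy the conditions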
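The two steps can be illustrated on a sorted array standing in for the leaf level of the index. This is a minimal sketch, assuming integer keys and an inclusive range; toy_find_start and toy_range_scan are hypothetical names, and a real btree descends through internal pages instead of binary-searching one array.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1: find the starting point, the first position whose key is >= lo.
 * Returns n when every key is smaller than lo. */
int toy_find_start(const int *keys, int n, int lo)
{
    int low = 0, high = n;

    assert(keys != NULL && n >= 0);
    while (low < high) {
        int mid = low + (high - low) / 2;

        if (keys[mid] < lo)
            low = mid + 1;
        else
            high = mid;
    }
    return low;
}

/* Step 2: scan forward from the starting point and collect the positions
 * of all keys that are still <= hi. Returns the number of matches stored. */
int toy_range_scan(const int *keys, int n, int lo, int hi,
                   int *out, int max_out)
{
    int count = 0;

    assert(out != NULL);
    for (int i = toy_find_start(keys, n, lo);
         i < n && keys[i] <= hi && count < max_out; i++)
        out[count++] = i;
    return count;
}
```

An equality lookup is the special case lo == hi, which is why the same flow serves both equality and range queries.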
Code analysis
The entry function for an index lookup is btgettuple; its implementation follows.
bool
btgettuple(IndexScanDesc scan, ScanDirection dir)
{
    BTScanOpaque so = (BTScanOpaque) scan->opaque;
    bool        res;

    /* btree indexes are never lossy */
    scan->xs_recheck = false;

    /*
     * If we have any array keys, initialize them during first call for a
     * scan.  We can't do this in btrescan because we don't know the scan
     * direction at that time.
     */
    if (so->numArrayKeys && !BTScanPosIsValid(so->currPos))
    {
        /* punt if we have any unsatisfiable array keys */
        if (so->numArrayKeys < 0)
            return false;

        _bt_start_array_keys(scan, dir);
    }

    /* This loop handles advancing to the next array elements, if any */
    do
    {
        /*
         * If we've already initialized this scan, we can just advance it in
         * the appropriate direction.  If we haven't done so yet, we call
         * _bt_first() to get the first item in the scan.
         */
        if (!BTScanPosIsValid(so->currPos))
            res = _bt_first(scan, dir);
        else
        {
            /*
             * Check to see if we should kill the previously-fetched tuple.
             */
            if (scan->kill_prior_tuple)
            {
                /*
                 * Yes, remember it for later. (We'll deal with all such
                 * tuples at once right before leaving the index page.)  The
                 * test for numKilled overrun is not just paranoia: if the
                 * caller reverses direction in the indexscan then the same
                 * item might get entered multiple times. It's not worth
                 * trying to optimize that, so we don't detect it, but instead
                 * just forget any excess entries.
                 */
                if (so->killedItems == NULL)
                    so->killedItems = (int *)
                        palloc(MaxTIDsPerBTreePage * sizeof(int));
                if (so->numKilled < MaxTIDsPerBTreePage)
                    so->killedItems[so->numKilled++] = so->currPos.itemIndex;
            }

            /*
             * Now continue the scan.
             */
            res = _bt_next(scan, dir);
        }

        /* If we have a tuple, return it ... */
        if (res)
            break;
        /* ... otherwise see if we have more array keys to deal with */
    } while (so->numArrayKeys && _bt_advance_array_keys(scan, dir));

    return res;
}
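- Initialize the scan position. Inside the loop, BTScanPosIsValid(so->currPos) first tests whether currPos is valid, that is, whether the scan position has been initialized. If it has not, _bt_first is called to position the scan and fetch the first matching entry.
- Scan index entries. Once the scan position is initialized, each later call uses _bt_next to fetch one more index entry, and the function returns as soon as a valid entry is found. If kill_prior_tuple is set, the previously returned entry is first remembered in killedItems so that it can be dealt with before the scan leaves the page.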
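The first-call versus later-call logic, together with the bounded list of killed items, can be reduced to a small stateful scan. This is a sketch under assumptions: ToyScan, toy_first, toy_next, toy_gettuple and TOY_MAX_KILLED are hypothetical, keys are integers in one sorted array, only forward equality scans are modelled, and array keys are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_KILLED 4    /* hypothetical cap on remembered dead items */

typedef struct ToyScan {
    const int *keys;            /* sorted index keys */
    int        nkeys;
    int        search_key;      /* equality condition of the scan */
    int        cur;             /* current position, -1 = not positioned yet */
    bool       kill_prior_tuple;/* caller found the last tuple dead */
    int        killed[TOY_MAX_KILLED];
    int        num_killed;
} ToyScan;

/* First call: locate the first matching entry (linear search for brevity). */
static bool toy_first(ToyScan *s)
{
    for (int i = 0; i < s->nkeys; i++) {
        if (s->keys[i] == s->search_key) {
            s->cur = i;
            return true;
        }
    }
    return false;
}

/* Later calls: step to the next entry if it still matches. */
static bool toy_next(ToyScan *s)
{
    if (s->cur + 1 < s->nkeys && s->keys[s->cur + 1] == s->search_key) {
        s->cur++;
        return true;
    }
    return false;
}

/* Same shape as the function above: position on first use, otherwise
 * remember a killed item (dropping any excess) and advance. */
bool toy_gettuple(ToyScan *s)
{
    assert(s != NULL && s->keys != NULL);
    if (s->cur < 0)
        return toy_first(s);

    if (s->kill_prior_tuple && s->num_killed < TOY_MAX_KILLED)
        s->killed[s->num_killed++] = s->cur;

    return toy_next(s);
}
```

Remembering killed positions instead of acting on them immediately keeps the per-tuple path cheap; the bounded array simply forgets entries beyond its capacity.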
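Index insertion
Call stack of an index insertion
Starting from an INSERT, the call path is as follows:
- ExecInsert
Entry function for executing an SQL INSERT statement.
- -> ExecInsertIndexTuples
If the table has indexes, this function is called after the table tuple has been inserted. A table may have several indexes, so it loops over them and calls the next-level function for each one.
- -> index_insert
Common entry point for index insertion; it dispatches to the insert callback defined by the index's access method.
- -> btinsert
Entry function for btree index insertion. It first forms an index tuple and then calls the next-level function to insert it.
- -> _bt_doinsert
Performs the insertion of the index entry, going through locating the position, checking uniqueness, and the insertion itself.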
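Basic flow of an index insertion
Index insertion consists of the following main stages:
- Find the position for the new index entry. A btree is an ordered tree, so the insertion position has to be found first in order to preserve the ordering. As with an index lookup, a search key is built first and then used to find the position.
- Check the uniqueness constraint. If any key column of the new entry is NULL (anynullkeys in the code), the uniqueness check is skipped.
- Insert the entry by calling _bt_insertonpg, which looks for free space on the page and may have to split the page.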
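The three stages can be sketched on a single sorted "leaf page". This is an illustration under assumptions: ToyEntry, ToyLeaf, toy_insert and TOY_CAPACITY are hypothetical, keys are single integers with NULLs ordered last, and a full page is only reported instead of being split.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define TOY_CAPACITY 8      /* hypothetical number of entries per leaf page */

typedef struct ToyEntry {
    bool is_null;           /* key is NULL */
    int  key;               /* ignored when is_null is set */
} ToyEntry;

typedef struct ToyLeaf {
    ToyEntry items[TOY_CAPACITY];
    int      nitems;
} ToyLeaf;

typedef enum { TOY_INSERTED, TOY_DUPLICATE, TOY_PAGE_FULL } ToyResult;

/* Ordering used by the toy page: non-NULL keys ascending, NULLs last. */
static int toy_cmp(ToyEntry a, ToyEntry b)
{
    if (a.is_null || b.is_null)
        return (int) a.is_null - (int) b.is_null;
    return (a.key > b.key) - (a.key < b.key);
}

ToyResult toy_insert(ToyLeaf *leaf, ToyEntry e, bool unique)
{
    int pos = 0;

    assert(leaf != NULL);

    /* Stage 1: find the position that keeps the page ordered. */
    while (pos < leaf->nitems && toy_cmp(leaf->items[pos], e) < 0)
        pos++;

    /* Stage 2: uniqueness check, skipped when the new key is NULL. */
    if (unique && !e.is_null &&
        pos < leaf->nitems && toy_cmp(leaf->items[pos], e) == 0)
        return TOY_DUPLICATE;

    /* Stage 3: insert; a real btree would split the page when it is full. */
    if (leaf->nitems == TOY_CAPACITY)
        return TOY_PAGE_FULL;
    memmove(&leaf->items[pos + 1], &leaf->items[pos],
            (size_t) (leaf->nitems - pos) * sizeof(ToyEntry));
    leaf->items[pos] = e;
    leaf->nitems++;
    return TOY_INSERTED;
}
```

Inserting the same non-NULL key twice into a unique page yields TOY_DUPLICATE, while two NULL keys are both accepted, mirroring the rule that NULLs never conflict with each other.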
Code analysis
The entry function for index insertion is btinsert, and the actual work is done by _bt_doinsert. The code flow is shown below.
bool
_bt_doinsert(Relation rel, IndexTuple itup,
             IndexUniqueCheck checkUnique, bool indexUnchanged,
             Relation heapRel)
{
    bool        is_unique = false;
    BTInsertStateData insertstate;
    BTScanInsert itup_key;
    BTStack     stack;
    bool        checkingunique = (checkUnique != UNIQUE_CHECK_NO);

    /* we need an insertion scan key to do our search, so build one */
    itup_key = _bt_mkscankey(rel, itup);

    if (checkingunique)
    {
        if (!itup_key->anynullkeys)
        {
            /* No (heapkeyspace) scantid until uniqueness established */
            itup_key->scantid = NULL;
        }
        else
        {
            checkingunique = false;
            /* Tuple is unique in the sense that core code cares about */
            Assert(checkUnique != UNIQUE_CHECK_EXISTING);
            is_unique = true;
        }
    }

    insertstate.itup = itup;
    insertstate.itemsz = MAXALIGN(IndexTupleSize(itup));
    insertstate.itup_key = itup_key;
    insertstate.bounds_valid = false;
    insertstate.buf = InvalidBuffer;
    insertstate.postingoff = 0;

search:
    stack = _bt_search_insert(rel, heapRel, &insertstate);

    if (checkingunique)
    {
        TransactionId xwait;
        uint32      speculativeToken;

        xwait = _bt_check_unique(rel, &insertstate, heapRel, checkUnique,
                                 &is_unique, &speculativeToken);

        if (unlikely(TransactionIdIsValid(xwait)))
        {
            /* Have to wait for the other guy ... */
            _bt_relbuf(rel, insertstate.buf);
            insertstate.buf = InvalidBuffer;

            if (speculativeToken)
                SpeculativeInsertionWait(xwait, speculativeToken);
            else
                XactLockTableWait(xwait, rel, &itup->t_tid, XLTW_InsertIndex);

            /* start over... */
            if (stack)
                _bt_freestack(stack);
            goto search;
        }

        /* Uniqueness is established -- restore heap tid as scantid */
        if (itup_key->heapkeyspace)
            itup_key->scantid = &itup->t_tid;
    }

    if (checkUnique != UNIQUE_CHECK_EXISTING)
    {
        OffsetNumber newitemoff;

        CheckForSerializableConflictIn(rel, NULL, BufferGetBlockNumber(insertstate.buf));
        newitemoff = _bt_findinsertloc(rel, &insertstate, checkingunique,
                                       indexUnchanged, stack, heapRel);
        _bt_insertonpg(rel, heapRel, itup_key, insertstate.buf, InvalidBuffer,
                       stack, itup, insertstate.itemsz, newitemoff,
                       insertstate.postingoff, false);
    }
    else
    {
        /* just release the buffer */
        _bt_relbuf(rel, insertstate.buf);
    }

    /* be tidy */
    if (stack)
        _bt_freestack(stack);
    pfree(itup_key);

    return is_unique;
}
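The code flow is as follows:
- Initialization. The search key is built with _bt_mkscankey and the insert state is filled in.
- Find the insertion position. _bt_search_insert descends the tree to the leaf page on which the new entry belongs; whether that page has enough free space is dealt with in the later insertion steps.
- Check the uniqueness constraint. If a conflicting transaction is still in progress, the code waits for it to finish, then searches for the position again and rechecks uniqueness. Afterwards checkUnique != UNIQUE_CHECK_EXISTING is tested: with UNIQUE_CHECK_EXISTING the call only rechecks an entry that already exists, so the buffer is released and nothing is inserted; otherwise the insertion proceeds.
- Insert the index entry. _bt_findinsertloc first determines the exact offset on the page, and then _bt_insertonpg is called.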
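Index deletion
An index update is a delete followed by an insert, so here we look at the outline of index deletion.
When a table tuple is deleted, the data is not physically removed, so the corresponding index entry is not removed either. When, then, is the index entry deleted?
Basic flow of index deletion
During vacuum or page pruning, the last item of each HOT chain is left on its page. A HOT chain within one page corresponds to a single index entry, and this last item is kept so that the index entry can still be found and deleted.
When vacuum processes the index, it uses this dead tuple to find the corresponding index entry, deletes the index entry first, and then deletes the dead tuple.
The common complaint that index performance has degraded usually comes down to index bloat: as dead tuples accumulate, more index entries are waiting to be deleted, lookups slow down considerably, and index I/O increases.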
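Code analysis
- vac_bulkdel_one_index
Calls the generic index interface.
- -> index_bulk_delete
The generic index interface, which dispatches to the callback of the specific index type; here that is the btree.
- -> btbulkdelete
The btree bulk-delete callback. So that an early exit does no harm, it registers an exit callback at the start, which cleans up before shared memory is detached. It then calls btvacuumscan to remove dead index entries from every page.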
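The bulk-delete interface is driven by a callback that answers, for each index entry, whether the TID it points to is dead. A minimal sketch of that idea, assuming integer TIDs and a flat array of index entries: ToyBulkDeleteCallback, ToyDeadTids, toy_tid_is_dead and toy_bulkdelete are hypothetical names, and no pages, locks, or WAL are modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Callback type: returns true when the entry pointing at tid should go. */
typedef bool (*ToyBulkDeleteCallback)(int tid, void *state);

/* Callback state: the list of dead TIDs collected from the table. */
typedef struct ToyDeadTids {
    const int *tids;
    int        n;
} ToyDeadTids;

bool toy_tid_is_dead(int tid, void *state)
{
    const ToyDeadTids *dead = (const ToyDeadTids *) state;

    for (int i = 0; i < dead->n; i++)
        if (dead->tids[i] == tid)
            return true;
    return false;
}

/* Visit every index entry, drop those the callback reports as dead, and
 * compact the survivors in place. Returns the number of removed entries. */
int toy_bulkdelete(int *index_tids, int *nentries,
                   ToyBulkDeleteCallback callback, void *state)
{
    int kept = 0;
    int removed;

    assert(index_tids != NULL && nentries != NULL && callback != NULL);
    for (int i = 0; i < *nentries; i++)
        if (!callback(index_tids[i], state))
            index_tids[kept++] = index_tids[i];

    removed = *nentries - kept;
    *nentries = kept;
    return removed;
}
```

Because every entry has to be visited, the cost of this pass grows with the size of the index, not with the number of dead tuples, which is one reason bloat is expensive to clean up.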
Closing
Author email: study@senllang.onaliyun.com