Contents
Overview
The __unmap_and_move function
step1: Lock the page to be migrated
step2: Insure that writeback is complete.
step3: Lock the new page that we want to move to.
step4: All the page table references to the page are converted to migration entries.
step5-step15: move_to_new_page
step 5 - step 11
step 12 - 15
step 16 - 18
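Overview
Linux kernel page migration design document (nginux's blog, CSDN)
The previous article covered the design of page migration. According to the kernel documentation, a migration takes 18 steps in total. This article maps the source code onto those 18 steps to deepen the understanding of page migration. Kernel source version: Linux 5.9. The step names used below are quoted from the kernel documentation.
The basic call chain of migrate_pages: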
migrate_pages
--->unmap_and_move
--->__unmap_and_move
The __unmap_and_move function
//page: the page to be migrated
//newpage: the target page of the migration
static int __unmap_and_move(struct page *page, struct page *newpage,
int force, enum migrate_mode mode)
{
int rc = -EAGAIN;
int page_was_mapped = 0;
struct anon_vma *anon_vma = NULL;
bool is_lru = !__PageMovable(page);
//step1:Lock the page to be migrated
//If trylock_page fails, under some conditions we bail out instead of sleeping on the lock
if (!trylock_page(page)) {
//With !force or MIGRATE_ASYNC, do not call lock_page, because it may sleep.
if (!force || mode == MIGRATE_ASYNC)
goto out;
/*
* It's not safe for direct compaction to call lock_page.
* For example, during page readahead pages are added locked
* to the LRU. Later, when the IO completes the pages are
* marked uptodate and unlocked. However, the queueing
* could be merging multiple pages for one bio (e.g.
* mpage_readahead). If an allocation happens for the
* second or third page, the process can end up locking
* the same page twice and deadlocking. Rather than
* trying to be clever about what pages can be locked,
* avoid the use of lock_page for direct compaction
* altogether.
*/
//A task with PF_MEMALLOC set (direct compaction or kswapd) must not force the lock, otherwise it may deadlock.
if (current->flags & PF_MEMALLOC)
goto out;
lock_page(page);
}
//step2:Insure that writeback is complete.
if (PageWriteback(page)) {
/*
* Only in the case of a full synchronous migration is it
* necessary to wait for PageWriteback. In the async case,
* the retry loop is too short and in the sync-light case,
* the overhead of stalling is too much
*/
//The SYNC modes are allowed to wait for writeback to finish; the wait itself is done by wait_on_page_writeback(page).
switch (mode) {
case MIGRATE_SYNC:
case MIGRATE_SYNC_NO_COPY:
break;
default:
//Any other mode means migration of this page fails
rc = -EBUSY;
goto out_unlock;
}
if (!force)
goto out_unlock;
wait_on_page_writeback(page);
}
/*
* By try_to_unmap(), page->mapcount goes down to 0 here. In this case,
* we cannot notice that anon_vma is freed while we migrates a page.
* This get_anon_vma() delays freeing anon_vma pointer until the end
* of migration. File cache pages are no problem because of page_lock()
* File Caches may use write_page() or lock_page() in migration, then,
* just care Anon page here.
*
* Only page_get_anon_vma() understands the subtleties of
* getting a hold on an anon_vma from outside one of its mms.
* But if we cannot get anon_vma, then we won't need it anyway,
* because that implies that the anon page is no longer mapped
* (and cannot be remapped so long as we hold the page lock).
*/
if (PageAnon(page) && !PageKsm(page))
anon_vma = page_get_anon_vma(page);
/*
* Block others from accessing the new page when we get around to
* establishing additional references. We are usually the only one
* holding a reference to newpage at this point. We used to have a BUG
* here if trylock_page(newpage) fails, but would like to allow for
* cases where there might be a race with the previous use of newpage.
* This is much like races on refcount of oldpage: just don't BUG().
*/
//step3 : Lock the new page that we want to move to
if (unlikely(!trylock_page(newpage)))
goto out_unlock;
if (unlikely(!is_lru)) {
rc = move_to_new_page(newpage, page, mode);
goto out_unlock_both;
}
/*
* Corner case handling:
* 1. When a new swap-cache page is read into, it is added to the LRU
* and treated as swapcache but it has no rmap yet.
* Calling try_to_unmap() against a page->mapping==NULL page will
* trigger a BUG. So handle it here.
* 2. An orphaned page (see truncate_complete_page) might have
* fs-private metadata. The page can be picked up due to memory
* offlining. Everywhere else except page reclaim, the page is
* invisible to the vm, so the page can not be migrated. So try to
* free the metadata, so the page can be freed.
*/
//As the comment says, this handles the corner cases
if (!page->mapping) {
VM_BUG_ON_PAGE(PageAnon(page), page);
if (page_has_private(page)) {
try_to_free_buffers(page);
goto out_unlock_both;
}
} else if (page_mapped(page)) {
//step4 : All the page table references to the page are converted to migration
//entries.
/* Establish migration ptes */
VM_BUG_ON_PAGE(PageAnon(page) && !PageKsm(page) && !anon_vma,
page);
try_to_unmap(page,
TTU_MIGRATION|TTU_IGNORE_MLOCK|TTU_IGNORE_ACCESS);
page_was_mapped = 1;
}
//Whether the page was never mapped (e.g. read/write page cache) or is no longer mapped after try_to_unmap,
//move_to_new_page is called; this key function covers steps 5-15 of the 18 steps
if (!page_mapped(page))
//step5-step15
rc = move_to_new_page(newpage, page, mode);
//step 16
if (page_was_mapped)
remove_migration_ptes(page,
rc == MIGRATEPAGE_SUCCESS ? newpage : page, false);
//step 17
out_unlock_both:
unlock_page(newpage);
out_unlock:
/* Drop an anon_vma reference if we took one */
if (anon_vma)
put_anon_vma(anon_vma);
unlock_page(page);
out:
/*
* If migration is successful, decrease refcount of the newpage
* which will not free the page because new page owner increased
* refcounter. As well, if it is LRU page, add the page to LRU
* list in here. Use the old state of the isolated source page to
* determine if we migrated a LRU page. newpage was already unlocked
* and possibly modified by its owner - don't rely on the page
* state.
*/
if (rc == MIGRATEPAGE_SUCCESS) {
if (unlikely(!is_lru))
put_page(newpage);
else
//step 18
putback_lru_page(newpage);
}
return rc;
}
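step1: Lock the page to be migrated
If trylock_page fails, there are cases where the code must not fall back to lock_page, because lock_page may sleep:
1. The caller did not pass force, or the mode is MIGRATE_ASYNC.
By the kernel's definition, MIGRATE_ASYNC must never block. For comparison, the other modes:
MIGRATE_SYNC_LIGHT: may block to some extent, but must not block on writepage writeback, because that wait can be very long and would defeat the purpose of "LIGHT".
MIGRATE_SYNC: may block freely.
MIGRATE_SYNC_NO_COPY: may block, but must not copy the page contents with the CPU (a DMA copy is fine).
2. The task has PF_MEMALLOC set, which mainly means direct compaction or kswapd; blocking here could deadlock.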
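The step 1 decision can be condensed into one predicate. The following is a minimal user-space sketch, not kernel code: the enum is a local mirror of the kernel's enum migrate_mode, and may_sleep_on_page_lock and its pf_memalloc parameter are hypothetical names introduced only for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local mirror of the kernel's enum migrate_mode, for illustration only. */
enum migrate_mode {
    MIGRATE_ASYNC,
    MIGRATE_SYNC_LIGHT,
    MIGRATE_SYNC,
    MIGRATE_SYNC_NO_COPY,
};

/*
 * Hypothetical helper: after trylock_page() has failed, may the caller
 * fall back to the sleeping lock_page()? Mirrors the checks of step 1.
 * pf_memalloc stands for (current->flags & PF_MEMALLOC).
 */
bool may_sleep_on_page_lock(bool force, enum migrate_mode mode,
                            bool pf_memalloc)
{
    if (!force || mode == MIGRATE_ASYNC)
        return false;   /* not allowed to sleep: give up, rc stays -EAGAIN */
    if (pf_memalloc)
        return false;   /* direct compaction or kswapd: deadlock risk */
    return true;        /* safe to call lock_page() and wait */
}
```

For example, may_sleep_on_page_lock(true, MIGRATE_SYNC_LIGHT, false) is true, while any call with MIGRATE_ASYNC is false regardless of force.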
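step2: Insure that writeback is complete.
Whether to wait for writeback depends on the migration mode:
MIGRATE_SYNC and MIGRATE_SYNC_NO_COPY: these two modes block in wait_on_page_writeback until writeback finishes, provided force is set; without force the page is skipped and rc keeps its initial value -EAGAIN.
All other modes do not wait for writeback on this page, so migration of the page fails with -EBUSY.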
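The writeback handling of step 2 has three possible outcomes, which the sketch below models as a plain function. It is a user-space illustration under stated assumptions: the enum is a local mirror of the kernel's, and writeback_decision and WB_WAIT are hypothetical names, not kernel symbols.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

/* Local mirror of the kernel's enum migrate_mode, for illustration only. */
enum migrate_mode {
    MIGRATE_ASYNC,
    MIGRATE_SYNC_LIGHT,
    MIGRATE_SYNC,
    MIGRATE_SYNC_NO_COPY,
};

/* Hypothetical result meaning "call wait_on_page_writeback() and go on". */
#define WB_WAIT 1

/*
 * Returns 0 if the page is not under writeback, WB_WAIT if the caller
 * should wait for writeback, or a negative errno if migration of this
 * page is abandoned for now.
 */
int writeback_decision(bool under_writeback, bool force,
                       enum migrate_mode mode)
{
    if (!under_writeback)
        return 0;
    switch (mode) {
    case MIGRATE_SYNC:
    case MIGRATE_SYNC_NO_COPY:
        break;              /* fully synchronous modes may wait */
    default:
        return -EBUSY;      /* async and sync-light never wait here */
    }
    if (!force)
        return -EAGAIN;     /* rc keeps its initial value in the kernel */
    return WB_WAIT;
}
```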
step3: Lock the new page that we want to move to.
trylock_page is used to lock the new page; if it fails, the code bails out through out_unlock instead of triggering a BUG.
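The "try, and back out on failure" pattern of step 3 can be shown with a C11 atomic flag. This is only a toy model: struct toy_page, toy_trylock_page and toy_unlock_page are hypothetical names, and the real page lock is a bit in page->flags with a wait queue behind it, which this sketch does not model.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the lock bit is modelled. */
struct toy_page {
    atomic_flag locked;
};

/* Non-blocking lock attempt: true if we took the lock, false if it is held. */
bool toy_trylock_page(struct toy_page *p)
{
    return !atomic_flag_test_and_set_explicit(&p->locked,
                                              memory_order_acquire);
}

/* Release the lock taken by toy_trylock_page(). */
void toy_unlock_page(struct toy_page *p)
{
    atomic_flag_clear_explicit(&p->locked, memory_order_release);
}
```

A caller in the style of __unmap_and_move would simply return an error when toy_trylock_page fails, rather than asserting that it can never happen.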
step4: All the page table references to the page are converted to migration entries.
For a mapped page, try_to_unmap removes the mappings. Note:
- Not every migratable page is page_mapped, for example page cache pages created by read/write.
- try_to_unmap removes all mappings of the page. If the page is shared by 3 processes, all 3 mappings are removed.
- try_to_unmap replaces each PTE of the page with a migration entry. The benefit is that a process touching the unmapped page takes a page fault, the kernel sees from the PTE that a migration is in progress and waits for it to complete, which resolves the fault naturally.
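The idea behind migration entries can be illustrated with a toy page table. The PTE layout below is an assumption made up for this sketch (it is NOT the kernel's real encoding), and all toy_* names are hypothetical. It shows every mapping of the old page being turned into a non-present migration entry (step 4), and later being pointed at the target page again (step 16).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy PTE layout (not the kernel's): bit 0 = present,
 * bit 1 = migration marker, remaining bits = page frame number. */
#define TOY_PRESENT   0x1ull
#define TOY_MIGRATION 0x2ull
#define TOY_PFN_SHIFT 2

typedef uint64_t toy_pte_t;

toy_pte_t toy_mk_pte(uint64_t pfn)
{
    return (pfn << TOY_PFN_SHIFT) | TOY_PRESENT;
}

uint64_t toy_pte_pfn(toy_pte_t pte)
{
    return pte >> TOY_PFN_SHIFT;
}

/* A fault handler would wait for migration when it sees such an entry. */
bool toy_is_migration_entry(toy_pte_t pte)
{
    return !(pte & TOY_PRESENT) && (pte & TOY_MIGRATION);
}

/* Step 4: replace every present mapping of old_pfn with a migration
 * entry. Returns the number of mappings that were converted. */
size_t toy_unmap_for_migration(toy_pte_t *ptes, size_t n, uint64_t old_pfn)
{
    size_t converted = 0;

    for (size_t i = 0; i < n; i++) {
        if ((ptes[i] & TOY_PRESENT) && toy_pte_pfn(ptes[i]) == old_pfn) {
            ptes[i] = (old_pfn << TOY_PFN_SHIFT) | TOY_MIGRATION;
            converted++;
        }
    }
    return converted;
}

/* Step 16: turn the migration entries for old_pfn back into present
 * PTEs that point at target_pfn (new page on success, old on failure). */
void toy_remove_migration_ptes(toy_pte_t *ptes, size_t n,
                               uint64_t old_pfn, uint64_t target_pfn)
{
    for (size_t i = 0; i < n; i++) {
        if (toy_is_migration_entry(ptes[i]) &&
            toy_pte_pfn(ptes[i]) == old_pfn)
            ptes[i] = toy_mk_pte(target_pfn);
    }
}
```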
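step5-step15: move_to_new_page
Steps 5-15 are wrapped in move_to_new_page. Note that different page types take different paths. Taking an anonymous page as the example: page_mapping(page) returns NULL for an anonymous page that is not in the swap cache, so migrate_page is called: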
/*
* Move a page to a newly allocated page
* The page is locked and all ptes have been successfully removed.
*
* The new page will have replaced the old page if this function
* is successful.
*
* Return value:
* < 0 - error code
* MIGRATEPAGE_SUCCESS - success
*/
static int move_to_new_page(struct page *newpage, struct page *page,
enum migrate_mode mode)
{
struct address_space *mapping;
int rc = -EAGAIN;
bool is_lru = !__PageMovable(page);
...
mapping = page_mapping(page);
if (likely(is_lru)) {
if (!mapping)
rc = migrate_page(mapping, newpage, page, mode);
else if (mapping->a_ops->migratepage)
/*
* Most pages have a mapping and most filesystems
* provide a migratepage callback. Anonymous pages
* are part of swap space which also has its own
* migratepage callback. This is the most common path
* for page migration.
*/
rc = mapping->a_ops->migratepage(mapping, newpage,
page, mode);
else
rc = fallback_migrate_page(mapping, newpage,
page, mode);
}
...
/*
* When successful, old pagecache page->mapping must be cleared before
* page is freed; but stats require that PageAnon be left as PageAnon.
*/
if (rc == MIGRATEPAGE_SUCCESS) {
if (__PageMovable(page)) {
VM_BUG_ON_PAGE(!PageIsolated(page), page);
/*
* We clear PG_movable under page_lock so any compactor
* cannot try to migrate this page.
*/
__ClearPageIsolated(page);
}
/*
* Anonymous and movable page->mapping will be cleared by
* free_pages_prepare so don't reset it here for keeping
* the type to work PageAnon, for example.
*/
if (!PageMappingFlags(page))
page->mapping = NULL;
if (likely(!is_zone_device_page(newpage)))
flush_dcache_page(newpage);
}
out:
return rc;
}
int migrate_page(struct address_space *mapping,
struct page *newpage, struct page *page,
enum migrate_mode mode)
{
int rc;
BUG_ON(PageWriteback(page)); /* Writeback must be complete */
//step 5 - 11
rc = migrate_page_move_mapping(mapping, newpage, page, 0);
if (rc != MIGRATEPAGE_SUCCESS)
return rc;
//step 12 - 15
if (mode != MIGRATE_SYNC_NO_COPY)
migrate_page_copy(newpage, page);
else
migrate_page_states(newpage, page);
return MIGRATEPAGE_SUCCESS;
}
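The three-way dispatch for LRU pages in move_to_new_page can be summarised as a small selection function. This is a simplified sketch under assumptions: struct toy_mapping and struct toy_aops are hypothetical stand-ins for struct address_space and its a_ops, and only the is_lru branch is modelled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which routine ends up migrating the page. */
enum migrate_path {
    PATH_MIGRATE_PAGE,       /* generic migrate_page() */
    PATH_AOPS_MIGRATEPAGE,   /* mapping->a_ops->migratepage() */
    PATH_FALLBACK,           /* fallback_migrate_page() */
};

/* Toy stand-ins: only whether a migratepage callback exists matters. */
struct toy_aops {
    bool has_migratepage;
};

struct toy_mapping {
    const struct toy_aops *a_ops;
};

/* mapping plays the role of the value returned by page_mapping(page). */
enum migrate_path pick_migrate_path(const struct toy_mapping *mapping)
{
    if (!mapping)
        return PATH_MIGRATE_PAGE;
    if (mapping->a_ops->has_migratepage)
        return PATH_AOPS_MIGRATEPAGE;
    return PATH_FALLBACK;
}
```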
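step 5 - step 11
Here we take migrate_page, the path used for anon pages, as the example:
step 5: The i_pages lock is taken. The i_pages lock protects the radix tree (mapping->i_pages, an XArray in Linux 5.9). While it is held, other tasks that need this lock to access the page in the tree are blocked, so it should be released as soon as the work is done.
step 6: The refcount of the page is examined and we back out if references remain
   otherwise we know that we are the only one referencing this page. page->_refcount is checked: if only the expected references remain, migration continues, otherwise it backs out.
step 7: The radix tree is checked and if it does not contain the pointer to this
   page then we back out because someone else modified the radix tree.
This checks that the radix tree still contains the page being migrated. For an anon page it is simple, because an anon page outside the swap cache is not in the radix tree at all. For a page that is in the radix tree, the entry has to be updated to point to the new page.
step 8: The new page is prepped with some settings from the old page so that
   accesses to the new page will discover a page with the correct settings.
This prepares the new page, mainly by setting its index and mapping and by carrying over state with __SetPageSwapBacked, SetPageDirty and so on.
step 9: The radix tree is changed to point to the new page.
The radix tree entry is updated to point to the new page.
step 10: The reference count of the old page is dropped. The old page has been removed from the radix tree, so its refcount drops by 1 (by thp_nr_pages(page) for a transparent huge page).
step 11: The radix tree lock is dropped. The radix tree now points to the new page, so the lock can be released.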
/*
* Replace the page in the mapping.
*
* The number of remaining references must be:
* 1 for anonymous pages without a mapping
* 2 for pages with a mapping
* 3 for pages with a mapping and PagePrivate/PagePrivate2 set.
*/
int migrate_page_move_mapping(struct address_space *mapping,
struct page *newpage, struct page *page, int extra_count)
{
XA_STATE(xas, &mapping->i_pages, page_index(page));
struct zone *oldzone, *newzone;
int dirty;
int expected_count = expected_page_refs(mapping, page) + extra_count;
//Anonymous pages are simple: they are not in the radix tree, so just update newpage directly
if (!mapping) {
/* Anonymous page without mapping */
if (page_count(page) != expected_count)
return -EAGAIN;
/* No turning back from here */
newpage->index = page->index;
newpage->mapping = page->mapping;
if (PageSwapBacked(page))
__SetPageSwapBacked(newpage);
return MIGRATEPAGE_SUCCESS;
}
oldzone = page_zone(page);
newzone = page_zone(newpage);
//step5: take the lock
xas_lock_irq(&xas);
//step6-7: check the refcount, and check that the page is still in the radix tree
if (page_count(page) != expected_count || xas_load(&xas) != page) {
xas_unlock_irq(&xas);
return -EAGAIN;
}
if (!page_ref_freeze(page, expected_count)) {
xas_unlock_irq(&xas);
return -EAGAIN;
}
/*
* Now we know that no one else is looking at the page:
* no turning back from here.
*/
//step8: prepare the new page
newpage->index = page->index;
newpage->mapping = page->mapping;
page_ref_add(newpage, thp_nr_pages(page)); /* add cache reference */
if (PageSwapBacked(page)) {
__SetPageSwapBacked(newpage);
if (PageSwapCache(page)) {
SetPageSwapCache(newpage);
set_page_private(newpage, page_private(page));
}
} else {
VM_BUG_ON_PAGE(PageSwapCache(page), page);
}
/* Move dirty while page refs frozen and newpage not yet exposed */
dirty = PageDirty(page);
if (dirty) {
ClearPageDirty(page);
SetPageDirty(newpage);
}
//step9: point the radix tree at the new page
xas_store(&xas, newpage);
if (PageTransHuge(page)) {
int i;
for (i = 1; i < HPAGE_PMD_NR; i++) {
xas_next(&xas);
xas_store(&xas, newpage);
}
}
/*
* Drop cache reference from old page by unfreezing
* to one less reference.
* We know this isn't the last reference.
*/
//step10: drop the old page's refcount accordingly, since it has been removed from the radix tree
page_ref_unfreeze(page, expected_count - thp_nr_pages(page));
//step11: the radix tree update is done, release the lock
xas_unlock(&xas);
...
local_irq_enable();
return MIGRATEPAGE_SUCCESS;
}
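The refcount check and freeze used in steps 6 and 10 can be modelled in user space with C11 atomics. This is a minimal sketch, assuming a toy page that carries only a reference counter: toy_expected_refs follows the counts listed in the comment above migrate_page_move_mapping (1, 2 or 3) and deliberately ignores huge pages and device pages, and the toy_ref_* helpers are hypothetical names modelled on page_ref_freeze and page_ref_unfreeze.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy stand-in for struct page: only the reference count is modelled. */
struct toy_page {
    atomic_int refcount;
};

/* Simplified expected count: 1 for the migration caller, plus 1 for the
 * page cache reference, plus 1 if private data is attached. */
int toy_expected_refs(bool has_mapping, bool has_private)
{
    int expected = 1;

    if (has_mapping)
        expected += 1 + (has_private ? 1 : 0);
    return expected;
}

/* Freeze: atomically move the count from `expected` to 0. Fails, leaving
 * the count untouched, if anybody else holds an extra reference. */
bool toy_ref_freeze(struct toy_page *p, int expected)
{
    return atomic_compare_exchange_strong(&p->refcount, &expected, 0);
}

/* Unfreeze: publish the new count once the tree points elsewhere. */
void toy_ref_unfreeze(struct toy_page *p, int count)
{
    assert(atomic_load(&p->refcount) == 0);   /* must be frozen */
    atomic_store(&p->refcount, count);
}
```

Freezing to zero is what makes the update safe: while the count is 0, nobody who finds the page through a lookup can take a new reference on it, so the tree entry can be switched to the new page before the old page is unfrozen with one reference fewer.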
step 12 - 15
Steps 12-15 are carried out by migrate_page_copy. There is one special case: MIGRATE_SYNC_NO_COPY, by kernel definition, does not allow a CPU copy, so the page contents are not copied and only the page state is transferred, via migrate_page_states.
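The difference between the two branches at the end of migrate_page can be sketched with a toy page that has a flags word and a small data buffer. Everything named toy_* is hypothetical, and the 64-byte "page" and the single flags field are simplifications chosen for this illustration; the enum is a local mirror of the kernel's.

```c
#include <assert.h>
#include <string.h>

#define TOY_PAGE_SIZE 64

/* Local mirror of the kernel's enum migrate_mode, for illustration only. */
enum migrate_mode {
    MIGRATE_ASYNC,
    MIGRATE_SYNC_LIGHT,
    MIGRATE_SYNC,
    MIGRATE_SYNC_NO_COPY,
};

/* Toy page: one word of state plus a tiny payload. */
struct toy_page {
    unsigned long flags;
    unsigned char data[TOY_PAGE_SIZE];
};

/* Counterpart of migrate_page_states(): transfer state only. */
void toy_migrate_states(struct toy_page *newpage, const struct toy_page *page)
{
    newpage->flags = page->flags;
}

/* Counterpart of migrate_page_copy(): CPU copy of the data, then state. */
void toy_migrate_copy(struct toy_page *newpage, const struct toy_page *page)
{
    memcpy(newpage->data, page->data, TOY_PAGE_SIZE);
    toy_migrate_states(newpage, page);
}

/* Same branch as the tail of migrate_page(). */
void toy_finish_migration(struct toy_page *newpage,
                          const struct toy_page *page,
                          enum migrate_mode mode)
{
    if (mode != MIGRATE_SYNC_NO_COPY)
        toy_migrate_copy(newpage, page);
    else
        toy_migrate_states(newpage, page);  /* data is moved elsewhere, e.g. by DMA */
}
```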
step 16 - 18
These steps map directly between the source code and the kernel documentation (see the step 16, 17 and 18 comments in __unmap_and_move) and need no special explanation.