Reference: "Elasticsearch Search Scroll API (Scroll Query)" - Jianshu
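In Elasticsearch, traditional pagination uses the from+size pattern. Here size is the number of hits per page, and from is the offset of the first hit to return. With Spring Data's PageRequest.of(page, size), the zero-based page number is converted to from = page * size. By default a request fails once from + size, that is (page + 1) * size, exceeds 10000. Put differently, a request fails when the requested page reaches beyond the first 10000 hits.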
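The limit is plain arithmetic. The sketch below assumes the default index.max_result_window of 10000 and assumes that PageRequest.of(page, size) is sent to Elasticsearch as from = page * size. ResultWindow and its methods are hypothetical names used only for illustration:

```java
// Hypothetical helper: models how a zero-based page number maps onto Elasticsearch's
// from/size window, assuming the default index.max_result_window of 10000.
class ResultWindow {
    // Default value of the index.max_result_window index setting.
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    // A zero-based page number becomes an offset: from = page * size.
    static long fromOffset(int page, int size) {
        return (long) page * size;
    }

    // The request is rejected when from + size is larger than the window.
    static boolean exceedsWindow(int page, int size, int maxResultWindow) {
        return fromOffset(page, size) + size > maxResultWindow;
    }

    // How many pages (0 .. n-1) fit inside the window for a given page size.
    static int reachablePages(int size, int maxResultWindow) {
        return maxResultWindow / size;
    }

    public static void main(String[] args) {
        int size = 1000;
        for (int page = 8; page <= 10; page++) {
            System.out.printf("page=%d from=%d from+size=%d rejected=%b%n",
                    page, fromOffset(page, size), fromOffset(page, size) + size,
                    exceedsWindow(page, size, DEFAULT_MAX_RESULT_WINDOW));
        }
    }
}
```

With size = 1000, pages 0 through 9 are accepted, and page 9 ends exactly at 10000. Page 10 is rejected with from + size = 11000, which is the value reported in the exception below.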
The loop below simulates paging through the results one page after another:
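// Assumes Spring Data Elasticsearch 4.x, an injected ElasticsearchRestTemplate field
// named elasticsearchRestTemplate, and a Book entity class
import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchHits;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public void search() {
    // Current page number (zero-based)
    int page = 0;
    // Number of hits fetched so far
    long total = 0;
    while (true) {
        NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
                // Pagination: 1000 hits per page
                .withPageable(PageRequest.of(page, 1000))
                .withSort(new FieldSortBuilder("commentCount").order(SortOrder.DESC))
                .build();
        SearchHits<Book> searchHits = elasticsearchRestTemplate.search(nativeSearchQuery, Book.class);
        if (!searchHits.hasSearchHits()) {
            break;
        }
        for (SearchHit<Book> searchHit : searchHits.getSearchHits()) {
            Book book = searchHit.getContent();
            // process book ...
        }
        total += searchHits.getSearchHits().size();
        page++;
        System.out.println(page);
        System.out.println(total);
    }
}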
When page reaches 10, the following exception is thrown:
Caused by: ElasticsearchException[Elasticsearch exception [type=illegal_argument_exception, reason=Result window is too large, from + size must be less than or equal to: [10000] but was [11000]. See the scroll api for a more efficient way to request large data sets. This limit can be set by changing the [index.max_result_window] index level setting.]]
The exception message itself points to two ways around the problem:
1. max_result_window
- Raise the index-level setting index.max_result_window above its default of 10000 (here to 1000000). The corresponding RESTful API call is:
PUT book/_settings
{
"index": {
"max_result_window": 1000000
}
}
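Raising index.max_result_window does lift the limit on how deep you can page, but it is not a recommended approach. On every from+size request, each shard has to collect and sort its own top from + size hits. The coordinating node then merges all of them and returns only size hits. Once the data grows to millions or tens of millions of documents, from+size queries get slower and slower and each request takes longer, which seriously hurts the user experience. CPU and memory consumption is also high.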
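Deep pages get expensive because of how a from+size search is executed. Each shard returns its own top from + size hits, and the coordinating node sorts all of those candidates before it keeps only size of them. The sketch below is a simplified in-memory model of that merge step and is not Elasticsearch code. DeepPagingCost, demoShards and the integer "scores" are hypothetical stand-ins for shards and sorted documents:

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Simplified model (not Elasticsearch code) of a from+size query on a multi-shard index.
class DeepPagingCost {

    // Hypothetical demo data: distributes the scores 0..docs-1 round-robin over the shards.
    static List<List<Integer>> demoShards(int shardCount, int docs) {
        List<List<Integer>> shards = new ArrayList<>();
        for (int s = 0; s < shardCount; s++) {
            shards.add(new ArrayList<>());
        }
        for (int score = 0; score < docs; score++) {
            shards.get(score % shardCount).add(score);
        }
        return shards;
    }

    // Each shard sorts its hits (descending, like commentCount DESC) and returns at most
    // from + size of them; the coordinating node merges and sorts all candidates.
    static List<Integer> mergeShardResults(List<List<Integer>> shards, int from, int size) {
        List<Integer> candidates = new ArrayList<>();
        for (List<Integer> shard : shards) {
            shard.stream()
                 .sorted(Comparator.reverseOrder())
                 .limit(from + size)
                 .forEach(candidates::add);
        }
        candidates.sort(Comparator.reverseOrder());
        return candidates;
    }

    // Returns the requested page; every candidate before `from` was sorted only to be dropped.
    static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = mergeShardResults(shards, from, size);
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    // Number of candidate hits the coordinating node had to sort to produce one page.
    static int mergedCandidates(List<List<Integer>> shards, int from, int size) {
        return mergeShardResults(shards, from, size).size();
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = demoShards(5, 100_000);
        for (int from : new int[] {0, 1_000, 9_000}) {
            System.out.printf("from=%d size=10 -> sorted %d candidates to return 10 hits%n",
                    from, mergedCandidates(shards, from, 10));
        }
    }
}
```

The page is always correct, because every shard contributes its own top from + size hits. However, the merge work grows with shards * (from + size) while the useful output stays at size hits.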
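2. Scroll API
If you need to read a large amount of data, consider the Search Scroll API instead, which is more efficient for this job. The first request sets up a search context, and later requests keep reading from that context in batches. Nothing has to recompute an ever deeper from+size window.
If you use the Java client directly, see the official API documentation:
https://www.elastic.co/guide/en/elasticsearch/client/java-rest/7.9/java-rest-high-search-scroll.html
Here we again use it through the Spring Boot integration, and the core usage is very similar. The following code again simulates continuous pagination: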
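import java.util.ArrayList;
import java.util.List;

import org.elasticsearch.search.sort.FieldSortBuilder;
import org.elasticsearch.search.sort.SortOrder;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchScrollHits;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;

public void scrollSearch() {
    NativeSearchQuery nativeSearchQuery = new NativeSearchQueryBuilder()
            .withSort(new FieldSortBuilder("commentCount").order(SortOrder.DESC))
            .build();
    // Number of hits per page (batch)
    nativeSearchQuery.setMaxResults(1000);
    // Keep-alive of the scroll context, in milliseconds
    long scrollTimeInMillis = 60 * 1000;
    // Number of hits fetched so far
    long total = 0;
    // First request
    SearchScrollHits<Book> searchScrollHits = elasticsearchRestTemplate.searchScrollStart(scrollTimeInMillis, nativeSearchQuery, Book.class, IndexCoordinates.of("book"));
    String scrollId = searchScrollHits.getScrollId();
    while (searchScrollHits.hasSearchHits()) {
        total += searchScrollHits.getSearchHits().size();
        System.out.println(total);
        for (SearchHit<Book> searchHit : searchScrollHits.getSearchHits()) {
            Book book = searchHit.getContent();
            // process book ...
        }
        // Subsequent requests pass the scrollId returned by the previous one
        searchScrollHits = elasticsearchRestTemplate.searchScrollContinue(scrollId, scrollTimeInMillis, Book.class, IndexCoordinates.of("book"));
        scrollId = searchScrollHits.getScrollId();
    }
    List<String> scrollIds = new ArrayList<>();
    scrollIds.add(scrollId);
    // Clear the scroll
    elasticsearchRestTemplate.searchScrollClear(scrollIds);
}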
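The loop above only reaches searchScrollClear() when it finishes normally. If processing a hit throws, the scroll context stays open until its keep-alive runs out. Below is a minimal stdlib sketch of the same start/continue/clear shape, with the cleanup moved into a finally block. ScrollSource, Batch, ScrollReader and InMemoryScroll are hypothetical names that stand in for the three template calls. They are not part of Spring Data Elasticsearch.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;

// Hypothetical stand-in for searchScrollStart / searchScrollContinue / searchScrollClear.
interface ScrollSource<T> {
    Batch<T> start(int batchSize);
    Batch<T> next(String scrollId);
    void clear(List<String> scrollIds);
}

// One page of hits plus the scrollId to pass to the next request.
final class Batch<T> {
    final String scrollId;
    final List<T> hits;

    Batch(String scrollId, List<T> hits) {
        this.scrollId = scrollId;
        this.hits = hits;
    }
}

final class ScrollReader {
    // Reads every batch, hands each hit to the handler, and always clears the scroll.
    static <T> int forEachHit(ScrollSource<T> source, int batchSize, Consumer<? super T> handler) {
        Set<String> scrollIds = new LinkedHashSet<>(); // distinct ids, in first-seen order
        int count = 0;
        try {
            Batch<T> batch = source.start(batchSize);
            scrollIds.add(batch.scrollId);
            while (!batch.hits.isEmpty()) {
                for (T hit : batch.hits) {
                    handler.accept(hit);
                    count++;
                }
                batch = source.next(batch.scrollId);
                scrollIds.add(batch.scrollId);
            }
        } finally {
            // Runs on normal exit and when the handler (or a request) throws.
            if (!scrollIds.isEmpty()) {
                source.clear(new ArrayList<>(scrollIds));
            }
        }
        return count;
    }
}

// In-memory fake for trying the pattern out; it serves a fixed list batch by batch.
final class InMemoryScroll implements ScrollSource<Integer> {
    private final List<Integer> data;
    private int position;
    private int batchSize;
    List<String> cleared = new ArrayList<>();

    InMemoryScroll(List<Integer> data) {
        this.data = data;
    }

    @Override
    public Batch<Integer> start(int batchSize) {
        this.batchSize = batchSize;
        this.position = 0;
        return next("scroll-1");
    }

    @Override
    public Batch<Integer> next(String scrollId) {
        int end = Math.min(position + batchSize, data.size());
        List<Integer> hits = new ArrayList<>(data.subList(position, end));
        position = end;
        return new Batch<>(scrollId, hits);
    }

    @Override
    public void clear(List<String> scrollIds) {
        cleared = new ArrayList<>(scrollIds);
    }
}
```

In the Spring version, the same idea comes down to wrapping the loop in try and calling searchScrollClear(scrollIds) from the matching finally block.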
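A few points to note:
- setMaxResults(1000) sets the number of hits returned per page. The method exists in the version I used (Elasticsearch 7.9). If an older version lacks it, use PageRequest.of(0, 1000) instead, and keep the page number at 0.
- The first request uses searchScrollStart() and subsequent requests use searchScrollContinue(). Each response carries a scrollId.
- Every request after the first must pass the scrollId. You can think of it as a cursor that drives the paging, playing the same role as the page number in from+size.
- scrollTimeInMillis is how long, in milliseconds, Elasticsearch keeps the scroll context behind the scrollId alive for the next request. Set it according to your needs.
- When you are done, call searchScrollClear() to clear the scroll.
- With from+size you can request any reasonable page number directly and jump between pages. The scroll API cannot do this, because every request after the first depends on the scrollId returned by the previous one. Keep this in mind.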
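Elasticsearch keeps a scroll's search context alive for the scroll time passed with each request, and every continue request sets a new expiry. The class below is a hypothetical, deterministic model of that keep-alive rule. Time is passed in explicitly instead of read from a clock, and it is not Elasticsearch code.

```java
// Hypothetical model (not Elasticsearch code) of the scroll keep-alive rule:
// the context expires keepAliveMillis after the most recent request,
// and each successful continue request pushes the deadline forward.
final class ScrollKeepAlive {
    private final long keepAliveMillis;
    private long expiresAt;
    private boolean cleared;

    // The initial search request at startMillis opens the context.
    ScrollKeepAlive(long startMillis, long keepAliveMillis) {
        this.keepAliveMillis = keepAliveMillis;
        this.expiresAt = startMillis + keepAliveMillis;
    }

    // Returns true if a continue request at nowMillis still finds the context,
    // and renews the deadline in that case.
    boolean continueAt(long nowMillis) {
        if (cleared || nowMillis > expiresAt) {
            return false;
        }
        expiresAt = nowMillis + keepAliveMillis;
        return true;
    }

    // Clearing frees the context immediately instead of waiting for it to expire.
    void clear() {
        cleared = true;
    }
}
```

With a 60-second keep-alive, continues at 50 s and at 100 s both succeed. The 100 s request comes after the original 60 s deadline, but the 50 s call had already renewed it. A request after the context has expired or been cleared fails. This is why the scroll time only needs to cover the processing of one batch, not the whole scroll.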
The code above may send one extra, empty query at the end. The following slightly modified version avoids it:
import java.util.ArrayList;
import java.util.List;
import java.util.stream.Collectors;

import org.elasticsearch.index.query.QueryBuilders;
import org.springframework.data.domain.Sort;
import org.springframework.data.elasticsearch.core.SearchHit;
import org.springframework.data.elasticsearch.core.SearchScrollHits;
import org.springframework.data.elasticsearch.core.mapping.IndexCoordinates;
import org.springframework.data.elasticsearch.core.query.NativeSearchQuery;

void searchScroll() {
    NativeSearchQuery query = new NativeSearchQuery(QueryBuilders.matchAllQuery());
    query.setMaxResults(1); // hits per page (1 here, so the paging is easy to see)
    query.addSort(Sort.by(Sort.Direction.DESC, "age"));
    long scrollTimeInMillis = 5_000;
    long currentTotal = 0;
    int pageNo = 1;
    List<String> scrollIdList = new ArrayList<>();
    // The scroll API has three methods: searchScrollStart (first request),
    // searchScrollContinue (second through last request) and searchScrollClear (run when finished)
    // First request: searchScrollStart
    SearchScrollHits<People> searchScrollHits = this.elasticsearchRestTemplate.searchScrollStart(scrollTimeInMillis, query, People.class, IndexCoordinates.of("people_index"));
    String scrollId = searchScrollHits.getScrollId();
    scrollIdList.add(scrollId);
    System.out.println("scrollId:" + scrollId);
    long totalHits = searchScrollHits.getTotalHits();
    currentTotal = searchScrollHits.getSearchHits().size();
    System.out.println("totalHits:" + totalHits);
    List<People> list = searchScrollHits.get().map(SearchHit::getContent).collect(Collectors.toList());
    System.out.println("============pageNo:===========" + pageNo);
    for (People people : list) {
        System.out.println(people);
    }
    // Stop as soon as every hit has been read, so no trailing empty request is sent
    while (currentTotal < totalHits) {
        SearchScrollHits<People> searchScrollHitsContinue = elasticsearchRestTemplate.searchScrollContinue(scrollId, scrollTimeInMillis, People.class, IndexCoordinates.of("people_index"));
        scrollId = searchScrollHitsContinue.getScrollId();
        scrollIdList.add(scrollId);
        pageNo++;
        if (searchScrollHitsContinue.hasSearchHits()) {
            currentTotal += searchScrollHitsContinue.getSearchHits().size();
            List<People> peopleList = searchScrollHitsContinue.get().map(SearchHit::getContent).collect(Collectors.toList());
            System.out.println("============pageNo:===========" + pageNo);
            for (People people : peopleList) {
                System.out.println(people);
            }
        } else {
            System.out.println("============pageNo not hasSearchHits===========");
            break;
        }
    }
    System.out.println(scrollIdList);
    // Release the scroll context(s)
    elasticsearchRestTemplate.searchScrollClear(scrollIdList);
}
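The difference between the two loops can be counted without a cluster. This hypothetical helper replays both loop conditions against a result set of totalHits documents read batchSize at a time. It assumes getTotalHits() reports the exact number of hits and makes no Elasticsearch calls.

```java
// Hypothetical helper: counts the requests each loop shape sends for totalHits documents.
final class ScrollRequestCount {

    // Loop that stops on an empty page: while (searchScrollHits.hasSearchHits()) { ... }
    static int untilEmptyPage(int totalHits, int batchSize) {
        int requests = 1;                          // searchScrollStart
        int hits = Math.min(batchSize, totalHits);
        int served = hits;
        while (hits > 0) {                         // hasSearchHits()
            hits = Math.min(batchSize, totalHits - served);
            served += hits;
            requests++;                            // searchScrollContinue
        }
        return requests;
    }

    // Loop that stops once every hit is read: while (currentTotal < totalHits) { ... }
    static int untilTotalReached(int totalHits, int batchSize) {
        int requests = 1;                          // searchScrollStart
        int current = Math.min(batchSize, totalHits);
        while (current < totalHits) {
            current += Math.min(batchSize, totalHits - current);
            requests++;                            // searchScrollContinue
        }
        return requests;
    }
}
```

For 25 hits in batches of 10, the hasSearchHits() loop sends 4 requests, the last of which returns nothing. The totalHits loop sends 3.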