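1. Preface
In the previous chapters we covered the basic ES operations (indices, mappings, and documents). Starting with this chapter, we turn to ES's most important capability: search.
ES offers a rich set of search features: basic search as well as search suggestions; ordinary type-based matching as well as geolocation-based search; paginated search as well as tools for debugging and analyzing searches. All of these are covered in this part of the series. Because search spans so many topics, I have again split it into smaller sections, as I did with the basic operations, to make it easier to digest.
This section focuses on ES's auxiliary search features. For example, to optimize a search you may want to return only some of the fields; to present results well you need result counting and pagination; when you hit a performance bottleneck you need to profile the time spent in each stage of a search; and when results do not match expectations you need to inspect the scoring details of individual documents.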
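2. Specifying the Fields Returned by a Search
For performance reasons, it is often worth slimming down search results by specifying which fields to return. In ES, the _source clause sets the fields included in the results. _source takes a JSON array whose elements are the names of the fields to return.
Before going further, we need to rebuild the hotel index for the rest of this series. I recommend deleting the existing hotel index, recreating it with its mapping, and then inserting the data with the bulk API.
The DSL below deletes the hotel index and then defines its new structure: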
DELETE /hotel

PUT /hotel
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "city": {
        "type": "keyword"
      },
      "price": {
        "type": "double"
      },
      "create_time": {
        "type": "date",
        "format": "yyyy-MM-dd HH:mm:ss"
      },
      "amenities": {
        "type": "text"
      },
      "full_room": {
        "type": "boolean"
      },
      "location": {
        "type": "geo_point"
      },
      "praise": {
        "type": "integer"
      }
    }
  }
}
Then bulk-insert the following data into the index:
POST /_bulk
{"index":{"_index":"hotel","_id":"001"}}
{"title":"文雅酒店","city":"北京","price":"558.00","create_time":"2020-03-29 21:00:00","amenities":"浴池,普通停车场/充电停车场","full_room":true,"location":{"lat":36.940243,"lon":120.39400},"praise":10}
{"index":{"_index":"hotel","_id":"002"}}
{"title":"京盛酒店","city":"北京","price":"337.00","create_time":"2020-07-29 13:00:00","amenities":"充电停车场/可升降停车场","full_room":false,"location":{"lat":39.911543,"lon":116.4030},"praise":60}
{"index":{"_index":"hotel","_id":"003"}}
{"title":"文雅文化酒店","city":"天津","price":"260.00","create_time":"2021-02-27 22:00:00","amenities":"提供假日party,免费早餐,浴池,充电停车场","full_room":true,"location":{"lat":39.186555,"lon":117.162767},"praise":30}
{"index":{"_index":"hotel","_id":"004"}}
{"title":"京盛集团酒店","city":"上海","price":"800.00","create_time":"2021-05-29 21:35:00","amenities":"浴池(假日需预订),室内游泳池,普通停车场/充电停车场","full_room":true,"location":{"lat":36.940243,"lon":120.39400},"praise":100}
{"index":{"_index":"hotel","_id":"005"}}
{"title":"京盛精选酒店","city":"南昌","price":"300.00","create_time":"2021-07-29 22:50:00","amenities":"室内游泳池,普通停车场","full_room":false,"location":{"lat":39.918229,"lon":116.422011},"praise":20}
The following DSL returns only the title and city fields in the search results:
GET /hotel/_search
{
  "_source": ["title", "city"],
  "query": {
    "term": {
      "city": {
        "value": "北京"
      }
    }
  }
}
After running this DSL, the _source object of each hit contains only the two requested fields, city and title.
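Conceptually, a _source field list is a projection applied to each stored document. The following is a minimal stand-alone sketch of that idea in plain Java, not ES code: the class SourceProjectionSketch and its project method are hypothetical names introduced here, and the sketch handles only exact top-level field names, whereas real source filtering also accepts wildcard patterns.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Conceptual sketch only: mimics what a _source field list does to one document. */
final class SourceProjectionSketch {

    /** Returns a copy of the document that keeps only the requested top-level fields. */
    static Map<String, Object> project(Map<String, Object> source, List<String> fields) {
        Map<String, Object> projected = new LinkedHashMap<>();
        for (String field : fields) {
            // Fields missing from the document are skipped rather than added as null
            if (source.containsKey(field)) {
                projected.put(field, source.get(field));
            }
        }
        return projected;
    }

    public static void main(String[] args) {
        // A cut-down copy of document 002 from the bulk data above
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("title", "京盛酒店");
        doc.put("city", "北京");
        doc.put("price", "337.00");
        doc.put("praise", 60);

        // Only title and city survive, as in the search response
        System.out.println(project(doc, List.of("title", "city")));
    }
}
```

The smaller each hit is, the less data travels from the shards to the client, which is where the performance gain comes from.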
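In the Java client, the fields to return are set by calling searchSourceBuilder.fetchSource(). The method takes two arguments: an array of fields to include and an array of fields to exclude.
First, create a search method in the service layer that returns only the title and city fields: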
public List<Hotel> queryBySource(HotelDocRequest hotelDocRequest) throws IOException {
    String indexName = hotelDocRequest.getIndexName();
    if (CharSequenceUtil.isBlank(indexName)) {
        throw new SearchException("Index name must not be blank");
    }
    Hotel hotel = hotelDocRequest.getHotel();
    if (ObjectUtil.isEmpty(hotel)) {
        throw new SearchException("Search criteria must not be empty");
    }
    SearchRequest searchRequest = new SearchRequest(indexName);
    String city = hotel.getCity();
    // Create the search source builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Build the query
    searchSourceBuilder.query(new TermQueryBuilder("city", city));
    // Fields to return; null means no fields are explicitly excluded
    searchSourceBuilder.fetchSource(new String[]{"title", "city"}, null);
    searchRequest.source(searchSourceBuilder);
    ArrayList<Hotel> resultList = new ArrayList<>();
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    RestStatus status = searchResponse.status();
    if (status != RestStatus.OK) {
        return Collections.emptyList();
    }
    SearchHits searchHits = searchResponse.getHits();
    for (SearchHit searchHit : searchHits) {
        Hotel hotelResult = new Hotel();
        hotelResult.setId(searchHit.getId());       // document _id
        hotelResult.setIndex(searchHit.getIndex()); // index name
        hotelResult.setScore(searchHit.getScore()); // document score
        // Read the hit's _source as a Map
        Map<String, Object> dataMap = searchHit.getSourceAsMap();
        hotelResult.setTitle((String) dataMap.get("title"));
        hotelResult.setCity((String) dataMap.get("city"));
        resultList.add(hotelResult);
    }
    return resultList;
}
Then call the service method from the controller:
@PostMapping("/query/source")
public FoundationResponse<String> queryHotelsBySource(@RequestBody HotelDocRequest hotelDocRequest) {
    try {
        List<Hotel> hotelList = esQueryService.queryBySource(hotelDocRequest);
        if (CollUtil.isNotEmpty(hotelList)) {
            return FoundationResponse.success(hotelList.toString());
        } else {
            return FoundationResponse.success("no data");
        }
    } catch (IOException e) {
        log.warn("Search failed, cause: {}", e.getMessage());
        return FoundationResponse.error(100, e.getMessage());
    } catch (Exception e) {
        log.error("Service error, cause: {}", e.getMessage());
        return FoundationResponse.error(100, e.getMessage());
    }
}
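The endpoint can then be called from Postman.
3. Counting Results
To improve the search experience, the front end often needs the number of documents that match a search, which means counting the results. For this, ES provides the _count API: the user supplies a query clause to match documents, and ES returns the number of matches. It is analogous to SELECT COUNT(*) FROM XXX WHERE XXX… in an RDBMS.
The following DSL returns the number of hotels whose city is "北京":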
GET /hotel/_count
{
  "query": { // query that selects the documents to count
    "match": {
      "city": "北京"
    }
  }
}
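The response contains the number of matching documents (2 here) together with shard-related metadata: the total number of shards searched and how many of them succeeded, failed, or were skipped.
In the Java client, the _count API is executed through a CountRequest. Call the CountRequest object's source() method to set the query logic, then pass the request to client.count(), which returns a CountResponse object; countResponse.getCount() gives the number of matching documents.
First, create a service-layer method that returns the number of hits for a city: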
public long getCityCount(HotelDocRequest hotelDocRequest) throws IOException {
    String indexName = hotelDocRequest.getIndexName();
    if (CharSequenceUtil.isBlank(indexName)) {
        throw new SearchException("Index name must not be blank");
    }
    Hotel hotel = hotelDocRequest.getHotel();
    if (ObjectUtil.isEmpty(hotel)) {
        throw new SearchException("Search criteria must not be empty");
    }
    // Client-side count request
    CountRequest countRequest = new CountRequest(indexName);
    String city = hotel.getCity();
    // Create the search source builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Build the query
    searchSourceBuilder.query(new TermQueryBuilder("city", city));
    countRequest.source(searchSourceBuilder); // attach the query
    CountResponse countResponse = client.count(countRequest, RequestOptions.DEFAULT);
    return countResponse.getCount();
}
Then call the service from the controller:
@PostMapping("/query/count")
public FoundationResponse<Long> queryCount(@RequestBody HotelDocRequest hotelDocRequest) {
    try {
        Long count = esQueryService.getCityCount(hotelDocRequest);
        return FoundationResponse.success(count);
    } catch (IOException e) {
        log.warn("Search failed, cause: {}", e.getMessage());
        return FoundationResponse.error(100, e.getMessage());
    } catch (Exception e) {
        log.error("Service error, cause: {}", e.getMessage());
        return FoundationResponse.error(100, e.getMessage());
    }
}
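The endpoint can then be called from Postman.
4. Paginating Results
Pagination is essential in real search applications. By default, ES returns the first 10 matching documents. The from and size parameters define where to start and how many documents to show per page: from is the starting offset into the results and defaults to 0; size is the number of documents to return from that offset and defaults to 10. The following DSL returns 20 results starting at offset 0: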
GET /hotel/_search
{
  "_source": ["title", "city"],
  "from": 0,  // starting offset of the search
  "size": 20, // number of documents to return
  "query": {  // search criteria
    "term": {
      "city": {
        "value": "北京"
      }
    }
  }
}
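By default, a search can reach at most 10,000 documents: from + size must not exceed 10000, so when from is 0 the largest allowed size is 10000. A request beyond that limit is rejected with an error reporting that the result window is too large.
For ordinary search applications, 10,000 results is more than enough. If you really do need more than 10,000, you can raise max_result_window as appropriate. The following example sets the maximum window of the hotel index to 20000: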
PUT /hotel/_settings
{
  "index": {
    "max_result_window": 20000 // maximum search result window (from + size)
  }
}
Note that if you set this value very high, you need sufficiently powerful hardware to back it.
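The window limit can also be checked on the client before a request is sent, so that an oversized page is rejected early instead of reaching the cluster. This is a small sketch under stated assumptions: ResultWindowCheck and fitsWindow are hypothetical names introduced here, not part of any ES client, and the window value has to be supplied by the caller to match the index setting.

```java
/** Client-side sketch of the from + size <= max_result_window rule. */
final class ResultWindowCheck {

    /** Default value of index.max_result_window, as described in the text. */
    static final int DEFAULT_MAX_RESULT_WINDOW = 10_000;

    /** Returns true when the requested page lies inside the result window. */
    static boolean fitsWindow(int from, int size, int maxResultWindow) {
        if (from < 0 || size < 0) {
            throw new IllegalArgumentException("from and size must be non-negative");
        }
        // Widen to long so that very large inputs cannot overflow int
        return (long) from + size <= maxResultWindow;
    }

    public static void main(String[] args) {
        // The largest single page allowed under the default window
        System.out.println(fitsWindow(0, 10_000, DEFAULT_MAX_RESULT_WINDOW));
        // from + size = 10010, which exceeds the default window
        System.out.println(fitsWindow(9_990, 20, DEFAULT_MAX_RESULT_WINDOW));
        // The same page fits once the window is raised to 20000
        System.out.println(fitsWindow(9_990, 20, 20_000));
    }
}
```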
ES is a distributed search engine: the data of an index is spread across multiple shards, and those shards are allocated to different nodes. A paginated search request usually spans multiple shards. Each shard must build in memory a queue of length from + size, ordered by score, to hold its hits. The shards then send their queues to the coordinating node, which merges them; it needs a queue of length (number of shards) × (from + size) for the global sort, and then takes size documents starting at position from and returns them.
Because of this, ES is poorly suited to deep paging, that is, requests with a very large from value. Suppose a search with from = 1000 and size = 10 runs against an index with 3 shards: each shard builds a queue of 1,010 documents, and the coordinating node has to sort 3 × 1,010 = 3,030 entries just to return 10 documents.
When there are too many deep-paging requests, memory and CPU consumption rises on every node that holds a shard. The coordinating node is hit hardest: as page numbers and concurrency grow, it must merge and sort the shard data for all of these requests, and too much data can exhaust its resources and take it out of service.
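The queue sizes involved in deep paging can be worked out directly from the description above. The sketch below only does that arithmetic; DeepPagingCost and its method names are illustrative and not part of ES.

```java
/** Arithmetic sketch of the queue sizes behind a from/size search. */
final class DeepPagingCost {

    /** Entries each shard keeps in its score-ordered queue: from + size. */
    static long perShardQueueLength(int from, int size) {
        return (long) from + size;
    }

    /** Entries the coordinating node merges and sorts: shards * (from + size). */
    static long coordinatorQueueLength(int shards, int from, int size) {
        return shards * perShardQueueLength(from, size);
    }

    public static void main(String[] args) {
        int shards = 3;
        int size = 10;
        // First page versus a deep page on the same 3-shard index
        for (int from : new int[]{0, 1_000, 100_000}) {
            System.out.printf("from=%d size=%d -> per shard %d, coordinator %d%n",
                    from, size,
                    perShardQueueLength(from, size),
                    coordinatorQueueLength(shards, from, size));
        }
    }
}
```

For the example in the text (3 shards, from = 1000, size = 10) this gives 1,010 entries per shard and 3,030 on the coordinating node. The cost grows linearly with from even though the page returned to the user always holds 10 documents.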
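As a search engine, ES is better suited to searching data than to traversing it at scale. In most cases returning the first 1,000 results is enough, and there is no need to go as far as 10,000. If you truly need to traverse large volumes of data, consider scroll mode or a different storage engine.
In the Java client, call SearchSourceBuilder's from() and size() methods to set the from and size parameters. Here I use an approach that is common in everyday development. With MySQL, for example, the offset and limit parameters control where to start and how many rows to fetch, and ES works the same way. We can define a shared pagination interface, Pagable, with methods that return the offset and the limit: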
package com.mbw.request;

public interface Pagable {
    int getOffset();
    int getLimit();
    boolean isAutoCount();
}
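Next comes a page-condition class. The front end usually controls pagination with pageNo and pageSize, and, as anyone familiar with pagination knows, offset and limit can be computed from those two values. The main code of the class is: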
package com.mbw.request;

import java.io.Serializable;

/**
 * Base class for query-condition objects.
 */
public class PageCondition implements Serializable, Pagable {
    private static final long serialVersionUID = 1L;
    public static final int DEFAULT_PAGE_NO = 1;
    public static final int DEFAULT_PAGE_SIZE = 10;

    protected int pageNo = DEFAULT_PAGE_NO;
    protected int pageSize = DEFAULT_PAGE_SIZE;
    protected boolean autoCount = true;

    public PageCondition() {
    }

    public PageCondition(int pageNo, int pageSize) {
        // Fall back to the defaults for non-positive values; a pageSize of 1 is valid
        this.pageNo = pageNo < 1 ? DEFAULT_PAGE_NO : pageNo;
        this.pageSize = pageSize < 1 ? DEFAULT_PAGE_SIZE : pageSize;
    }

    public int getEnd() {
        return getLimit() + getOffset();
    }

    @Override
    public int getOffset() {
        return (pageNo - 1) * pageSize;
    }

    @Override
    public int getLimit() {
        return pageSize;
    }

    public void setPageNo(int pageNo) {
        this.pageNo = pageNo;
    }

    public void setPageSize(int pageSize) {
        this.pageSize = pageSize;
    }

    /**
     * Whether a separate count query is run automatically to get the total
     * number of records. Defaults to true.
     */
    @Override
    public boolean isAutoCount() {
        return autoCount;
    }

    /**
     * Sets whether a separate count query is run automatically to get the
     * total number of records.
     */
    public void setAutoCount(final boolean autoCount) {
        this.autoCount = autoCount;
    }

    public int getPageNo() {
        return pageNo;
    }

    public int getPageSize() {
        return pageSize;
    }
}
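As a quick check of the arithmetic in getOffset() and getLimit(), the stand-alone sketch below applies the same formulas to a few pages. PageMathSketch and its methods are hypothetical helpers that mirror PageCondition rather than reuse it, so that the snippet runs on its own.

```java
/** Stand-alone sketch of the pageNo/pageSize to from/size conversion. */
final class PageMathSketch {

    /** Mirrors PageCondition.getOffset(): zero-based index of the first hit on the page. */
    static int offset(int pageNo, int pageSize) {
        return (pageNo - 1) * pageSize;
    }

    /** Mirrors PageCondition.getLimit(): the number of hits on the page. */
    static int limit(int pageSize) {
        return pageSize;
    }

    public static void main(String[] args) {
        // Each row is {pageNo, pageSize}; the result is what ES receives as from/size
        int[][] pages = {{1, 10}, {3, 20}, {2, 1}};
        for (int[] page : pages) {
            System.out.printf("pageNo=%d pageSize=%d -> from=%d size=%d%n",
                    page[0], page[1], offset(page[0], page[1]), limit(page[1]));
        }
    }
}
```

Page numbers are 1-based while from is 0-based, which is why the formula subtracts 1 before multiplying.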
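With this, pageNo and pageSize control the offset and limit, and all that remains is to call SearchSourceBuilder's from() and size() methods. We reuse the service method from the section on specifying returned fields (the code relies on HotelDocRequest exposing getOffset() and getLimit(), for example by extending PageCondition):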
public List<Hotel> queryBySource(HotelDocRequest hotelDocRequest) throws IOException {
    String indexName = hotelDocRequest.getIndexName();
    if (CharSequenceUtil.isBlank(indexName)) {
        throw new SearchException("Index name must not be blank");
    }
    Hotel hotel = hotelDocRequest.getHotel();
    if (ObjectUtil.isEmpty(hotel)) {
        throw new SearchException("Search criteria must not be empty");
    }
    SearchRequest searchRequest = new SearchRequest(indexName);
    String city = hotel.getCity();
    // Create the search source builder
    SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
    // Build the query
    searchSourceBuilder.query(new TermQueryBuilder("city", city));
    // Set the pagination parameters
    searchSourceBuilder.from(hotelDocRequest.getOffset());
    searchSourceBuilder.size(hotelDocRequest.getLimit());
    // Fields to return; null means no fields are explicitly excluded
    searchSourceBuilder.fetchSource(new String[]{"title", "city"}, null);
    searchRequest.source(searchSourceBuilder);
    ArrayList<Hotel> resultList = new ArrayList<>();
    SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
    RestStatus status = searchResponse.status();
    if (status != RestStatus.OK) {
        return Collections.emptyList();
    }
    SearchHits searchHits = searchResponse.getHits();
    for (SearchHit searchHit : searchHits) {
        Hotel hotelResult = new Hotel();
        hotelResult.setId(searchHit.getId());       // document _id
        hotelResult.setIndex(searchHit.getIndex()); // index name
        hotelResult.setScore(searchHit.getScore()); // document score
        // Read the hit's _source as a Map
        Map<String, Object> dataMap = searchHit.getSourceAsMap();
        hotelResult.setTitle((String) dataMap.get("title"));
        hotelResult.setCity((String) dataMap.get("city"));
        resultList.add(hotelResult);
    }
    return resultList;
}
If no pagination parameters are supplied, the defaults pageNo = 1 and pageSize = 10 apply, giving offset = 0 and limit = 10, so the query still returns the same 2 documents as before. If the front end sets pageSize to 1, the Postman call returns only the first document.