Data Aggregation
I. Types of Aggregation
Official documentation (Aggregations): https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations.html
Aggregation: statistics, grouping, and calculations over documents, similar to SUM, AVG, and COUNT in MySQL.
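- Bucket aggregations: group documents
  - Terms aggregation: groups documents by field value (comparable to GROUP BY in MySQL)
  - Date Histogram: groups documents by date interval, for example one bucket per week or per month
- Metric aggregations: compute values such as the maximum, average, and minimum
  - Avg: average
  - Max: maximum
  - Min: minimum
  - Stats: computes max, min, avg, sum, and so on in one request
- Pipeline aggregations: aggregate further on the results of other aggregations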
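The three kinds of aggregation listed above can be pictured with a small in-memory analogy in plain Java. This is only a sketch for intuition, not how Elasticsearch executes aggregations; the Hotel class, the brand names, and the method names are hypothetical and exist only for this illustration.

```java
import java.util.IntSummaryStatistics;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.stream.Collectors;

class AggregationAnalogy {
    // Hypothetical document type, used only for this illustration.
    static final class Hotel {
        private final String brand;
        private final int score;
        Hotel(String brand, int score) { this.brand = brand; this.score = score; }
        String brand() { return brand; }
        int score() { return score; }
    }

    // Bucket analogy: one bucket per brand, holding the document count.
    static Map<String, Long> countByBrand(List<Hotel> hotels) {
        return hotels.stream()
                .collect(Collectors.groupingBy(Hotel::brand, TreeMap::new, Collectors.counting()));
    }

    // Metric analogy: min, max, avg, sum and count of a numeric field.
    static IntSummaryStatistics scoreStats(List<Hotel> hotels) {
        return hotels.stream().mapToInt(Hotel::score).summaryStatistics();
    }

    // Pipeline analogy: a calculation over the output of another aggregation,
    // here the average document count across the brand buckets.
    static double averageBucketSize(Map<String, Long> buckets) {
        return buckets.values().stream().mapToLong(Long::longValue).average().orElse(0.0);
    }

    public static void main(String[] args) {
        List<Hotel> hotels = List.of(
                new Hotel("BrandA", 40), new Hotel("BrandA", 50), new Hotel("BrandB", 30));
        Map<String, Long> buckets = countByBrand(hotels);
        System.out.println(buckets);
        System.out.println(scoreStats(hotels));
        System.out.println(averageBucketSize(buckets));
    }
}
```

The bucket step decides which documents belong together, the metric step reduces a numeric field to a few numbers, and the pipeline step never touches documents at all: it only reads the buckets produced earlier.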
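II. Implementing Aggregations with DSL
1. Bucket aggregation
By default, buckets are sorted by _count (the document count) in descending order. The example below overrides this with an ascending sort.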
GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}
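The bucket aggregation above scans every document in the index. To limit the scope of the scan, add a query condition: only documents matching the query are aggregated.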
GET /hotel/_search
{
  "query": {
    "range": {
      "price": {
        "lt": 200 // aggregate only hotels priced below 200
      }
    }
  },
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 20,
        "order": {
          "_count": "asc"
        }
      }
    }
  }
}
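2. Metrics aggregation
Aggregations can be nested: the outer aggregation runs first, then the inner aggregation runs within each outer bucket.
Note on nesting: the inner "aggs" block is written inside the braces of the outer aggregation (as a sibling of "terms"), not alongside the outer aggregation.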
GET /hotel/_search
{
  "size": 0,
  "aggs": {
    "brandAgg": {
      "terms": {
        "field": "brand",
        "size": 10,
        "order": {
          "scoreAgg.avg": "asc"
        }
      },
      "aggs": {
        "scoreAgg": {
          "stats": {
            "field": "score"
          }
        }
      }
    }
  }
}
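To make the nesting concrete, here is a minimal in-memory sketch in plain Java of what the request above asks for: group by brand, compute score statistics inside each bucket, then order the buckets by the nested average in ascending order and keep the first few. It is an analogy under stated assumptions, not Elasticsearch's implementation; the Hotel class and the statsByBrand method are hypothetical.

```java
import java.util.Comparator;
import java.util.IntSummaryStatistics;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class NestedAggregationAnalogy {
    // Hypothetical document type, used only for this illustration.
    static final class Hotel {
        private final String brand;
        private final int score;
        Hotel(String brand, int score) { this.brand = brand; this.score = score; }
        String brand() { return brand; }
        int score() { return score; }
    }

    // Outer step: bucket by brand. Inner step: score stats per bucket.
    // The buckets are then ordered by the inner average (ascending) and cut to `size`.
    static Map<String, IntSummaryStatistics> statsByBrand(List<Hotel> hotels, int size) {
        Map<String, IntSummaryStatistics> perBrand = hotels.stream()
                .collect(Collectors.groupingBy(Hotel::brand, Collectors.summarizingInt(Hotel::score)));
        Map<String, IntSummaryStatistics> ordered = new LinkedHashMap<>();
        perBrand.entrySet().stream()
                .sorted(Comparator.comparingDouble(
                        (Map.Entry<String, IntSummaryStatistics> e) -> e.getValue().getAverage()))
                .limit(size)
                .forEach(e -> ordered.put(e.getKey(), e.getValue()));
        return ordered;
    }

    public static void main(String[] args) {
        List<Hotel> hotels = List.of(
                new Hotel("BrandA", 40), new Hotel("BrandA", 50),
                new Hotel("BrandB", 30), new Hotel("BrandC", 48));
        statsByBrand(hotels, 10).forEach((brand, stats) ->
                System.out.println(brand + " avg=" + stats.getAverage()));
    }
}
```

The sort key comes from the inner result, which is why the DSL refers to it by path as "scoreAgg.avg".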
III. Implementing Aggregations with the RestAPI
Bucket terms aggregation (GROUP BY): a method that aggregates by brand, city, and star rating.
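// Imports required by the enclosing class
import java.io.IOException;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;

// RequestParam, getBoolQueryBuilder and restHighLevelClient are defined elsewhere in the project.
public Map<String, List<String>> filters(RequestParam requestParam) {
    // Each aggregation is named after the keyword field it groups by
    String[] aggNames = new String[]{"brand", "city", "starName"};
    Map<String, List<String>> resultMap = new HashMap<>();
    SearchRequest searchRequest = new SearchRequest("hotel");
    // Limit the aggregation scope with the same query used for searching
    BoolQueryBuilder boolQueryBuilder = getBoolQueryBuilder(requestParam);
    searchRequest.source().query(boolQueryBuilder);
    // Return no hits, only aggregation results
    searchRequest.source().size(0);
    // One terms aggregation per field, at most 100 buckets each
    for (String aggName : aggNames) {
        searchRequest.source().aggregation(AggregationBuilders.terms(aggName).field(aggName).size(100));
    }
    try {
        SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
        Aggregations aggregations = searchResponse.getAggregations();
        for (String aggName : aggNames) {
            // Collect the bucket keys (the distinct field values) of each aggregation
            Terms terms = aggregations.get(aggName);
            List<String> list = new ArrayList<>();
            for (Terms.Bucket bucket : terms.getBuckets()) {
                list.add(bucket.getKeyAsString());
            }
            resultMap.put(aggName, list);
        }
        return resultMap;
    } catch (IOException e) {
        // On failure the method returns null, so callers must handle that case
        e.printStackTrace();
        return null;
    }
}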
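Autocomplete
I. Pinyin Analyzer
Download the pinyin analyzer: https://github.com/medcl/elasticsearch-analysis-pinyin/releases/tag/v8.6.0
Unzip it into the plugins directory (the directory mounted by Docker), then restart Elasticsearch.
II. Custom Analyzer
For the filter options of the pinyin analyzer, see the documentation at the download link above.
Create a custom analyzer named my_analyzer in the test index: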
// Custom pinyin analyzer + mapping
PUT /test
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "name": {
        "type": "text",
        "analyzer": "my_analyzer",
        "search_analyzer": "ik_smart"
      }
    }
  }
}
III. Autocomplete Queries
Completion suggester query:
- The field type must be completion
- The field value is most useful as an array of multiple terms
// Index for autocomplete
PUT test2
{
  "mappings": {
    "properties": {
      "title": {
        "type": "completion"
      }
    }
  }
}
// Sample data
POST test2/_doc
{
  "title": ["Sony", "WH-1000XM3"]
}
POST test2/_doc
{
  "title": ["SK-II", "PITERA"]
}
POST test2/_doc
{
  "title": ["Nintendo", "switch"]
}
// Autocomplete query
POST /test2/_search
{
  "suggest": {
    "title_suggest": {
      "text": "s", // the prefix typed by the user
      "completion": {
        "field": "title", // the completion field
        "skip_duplicates": true, // skip duplicate suggestions
        "size": 10 // return the first 10 results
      }
    }
  }
}
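The following plain-Java sketch illustrates why an array of terms is useful in a completion field: every element of the array is a separate entry point for prefix matching. It is an in-memory analogy only. It assumes case-insensitive matching (as with an analyzer that lowercases input), ignores weights and ranking, and returns results in document order, which is not necessarily the order Elasticsearch returns. The class and method names are hypothetical.

```java
import java.util.List;
import java.util.Locale;
import java.util.Optional;
import java.util.stream.Collectors;

class CompletionAnalogy {
    // Each inner list plays the role of one document's completion field.
    // For every document, the first input that starts with the prefix becomes a suggestion.
    static List<String> suggest(List<List<String>> docs, String prefix, int size) {
        String p = prefix.toLowerCase(Locale.ROOT);
        return docs.stream()
                .map(inputs -> inputs.stream()
                        .filter(in -> in.toLowerCase(Locale.ROOT).startsWith(p))
                        .findFirst())
                .filter(Optional::isPresent)
                .map(Optional::get)
                .distinct()   // analogous to skip_duplicates
                .limit(size)  // analogous to size
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<List<String>> docs = List.of(
                List.of("Sony", "WH-1000XM3"),
                List.of("SK-II", "PITERA"),
                List.of("Nintendo", "switch"));
        System.out.println(suggest(docs, "s", 10));
        System.out.println(suggest(docs, "wh", 10));
    }
}
```

With the sample data, the prefix "s" reaches the third document only through its second term "switch"; if the field held the single string "Nintendo switch", that document would not match.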
IV. Implementing Search Box Autocomplete (Hotel Example)
Build the index
// Hotel data index
PUT /hotel
{
  "settings": {
    "analysis": {
      "analyzer": {
        "text_anlyzer": {
          "tokenizer": "ik_max_word",
          "filter": "py"
        },
        "completion_analyzer": {
          "tokenizer": "keyword",
          "filter": "py"
        }
      },
      "filter": {
        "py": {
          "type": "pinyin",
          "keep_full_pinyin": false,
          "keep_joined_full_pinyin": true,
          "keep_original": true,
          "limit_first_letter_length": 16,
          "remove_duplicated_term": true,
          "none_chinese_pinyin_tokenize": false
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "id": {
        "type": "keyword"
      },
      "name": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart",
        "copy_to": "all"
      },
      "address": {
        "type": "keyword",
        "index": false
      },
      "price": {
        "type": "integer"
      },
      "score": {
        "type": "integer"
      },
      "brand": {
        "type": "keyword",
        "copy_to": "all"
      },
      "city": {
        "type": "keyword"
      },
      "starName": {
        "type": "keyword"
      },
      "business": {
        "type": "keyword",
        "copy_to": "all"
      },
      "location": {
        "type": "geo_point"
      },
      "pic": {
        "type": "keyword",
        "index": false
      },
      "all": {
        "type": "text",
        "analyzer": "text_anlyzer",
        "search_analyzer": "ik_smart"
      },
      "suggestion": {
        "type": "completion",
        "analyzer": "completion_analyzer"
      }
    }
  }
}
Query tests
GET /hotel/_search
{
  "query": {"match_all": {}}
}
GET /hotel/_search
{
  "suggest": {
    "YOUR_SUGGESTION": {
      "text": "s",
      "completion": {
        "field": "suggestion",
        "skip_duplicates": true // skip duplicate suggestions
      }
    }
  }
}
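// Imports required by the enclosing class
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.search.suggest.Suggest;
import org.elasticsearch.search.suggest.SuggestBuilder;
import org.elasticsearch.search.suggest.SuggestBuilders;
import org.elasticsearch.search.suggest.completion.CompletionSuggestion;

// Returns up to 10 completion suggestions for the given prefix.
// restHighLevelClient is defined elsewhere in the project.
public List<String> getSuggestion(String prefix) {
    SearchRequest request = new SearchRequest("hotel");
    List<String> list = new ArrayList<>();
    try {
        // Build a completion suggester on the "suggestion" field
        request.source().suggest(new SuggestBuilder().addSuggestion(
                "OneSuggestion",
                SuggestBuilders
                        .completionSuggestion("suggestion")
                        .prefix(prefix)
                        .skipDuplicates(true)
                        .size(10)
        ));
        SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
        // Look the result up by the same name used in the request
        Suggest suggest = response.getSuggest();
        CompletionSuggestion oneSuggestion = suggest.getSuggestion("OneSuggestion");
        List<CompletionSuggestion.Entry.Option> options = oneSuggestion.getOptions();
        for (CompletionSuggestion.Entry.Option option : options) {
            String text = option.getText().toString();
            list.add(text);
        }
    } catch (IOException e) {
        // On failure an empty list is returned
        e.printStackTrace();
    }
    return list;
}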
Data Synchronization
Dual-write consistency
- Synchronous calls: couple the data and the business logic
- Asynchronous notification: increases implementation complexity
- Listening to the binlog (which records insert, update, and delete operations): adds load on MySQL and requires setting up middleware
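The asynchronous notification option can be sketched in plain Java as follows. This is a minimal illustration under stated assumptions, not a production design: an in-memory queue stands in for a message broker, and two maps stand in for MySQL and the Elasticsearch index. All class, field, and method names are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;
import java.util.concurrent.LinkedBlockingQueue;

class AsyncSyncSketch {
    enum Op { UPSERT, DELETE }

    // Message published after every write to the primary store.
    static final class ChangeEvent {
        final Op op;
        final long id;
        final String name;
        ChangeEvent(Op op, long id, String name) { this.op = op; this.id = id; this.name = name; }
    }

    private final Map<Long, String> database = new HashMap<>();     // stands in for MySQL
    private final Map<Long, String> searchIndex = new HashMap<>();  // stands in for the search index
    private final Queue<ChangeEvent> queue = new LinkedBlockingQueue<>(); // stands in for a message queue

    // Producer side: write to the database, then publish a notification.
    void save(long id, String name) {
        database.put(id, name);
        queue.add(new ChangeEvent(Op.UPSERT, id, name));
    }

    void delete(long id) {
        database.remove(id);
        queue.add(new ChangeEvent(Op.DELETE, id, null));
    }

    // Consumer side: apply every pending event to the index and return how many were applied.
    int consumePending() {
        int applied = 0;
        ChangeEvent event;
        while ((event = queue.poll()) != null) {
            if (event.op == Op.UPSERT) {
                searchIndex.put(event.id, event.name);
            } else {
                searchIndex.remove(event.id);
            }
            applied++;
        }
        return applied;
    }

    String indexedName(long id) {
        return searchIndex.get(id);
    }

    public static void main(String[] args) {
        AsyncSyncSketch sync = new AsyncSyncSketch();
        sync.save(1L, "Hotel One");
        System.out.println("before consume: " + sync.indexedName(1L));
        sync.consumePending();
        System.out.println("after consume: " + sync.indexedName(1L));
    }
}
```

The writer no longer calls the search service directly, which removes the coupling of the synchronous approach, but the index lags behind the database until the consumer runs, and the queue itself becomes a component that has to be operated and monitored.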