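How deduplication works: combine Elasticsearch field collapsing (collapse) with the cardinality aggregation.
1. Advantages: simple and fast, with almost no performance overhead compared with bucket-based deduplication (a terms aggregation).
2. Disadvantages:
1) Collapse deduplicates on a single field only, and that field must be a single-valued keyword or numeric field with doc_values enabled.
2) The deduplicated count from cardinality is approximate, because it is computed with HyperLogLog++, an approximate algorithm.
3) Collapse cannot be combined with scroll, and search_after works only when sorting and collapsing on the same field, so deep pagination remains a problem.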
1. Deduplicating on a single field
GET /xxxxx/_search
{
  "_source": [ // fields returned for each hit
    "title",
    "uuid",
    "id"
  ],
  "query": {
    "match_phrase": {
      "title": "去重查詢"
    }
  },
  "sort": [
    {
      "id.keyword": {
        "order": "desc"
      }
    }
  ],
  "collapse": {
    "field": "uuid.keyword", // collapse field, i.e. the deduplication field (only one field is allowed)
    "inner_hits": { // optional: the documents inside each collapsed group
      "name": "inner_tops", // name of the inner hits section
      "size": 2, // how many documents to return per group
      "sort": [ // sort order inside each group
        {
          "id.keyword": "desc"
        }
      ],
      "collapse": { "field": "title.keyword" }, // second-level collapse
      "_source": [ // fields returned for the inner hits
        "title",
        "uuid",
        "id"
      ]
    }
  },
  "aggs": {
    "total_size": { // aggregation name
      "cardinality": { // collapse does not change hits.total, so use cardinality to count the distinct values
        "field": "uuid.keyword",
        "precision_threshold": 100 // precision threshold, 0 to 40000
      }
    }
  },
  "track_total_hits": true // track the exact hit count; by default counting stops at 10000
}
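Java API (pre-8.0 High Level REST Client)
// Collapse + cardinality deduplication query
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
searchSourceBuilder.collapse(new CollapseBuilder("uuid.keyword"));
AggregationBuilder aggregation = AggregationBuilders.cardinality("total_size").field("uuid.keyword");
searchSourceBuilder.aggregation(aggregation);
// "client" is assumed to be an existing RestHighLevelClient; "xxxxx" is the index name
SearchRequest searchRequest = new SearchRequest("xxxxx").source(searchSourceBuilder);
SearchResponse searchResponse = client.search(searchRequest, RequestOptions.DEFAULT);
// Read the approximate number of distinct uuid values
Aggregations aggregations = searchResponse.getAggregations();
Cardinality cardinality = aggregations.get("total_size");
System.out.println(cardinality.getValue());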
Elasticsearch 8.0+ new Java API client
BoolQuery.Builder boolQuery = new BoolQuery.Builder();
SearchResponse<JSONObject> search = elasticsearchClient.search(builder ->
        builder.index(EsIndexConstants.INDEX_NAME)
            .query(q -> q.bool(boolQuery.build()))
            .from(start)
            .size(requestInfo.getPageSize())
            // collapse (deduplicate) on uuid
            .collapse(new FieldCollapse.Builder().field("uuid.keyword").build())
            .sort(s -> s.field(f -> f.field("publish_time").order(SortOrder.Desc)))
            // cardinality aggregation: approximate number of distinct uuid values
            .aggregations("total_size", a -> a.cardinality(b -> b.field("uuid.keyword").precisionThreshold(100)))
            .trackTotalHits(t -> t.enabled(true))
    , JSONObject.class);
// Total number of results after collapsing
long totalSize = search.aggregations().get("total_size").cardinality().value();
2. Deduplicating on multiple fields
Combine the fields into a single field, then deduplicate on that field.
There are three ways to build the combined field:
1) Combine the values when the document is written
2) Use an ingest pipeline
3) Use a script
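Option 1 (combining at write time) can be sketched in plain Java as below. This is only an illustration under assumptions: the field name dedup_key, the "|" separator, and the class and method names are all hypothetical and do not come from the original article. The idea is to add one extra field to the document before indexing it, map that field as keyword, and then use it as the collapse and cardinality field.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class DedupKeyExample {
    // Hypothetical separator: choose one that cannot occur inside the field values,
    // otherwise two different value combinations could produce the same key.
    private static final String SEPARATOR = "|";

    // Joins the given parts into one key; null parts become empty strings.
    static String dedupKey(String... parts) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                sb.append(SEPARATOR);
            }
            sb.append(parts[i] == null ? "" : parts[i]);
        }
        return sb.toString();
    }

    // Returns a copy of the document with an extra "dedup_key" field (hypothetical name)
    // built from the named source fields, in the order given.
    static Map<String, Object> withDedupKey(Map<String, Object> doc, String... fields) {
        String[] parts = new String[fields.length];
        for (int i = 0; i < fields.length; i++) {
            Object value = doc.get(fields[i]);
            parts[i] = value == null ? null : value.toString();
        }
        Map<String, Object> copy = new LinkedHashMap<>(doc);
        copy.put("dedup_key", dedupKey(parts));
        return copy;
    }

    public static void main(String[] args) {
        Map<String, Object> doc = new LinkedHashMap<>();
        doc.put("uuid", "u-1");
        doc.put("title", "Hello");
        // The enriched document is what would be sent to the index request.
        System.out.println(withDedupKey(doc, "uuid", "title"));
    }
}
```

With such a field in place, the single-field query from section 1 applies unchanged apart from the field name, for example collapsing on dedup_key instead of uuid.keyword.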