Elasticsearch
Elasticsearch combined with Kibana, Logstash, and Beats makes up the Elastic Stack (ELK). It is widely used for log analysis, real-time monitoring, and similar workloads.
What is Elasticsearch?
- An open-source distributed search engine that can be used for search, log statistics and analysis, system monitoring, and more
What is the Elastic Stack (ELK)?
- A technology stack built around Elasticsearch, consisting of Beats, Logstash, Kibana, and Elasticsearch
What is Lucene?
- Apache's open-source search engine library, which provides the core search engine API
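Forward index and inverted index
What are documents and terms?
- Each record is a document
- Tokenizing a document's content produces words, and each word is a term
What is a forward index?
- An index built on the document id. To search for a term, you must first fetch each document and then check whether it contains the term
What is an inverted index?
- The document content is tokenized, an index is built on the terms, and each term records which documents contain it. A query first looks up the term to get the document ids, then fetches the documents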
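The difference between the two lookups can be sketched in plain Java. This is only an illustration, not how Lucene stores its index: the class name `InvertedIndexSketch`, the whitespace tokenizer (standing in for a real analyzer such as IK), and the sample documents are all assumptions made for the example.

```java
import java.util.*;

final class InvertedIndexSketch {
    // term -> ids of the documents that contain it
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();
    // document id -> raw text; this is the "forward" view of the data
    private final Map<Integer, String> documents = new TreeMap<>();

    // Assumption: whitespace splitting stands in for a real analyzer.
    static List<String> tokenize(String text) {
        return Arrays.asList(text.toLowerCase().trim().split("\\s+"));
    }

    void add(int id, String text) {
        documents.put(id, text);
        for (String term : tokenize(text)) {
            postings.computeIfAbsent(term, t -> new TreeSet<>()).add(id);
        }
    }

    // Inverted lookup: one map access per term, no document scan.
    SortedSet<Integer> search(String term) {
        return postings.getOrDefault(term.toLowerCase(), new TreeSet<>());
    }

    // Forward lookup: every document is fetched and checked for the term.
    SortedSet<Integer> scan(String term) {
        SortedSet<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, String> doc : documents.entrySet()) {
            if (tokenize(doc.getValue()).contains(term.toLowerCase())) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        InvertedIndexSketch index = new InvertedIndexSketch();
        index.add(1, "xiaomi phone");
        index.add(2, "huawei charger");
        index.add(3, "huawei phone");
        System.out.println(index.search("phone"));
    }
}
```

Both methods return the same ids; the inverted lookup simply avoids touching documents that do not contain the term.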
Differences between ES and MySQL
Analyzers
IK analyzer
POST /_analyze
{
"text": ["馬化騰是一個人啊,奧力給啊額!"],
"analyzer": "ik_max_word"
}
Pinyin analyzer
Configuration reference: https://github.com/medcl/elasticsearch-analysis-pinyin
POST /_analyze
{
"text": ["馬化騰是一個人啊,奧力給啊額!"],
"analyzer": "pinyin"
}
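Custom analyzers
An analyzer in ES consists of three parts:
- character filters: process the text before the tokenizer, for example deleting or replacing characters
- tokenizer: splits the text into terms according to some rule, for example keyword (no splitting at all) or ik_smart
- token filters: post-process the terms emitted by the tokenizer, for example case conversion, synonym handling, or pinyin conversion
Custom analyzers are configured through settings when the index is created.
- settings: index-level configuration
The analysis section of settings can define three parts:
- character filter: handles special characters
- tokenizer: splits the text into terms
- filter: token filters such as the pinyin filter
None of them is mandatory; use whichever the situation calls for.
The mapping should specify both the index-time analyzer and the search-time analyzer:
- "analyzer": "myAnalyzer",
- "search_analyzer": "ik_max_word"
Why specify them separately?
The pinyin filter belongs at index time only. Take 獅子 (lion) and 柿子 (persimmon): at index time they produce the terms shizi, sz, 獅子, and 柿子, and because the pinyin filter is applied, both documents carry shizi and sz. If the search side also ran the pinyin filter, a search for 獅子 would be turned into shizi as well and would therefore find both 柿子 and 獅子. So the search side must not use the pinyin filter. It should use the IK analyzer: Chinese input then matches only the Chinese terms, while pinyin input still matches the pinyin terms stored in the index.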
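The effect of using different analyzers at index time and at search time can be simulated without a cluster. The sketch below is an assumption-laden toy: the `PINYIN` table stands in for the pinyin token filter, `indexAnalyze` stands in for `myAnalyzer`, and passing the raw word stands in for a search analyzer without pinyin. None of these names come from Elasticsearch.

```java
import java.util.*;

final class SearchAnalyzerSketch {
    // Hypothetical lookup table standing in for the pinyin token filter.
    static final Map<String, String> PINYIN = new HashMap<>();
    static {
        PINYIN.put("獅子", "shizi");
        PINYIN.put("柿子", "shizi");
    }

    // Index-time analysis: keep the original word and add its pinyin.
    static Set<String> indexAnalyze(String word) {
        Set<String> terms = new HashSet<>();
        terms.add(word);
        String pinyin = PINYIN.get(word);
        if (pinyin != null) {
            terms.add(pinyin);
        }
        return terms;
    }

    // A document is a hit if it shares at least one term with the query.
    static Set<Integer> search(Map<Integer, Set<String>> index, Set<String> queryTerms) {
        Set<Integer> hits = new TreeSet<>();
        for (Map.Entry<Integer, Set<String>> doc : index.entrySet()) {
            if (!Collections.disjoint(doc.getValue(), queryTerms)) {
                hits.add(doc.getKey());
            }
        }
        return hits;
    }

    static Map<Integer, Set<String>> buildIndex() {
        Map<Integer, Set<String>> index = new HashMap<>();
        index.put(1, indexAnalyze("獅子"));
        index.put(2, indexAnalyze("柿子"));
        return index;
    }

    public static void main(String[] args) {
        Map<Integer, Set<String>> index = buildIndex();
        // Search analyzed WITH pinyin: the query also contains "shizi".
        System.out.println(search(index, indexAnalyze("獅子")));
        // Search analyzed WITHOUT pinyin: only the original word is used.
        System.out.println(search(index, Collections.singleton("獅子")));
    }
}
```

With pinyin applied on the search side, the Chinese query drags in the homophone; without it, only the intended document matches, while a typed `shizi` still finds both.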
# Custom analyzer
PUT /test
{
"settings": {
"analysis": {
"analyzer": {
"myAnalyzer":{
"tokenizer":"ik_max_word",
"filter": "py" // name of the pinyin token filter defined below
}
},
// definition of the pinyin token filter
"filter": {
"py":{
"type": "pinyin", // filter type
"keep_full_pinyin": false, // when enabled, e.g. 劉德華 > [liu, de, hua]; default: true
"keep_joined_full_pinyin": true, // when enabled, e.g. 劉德華 > [liudehua]; default: false
"keep_original": true, // when enabled, the original input is kept as well; default: false
"limit_first_letter_length": 16, // maximum length of the first_letter result; default: 16
"remove_duplicated_term": true, // when enabled, duplicated terms are removed to save index space, e.g. de的 > de; default: false; note: position-related queries may be affected
"none_chinese_pinyin_tokenize": false // break non-Chinese letters into separate pinyin terms if they are pinyin; default: true, e.g. liudehuaalibaba13zhuanghan > liu, de, hua, a, li, ba, ba, 13, zhuang, han; note: keep_none_chinese and keep_none_chinese_together should be enabled first
}
}
}
},
"mappings": {
"properties": {
"name":{
"type": "text",
"analyzer": "myAnalyzer",
"search_analyzer": "ik_max_word"
}
}
}
}
Indexes
Mapping properties
# Create an index
PUT /firsttable
{
"mappings": {
"properties": {
"info": {
"type": "text",
"analyzer": "ik_max_word"
},
"age": {
"type": "integer"
},
"Weight": {
"type": "double"
},
"isMarried": {
"type": "boolean"
},
"email": {
"type": "keyword",
"index": false
},
"score": {
"type": "double"
},
"name": {
"type": "object",
"properties": {
"firstName": {
"type": "keyword"
},
"lastName": {
"type": "keyword"
}
}
}
}
}
}
# Get the index
GET /firsttable
# Update the index mapping: existing fields cannot be changed, only new fields can be added
PUT /firsttable/_mapping
{
"properties":{
"age2":{
"type": "double"
}
}
}
# Delete the index
DELETE /firsttable
Documents
# Create a document
POST /firsttable/_doc/1
{
"info": "未婚男性",
"age": "20",
"Weight": "21.3",
"isMarried": false,
"email": "213@qq.com",
"score": "21.2",
"name": {
"firstName": "張",
"lastName": "三"
}
}
# Get a document
GET /firsttable/_doc/1
# Delete a document
DELETE /firsttable/_doc/1
# Update a document
# 1. Full replacement: deletes the old document and adds the new one
PUT /firsttable/_doc/1
{
"info": "未婚男性222",
"age": "20",
"Weight": "21.3",
"isMarried": false,
"email": "213@qq.com",
"score": "21.2",
"name": {
"firstName": "張",
"lastName": "三"
}
}
# 2. Partial update
POST /firsttable/_update/1
{
"doc": {
"info": "未婚男性333"
}
}
RestClient operations
DSL statement
# hotel index
PUT /hotel
{
"mappings":{
"properties":{
"id":{
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "ik_max_word"
},
"address":{
"type": "keyword",
"index": false,
"copy_to": "all"
},
"price":{
"type": "double"
},
"score":{
"type": "integer"
},
"brand":{
"type": "keyword",
"copy_to": "all"
},
"city":{
"type": "keyword",
"copy_to": "all"
},
"starName":{
"type": "keyword"
},
"business":{
"type": "keyword"
},
"location":{
"type": "geo_point"
},
"pic":{
"type": "keyword"
},
"all":{
"type": "text",
"analyzer": "ik_max_word"
}
}
}
}
- Add the dependency
<properties>
<java.version>1.8</java.version>
<elasticsearch.version>7.12.1</elasticsearch.version>
</properties>
<!-- Elasticsearch Java RestHighLevelClient dependency -->
<dependency>
<groupId>org.elasticsearch.client</groupId>
<artifactId>elasticsearch-rest-high-level-client</artifactId>
</dependency>
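- Initialize the RestHighLevelClient
import java.io.IOException;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.junit.jupiter.api.AfterEach;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Test;
import org.springframework.boot.test.context.SpringBootTest;
@SpringBootTest
class HotelDemoApplicationTests {
private RestHighLevelClient client;
@Test
void contextLoads() {
System.out.println(client);
}
// Create a fresh client before each test; replace the address with your own ES node
@BeforeEach
void setUp(){
this.client = new RestHighLevelClient(
RestClient.builder(
HttpHost.create("http://192.168.163.129:9200")));
}
// Release the client's connections after each test
@AfterEach
void clear() throws IOException {
this.client.close();
}
}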
Index operations
Create an index
@Test
void contextLoads() throws IOException {
// 1. Create the request object
CreateIndexRequest request = new CreateIndexRequest("hotel");
// 2. Attach the DSL; MAPPING_HOTEL is a String constant holding the DSL that creates the hotel index
request.source(MAPPING_HOTEL,XContentType.JSON);
// 3. Send the request; client.indices() exposes the index-level operations (create, delete, get, exists)
client.indices().create(request,RequestOptions.DEFAULT);
}
Delete an index
@Test
public void testDel() throws IOException {
DeleteIndexRequest hotel = new DeleteIndexRequest("hotel");
client.indices().delete(hotel,RequestOptions.DEFAULT);
}
Check whether an index exists
@Test
public void testExists() throws IOException {
GetIndexRequest hotel = new GetIndexRequest("hotel");
System.out.println(client.indices().exists(hotel, RequestOptions.DEFAULT));
}
Document operations
Create a document
@Test
public void testAddData() throws IOException {
// Load the record from the database
Hotel hotel = hotelService.getById(61083L);
// Convert it to the document structure used by the index
HotelDoc hotelDoc = new HotelDoc(hotel);
// Build the request: create a document in the given index with the given id
IndexRequest request = new IndexRequest("hotel").id(hotelDoc.getId().toString());
// Document body as JSON
request.source(JSON.toJSONString(hotelDoc), XContentType.JSON);
client.index(request, RequestOptions.DEFAULT);
}
Get a document
@Test
public void testGet() throws IOException {
GetRequest request = new GetRequest("hotel").id("61083");
GetResponse response = client.get(request, RequestOptions.DEFAULT);
String jsonStr = response.getSourceAsString();
HotelDoc hotelDoc = JSON.parseObject(jsonStr, HotelDoc.class);
System.out.println(hotelDoc);
}
Delete a document
@Test
public void testDel() throws IOException {
DeleteRequest request = new DeleteRequest("hotel").id("61083");
DeleteResponse response = client.delete(request, RequestOptions.DEFAULT);
System.out.println(response.status());
}
Update a document
@Test
public void testUpdate() throws IOException {
UpdateRequest request = new UpdateRequest("hotel","61083");
request.doc(
"score", "18",
"city", "東莞"
);
UpdateResponse response = client.update(request, RequestOptions.DEFAULT);
System.out.println(response.status());
}
Bulk-create documents
@Test
public void testBulk() throws IOException {
QueryWrapper<Hotel> wrapper = new QueryWrapper<>();
// wrapper.last("limit 5");
List<Hotel> list = hotelService.list(wrapper);
BulkRequest request = new BulkRequest("hotel");
for (Hotel item: list){
HotelDoc hotelDoc = new HotelDoc(item);
request.add(
new IndexRequest("hotel")
.id(item.getId().toString())
.source(JSON.toJSONString(hotelDoc),XContentType.JSON));
}
client.bulk(request,RequestOptions.DEFAULT);
}
DSL queries
Match all: returns every document, generally used for testing. For example: match_all
Full-text queries: run the user's input through an analyzer, then match the resulting terms against the inverted index. For example:
- match
- multi_match
Exact queries: look up data by exact term value, typically on keyword, numeric, date, and boolean fields. For example:
- ids
- range: query by a range of values
- term: query by exact term value
Geo queries: query by latitude and longitude. For example:
- geo_distance
- geo_bounding_box
Compound queries: combine the query conditions above into a single query. For example:
- bool
- function_score
Match all
GET /hotel/_search
{
"explain": true, # adds a score explanation plus the shard and node of each hit
"query": {
"query type": {
"query condition": "condition value"
}
}
}
// match all
GET /hotel/_search
{
"query": {
"match_all": {}
}
}
Full-text queries
# match query
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
}
}
# multi_match query: slightly different from match. match searches a single field, while multi_match runs the value against every listed field. If the all field used by match is built from exactly the fields listed in multi_match, the two queries behave the same
GET /hotel/_search
{
"query": {
"multi_match": {
"query": "上海如家",
"fields": ["brand", "name", "address"]
}
}
}
Exact queries
# term query: look up by exact term value
GET /hotel/_search
{
"query": {
"term": {
"city": {
"value": "上海"
}
}
}
}
# range query: look up by a range of values
GET /hotel/_search
{
"query": {
"range": {
"price": {
"gte": 100,
"lte": 200
}
}
}
}
Geo queries
# geo_distance query: look up by distance from a coordinate
GET /hotel/_search
{
"query": {
"geo_distance": {
"distance": "3km",
"location": "31.21, 121.5"
}
}
}
# geo_bounding_box query: look up within the rectangle defined by the given coordinates
GET /hotel/_search
{
"query": {
"geo_bounding_box": {
"location":{
"top_left": {
"lat": 31.3,
"lon": 121.5
},
"bottom_right": {
"lat": 30.3,
"lon": 121.7
}
}
}
}
}
Compound queries
# function_score: query city=上海 and give hotels with brand=如家 a weight of 10. The score of each matching 如家 hotel is multiplied by 10 while other hotels keep their score. Results are ordered by score, so the 如家 hotels rank first
GET /hotel/_search
{
"query": {
"function_score": {
"query": {
"match": {
"city": "上海"
}
},
"functions": [
{
"filter": {
"term": {"brand": "如家"}
},
"weight":10
}
],
"boost_mode": "multiply"
}
}
}
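Logical clauses of the bool query:
- must: conditions that must match; think of it as "and"
- should: conditions that may match; think of it as "or"
- must_not: conditions that must not match; does not contribute to the score; think of it as "not"
- filter: conditions that must match; does not contribute to the score
# bool query: hotels whose name matches 如家, whose price is not above 400, and that lie within 10km of 31.21,121.5
# Clauses in filter and must_not are not scored; only clauses placed in must are scored, and scoring costs performance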
GET /hotel/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "如家"
}
}
],
"must_not": [
{
"range": {
"price": {
"gt":400
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 31.21,
"lon": 121.5
}
}
}
]
}
}
}
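Relevance algorithm
Sorting
Once an explicit sort is specified, ES no longer computes relevance scores
# sort query: brand=如家, ordered by the score field descending, ties broken by price ascending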
GET /hotel/_search
{
"query": {
"match": {
"brand": "如家"
}
},
"sort": [
{
"score": {
"order": "desc"
}
},
{
"price": {
"order": "asc"
}
}
]
}
# sort query: hotels ordered by ascending distance from the given coordinates, with distances reported in km
GET /hotel/_search
{
"query": {
"match": {
"brand": "如家"
}
},
"sort": [
{
"_geo_distance": {
"location": {
"lat": 31.240417 ,
"lon": 121.503134
},
"order": "asc",
"unit": "km"
}
}
]
}
Pagination
By default ES returns only the top 10 hits; to get more, change the paging parameters.
The from and size parameters control which page of results is returned.
Constrained by the inverted index, every paged query collects all hits up to the end of the requested page and then cuts off the front. For example, to return hits 990-1000, ES has to collect the first 1000 hits and keep only the last 10.
ES is distributed, and a cluster that stores a lot of data will spread it across shards, each holding its own part. So how does a paged query for hits 990-1000 work? Each shard cannot simply return its own hits 990-1000: with 10 shards that would give 100 hits, with no way of knowing which ones belong on the page. Instead, every shard returns its own top 1000, the 10,000 hits from the ten shards are merged and re-sorted, and hits 990-1000 are taken from the merged list.
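The merge step can be sketched in plain Java. This is a simplified model rather than Elasticsearch's actual implementation: integer scores stand in for hits, and the names `ShardPagingSketch`, `shardTop`, and `page` are made up for the example.

```java
import java.util.*;

final class ShardPagingSketch {
    // Each shard returns its own top (from + size) scores, highest first.
    static List<Integer> shardTop(List<Integer> shardScores, int from, int size) {
        List<Integer> sorted = new ArrayList<>(shardScores);
        sorted.sort(Comparator.reverseOrder());
        return sorted.subList(0, Math.min(from + size, sorted.size()));
    }

    // The coordinating node merges the shard results, re-sorts them,
    // and keeps only the requested window [from, from + size).
    static List<Integer> page(List<List<Integer>> shards, int from, int size) {
        List<Integer> merged = new ArrayList<>();
        for (List<Integer> shard : shards) {
            merged.addAll(shardTop(shard, from, size));
        }
        merged.sort(Comparator.reverseOrder());
        int start = Math.min(from, merged.size());
        int end = Math.min(from + size, merged.size());
        return new ArrayList<>(merged.subList(start, end));
    }

    public static void main(String[] args) {
        List<List<Integer>> shards = Arrays.asList(
                Arrays.asList(9, 1, 5),
                Arrays.asList(8, 7, 2),
                Arrays.asList(6, 4, 3));
        System.out.println(page(shards, 2, 3));
    }
}
```

The amount of data each shard must return grows with `from + size`, which is why pages deep into the result set become expensive.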
# Paged query
# sort query
GET /hotel/_search
{
"query": {
"match": {
"brand": "如家"
}
},
"sort": [
{
"score": {
"order": "desc"
}
},
{
"price": {
"order": "asc"
}
}
],
"from": 0,
"size": 2
}
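Deep paging problem
Solutions for deep paging
search after is the recommended approach:
- Pros: no upper limit on how far you can page (the size of a single query must still not exceed 10,000)
- Cons: can only move forward page by page; random page jumps are not supported
- Use case: searches that do not need random paging, such as scrolling down on a phone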
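The idea behind search after is to carry the sort value of the last hit into the next request instead of an offset. The in-memory sketch below illustrates only that cursor logic; it makes no calls to Elasticsearch, and the class and method names are hypothetical. It also assumes the sort values are unique, which in practice usually means adding a tiebreaker field to the sort.

```java
import java.util.*;

final class SearchAfterSketch {
    // Returns up to `size` values that come strictly after the cursor in an
    // ascending-sorted list. A null cursor means "start from the beginning".
    static List<Integer> nextPage(List<Integer> sortedValues, Integer after, int size) {
        List<Integer> page = new ArrayList<>();
        for (Integer value : sortedValues) {
            if (after != null && value <= after) {
                continue; // already returned on an earlier page
            }
            page.add(value);
            if (page.size() == size) {
                break;
            }
        }
        return page;
    }

    public static void main(String[] args) {
        List<Integer> prices = Arrays.asList(100, 150, 200, 250, 300);
        Integer cursor = null;
        List<Integer> page;
        // Walk forward page by page, using the last value as the next cursor.
        while (!(page = nextPage(prices, cursor, 2)).isEmpty()) {
            System.out.println(page);
            cursor = page.get(page.size() - 1);
        }
    }
}
```

Because each request depends on the previous page's last value, jumping straight to an arbitrary page is not possible, which matches the drawback listed above.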
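Highlighting
The query here targets all, which is populated via copy_to from other fields, while the field highlighted in fields is name. By default ES only highlights a field when it is also the field being queried; set require_field_match to false to change that.
# Highlight query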
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
},
"highlight": {
"fields": {
"name": {
"require_field_match": "false"
}
}
}
}
RestClient query operations
Match all (matchAll)
@Test
public void testMatchALl() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.matchAllQuery());
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
// Parse the response and get the hits
SearchHits searchHits = response.getHits();
// Total number of matching documents
long value = searchHits.getTotalHits().value;
System.err.println("<===== " + value + " hits in total ====>");
// Array of documents inside hits
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String jsonStr = hit.getSourceAsString();
HotelDoc hotelDoc = JSONObject.parseObject(jsonStr, HotelDoc.class);
System.err.println("hotelDoc---> " + hotelDoc);
}
}
// match all
GET /hotel/_search
{
"query": {
"match_all": {}
}
}
Full-text queries
/**
* Full-text query: match
* @throws IOException
*/
@Test
public void testMatch() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.matchQuery("all","上海如家"));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
}
}
/**
* Full-text query: multi_match
* @throws IOException
*/
@Test
public void testMultiMatch() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.multiMatchQuery("上海如家","brand","name","address"));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"multi_match": {
"query": "上海如家",
"fields": ["brand", "name", "address"]
}
}
}
Exact queries
/**
* Exact query: term
* @throws IOException
*/
@Test
public void testTerm() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.termQuery("city","上海"));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"term": {
"city": {
"value": "上海"
}
}
}
}
/**
* Range query: range
* @throws IOException
*/
@Test
public void testRange() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.rangeQuery("price").gte(100).lte(200));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"range": {
"price": {
"gte": 100,
"lte": 200
}
}
}
}
Geo queries
/**
* Geo query: geo_distance
* @throws IOException
*/
@Test
public void testDistance() throws IOException {
SearchRequest request = new SearchRequest("hotel");
request.source().query(QueryBuilders.geoDistanceQuery("location").distance("3km").point(31.21,121.5));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
# geo_distance query: look up by distance from a coordinate
GET /hotel/_search
{
"query": {
"geo_distance": {
"distance": "3km",
"location": "31.21, 121.5"
}
}
}
Compound queries
/**
* Compound query: bool
* @throws IOException
*/
@Test
public void testBool() throws IOException {
SearchRequest request = new SearchRequest("hotel");
// Prepare the DSL
// Create the BoolQueryBuilder
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
// Add the must clause
boolQuery.must(QueryBuilders.matchQuery("name","如家"));
// Add the must_not clause (numeric bound, matching the DSL)
boolQuery.mustNot(QueryBuilders.rangeQuery("price").gt(400));
// Add the filter clause
boolQuery.filter(QueryBuilders.geoDistanceQuery("location").distance("10km").point( 31.21,121.5));
request.source().query(boolQuery);
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"bool": {
"must": [
{
"match": {
"name": "如家"
}
}
],
"must_not": [
{
"range": {
"price": {
"gt":400
}
}
}
],
"filter": [
{
"geo_distance": {
"distance": "10km",
"location": {
"lat": 31.21,
"lon": 121.5
}
}
}
]
}
}
}
/**
* Compound query: function_score
* @throws IOException
*/
@Test
public void testFunctionScore() throws IOException {
SearchRequest request = new SearchRequest("hotel");
SearchSourceBuilder searchSourceBuilder = new SearchSourceBuilder();
// Build the match query
QueryBuilder queryBuilder = QueryBuilders.matchQuery("city", "上海");
// Build the function clause
FunctionScoreQueryBuilder.FilterFunctionBuilder[] filterFunctionBuilders = {
new FunctionScoreQueryBuilder.FilterFunctionBuilder(
QueryBuilders.termQuery("brand", "如家"),
new WeightBuilder().setWeight(10)
)
};
// Combine the function and the query into one function_score query
FunctionScoreQueryBuilder functionScoreQueryBuilder = QueryBuilders.functionScoreQuery(queryBuilder, filterFunctionBuilders);
searchSourceBuilder.query(functionScoreQueryBuilder);
request.source(searchSourceBuilder);
SearchResponse response = client.search(request,RequestOptions.DEFAULT);
}
GET /hotel/_search
{
"query": {
"function_score": {
"query": {
"match": {
"city": "上海"
}
},
"functions": [
{
"filter": {
"term": {"brand": "如家"}
},
"weight":10
}
],
"boost_mode": "multiply"
}
}
}
Sorting
/**
* Sorting and pagination
* @throws IOException
*/
@Test
public void testSort() throws IOException {
SearchRequest request = new SearchRequest("hotel");
MatchQueryBuilder query = QueryBuilders.matchQuery("brand", "如家");
// Two sort criteria
FieldSortBuilder score = SortBuilders.fieldSort("score").order(SortOrder.DESC);
FieldSortBuilder price = SortBuilders.fieldSort("price").order(SortOrder.ASC);
// Collect both criteria into one sort list
List<SortBuilder<?>> builders = new ArrayList<>();
builders.add(score);
builders.add(price);
request.source().sort(builders);
request.source().query(query);
request.source().from(0);
request.source().size(2);
SearchResponse response = client.search(request,RequestOptions.DEFAULT);
SearchHits searchHits = response.getHits();
System.out.println(searchHits.getTotalHits());
}
GET /hotel/_search
{
"query": {
"match": {
"brand": "如家"
}
},
"sort": [
{
"score": {
"order": "desc"
}
},
{
"price": {
"order": "asc"
}
}
],
"from": 0,
"size": 2
}
Highlighting
/**
* Highlighting
* @throws IOException
*/
@Test
public void testHighLight() throws IOException {
SearchRequest request = new SearchRequest("hotel");
MatchQueryBuilder query = QueryBuilders.matchQuery("all", "如家");
HighlightBuilder highlightBuilder = new HighlightBuilder().field("name").requireFieldMatch(false);
request.source().highlighter(highlightBuilder);
request.source().query(query);
SearchResponse response = client.search(request,RequestOptions.DEFAULT);
SearchHits searchHits = response.getHits();
System.out.println(searchHits.getTotalHits());
SearchHit[] hits = searchHits.getHits();
for (SearchHit hit : hits) {
String jsonStr = hit.getSourceAsString();
HotelDoc hotelDoc = JSONObject.parseObject(jsonStr, HotelDoc.class);
Map<String, HighlightField> highlightFields = hit.getHighlightFields();
HighlightField highlightField = highlightFields.get("name");
if (highlightField!=null){
String name = highlightField.getFragments()[0].string();
hotelDoc.setName(name);
}
System.out.println(hotelDoc);
}
}
GET /hotel/_search
{
"query": {
"match": {
"all": "上海如家"
}
},
"highlight": {
"fields": {
"name": {
"require_field_match": "false"
}
}
}
}
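Data aggregation
Aggregations compute statistics, analysis, and calculations over document data. Common kinds:
- Bucket aggregations: group documents
  - Terms aggregation: group by the value of a document field
  - Date Histogram: group by date intervals, for example one bucket per week or per month
- Metric aggregations: compute values
  - Avg: average
  - Max: maximum
  - Min: minimum
  - Stats: max, min, avg, sum, and so on in one go
- Pipeline aggregations: aggregate over the results of other aggregations
Fields used in aggregations must not be analyzed: keyword, numeric, date, or boolean
Bucket aggregations
By default a bucket aggregation counts the documents in each bucket, recorded as _count, and sorts the buckets by _count in descending order.
By default a bucket aggregation covers every document in the index; add a query clause to restrict the set of documents being aggregated.
The three elements of an aggregation:
- Aggregation name
- Aggregation type
- Aggregation field
Aggregation options:
- size: number of buckets to return
- order: how the buckets are sorted
- field: the field to aggregate on
# bucket aggregation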
GET /hotel/_search
{
# restrict the set of documents being aggregated
"query": {
"range": {
"price": {
"gte": 200,
"lte": 1000
}
}
},
"size": 1,
"aggs": {
"demo": {
"terms": {
"field": "brand",
# change the sort order
"order": {
"_count": "asc"
},
"size": 20
}
}
}
}
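What a terms aggregation computes can be mimicked in a few lines of Java: group by a field value, count each group, and order the groups by count. This is only a conceptual sketch under assumed names (`TermsAggSketch`, `termsByCount`) and sample brand strings; Elasticsearch computes its buckets per shard and merges them, which this toy ignores.

```java
import java.util.*;

final class TermsAggSketch {
    // Groups the values, counts each group, and returns the groups ordered
    // by count descending (the default order of a terms aggregation).
    static Map<String, Long> termsByCount(List<String> brands) {
        Map<String, Long> counts = new TreeMap<>();
        for (String brand : brands) {
            counts.merge(brand, 1L, Long::sum);
        }
        List<Map.Entry<String, Long>> buckets = new ArrayList<>(counts.entrySet());
        buckets.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        // LinkedHashMap keeps the sorted bucket order.
        Map<String, Long> ordered = new LinkedHashMap<>();
        for (Map.Entry<String, Long> bucket : buckets) {
            ordered.put(bucket.getKey(), bucket.getValue());
        }
        return ordered;
    }

    public static void main(String[] args) {
        List<String> brands = Arrays.asList(
                "rujia", "7days", "rujia", "hanting", "7days", "rujia");
        System.out.println(termsByCount(brands));
    }
}
```

Reversing the comparator corresponds to setting `"_count": "asc"` in the order option, and truncating the bucket list corresponds to size.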
Metric aggregations
# Metric aggregation
GET /hotel/_search
{
"size": 0,
"aggs": {
# Parent aggregation: named demo, of type terms, on the brand field, ordered by the avg of the metricsAgg sub-aggregation descending, returning 20 buckets
"demo": {
"terms": {
"field": "brand",
"order": {
"metricsAgg.avg": "desc"
},
"size": 20
},
# Sub-aggregation: aggregates further within each bucket above; named metricsAgg, of type stats, on the score field
# Gives the score statistics for each brand: min/max/avg/sum
"aggs": {
"metricsAgg": {
"stats": {
"field": "score"
}
}
}
}
}
}
Autocomplete
Pinyin analysis
Elasticsearch provides the Completion Suggester query for autocomplete. It matches and returns terms that start with the user's input. To keep completion queries efficient, there are some constraints on field types:
- Fields used in completion queries must be of type completion
# Create the index
PUT /test2
{
"mappings":{
"properties": {
"title": {
"type": "completion"
}
}
}
}
POST /test2/_doc/1
{
"title":["Sony","WH1000"],
"id":1
}
POST /test2/_doc/2
{
"title":["SKny","PH1000"],
"id":1
}
POST /test2/_doc/3
{
"title":["Nony","sH1000"],
"id":1
}
# Autocomplete query
GET /test2/_search
{
"suggest": {
"mySuggest": {
"text": "so",
"completion": {
"field": "title",
"skip_duplicates": true,
"size":10
}
}
}
}
#hotel
PUT /hotel
{
"mappings":{
"properties":{
"id":{
"type": "keyword"
},
"address":{
"type": "keyword",
"copy_to": "all"
},
"price":{
"type": "double"
},
"score":{
"type": "integer"
},
"brand":{
"type": "keyword",
"copy_to": "all"
},
"city":{
"type": "keyword",
"copy_to": "all"
},
"starName":{
"type": "keyword"
},
"business":{
"type": "keyword"
},
"location":{
"type": "geo_point"
},
"pic":{
"type": "keyword"
},
"name":{
"type": "text",
"analyzer": "text_analyzere",
"search_analyzer": "ik_smart",
"copy_to": "all"
},
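# all is the search field: documents are indexed with text_analyzere (finest-grained IK tokenization plus pinyin), and searches use ik_max_word, which splits the user's input into the finest-grained terms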
"all":{
"type": "text",
"analyzer": "text_analyzere",
"search_analyzer": "ik_max_word"
},
# Extra field dedicated to autocomplete, of type completion. When a document is created from database data, the values to suggest are already placed in the suggestion field, which is an array
"suggestion":{
"type": "completion",
"analyzer": "completion_analyzere"
}
}
},
"settings": {
"analysis": {
"analyzer": {
"text_analyzere":{
"tokenizer":"ik_max_word",
"filter":"py"
},
"completion_analyzere":{
"tokenizer":"keyword",
"filter":"py"
}
},
"filter": {
"py":{
"type": "pinyin",
"keep_full_pinyin": false,
"keep_joined_full_pinyin": true,
"keep_original": true,
"limit_first_letter_length": 16,
"remove_duplicated_term": true,
"none_chinese_pinyin_tokenize" :false
}
}
}
}
}
RestClient operations
/**
* Autocomplete query
*/
@Test
public void testSuggestion() {
try {
SearchRequest request = new SearchRequest("hotel");
request.source().suggest(new SuggestBuilder()
.addSuggestion("mySuggestion",
SuggestBuilders
.completionSuggestion("suggestion")
.prefix("s")
.skipDuplicates(true)
.size(10)));
SearchResponse response = client.search(request, RequestOptions.DEFAULT);
CompletionSuggestion mySuggestion = response.getSuggest().getSuggestion("mySuggestion");
List<CompletionSuggestion.Entry.Option> list = mySuggestion.getOptions();
for (CompletionSuggestion.Entry.Option option : list) {
System.err.println(option.getText().string());
}
} catch (IOException e) {
System.out.println(e);
}
}
@Data
@NoArgsConstructor
public class HotelDoc {
// ... other fields omitted
private Object distance;
private Boolean isAD;
private List<String> suggestion;
public HotelDoc(Hotel hotel) {
// ... field assignments omitted
if (this.business.contains("、")){
String[] arr = this.business.split("、");
this.suggestion = new ArrayList<>();
this.suggestion.add(this.brand);
Collections.addAll(this.suggestion,arr);
}else if (this.business.contains("/")){
String[] arr = this.business.split("/");
this.suggestion = new ArrayList<>();
this.suggestion.add(this.brand);
Collections.addAll(this.suggestion,arr);
}else {
this.suggestion = Arrays.asList(this.brand,this.business);
}
}
}
Data synchronization
The hotel data in Elasticsearch comes from a MySQL database, so whenever the MySQL data changes, Elasticsearch must change with it. This is data synchronization between Elasticsearch and MySQL.
Asynchronous notification
Listening to the binlog
Synchronous call:
- Pros: simple and direct to implement
- Cons: tight coupling between services
Asynchronous notification:
- Pros: loose coupling, moderate implementation difficulty
- Cons: depends on the reliability of the MQ
Listening to the binlog:
- Pros: fully decouples the services
- Cons: enabling the binlog adds load to the database, and the implementation is complex
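ES cluster
ES cluster split-brain
What does a master-eligible node do?
- Takes part in master election
- The master node manages cluster state and shard information, and handles requests to create and delete indexes
What does a data node do?
- CRUD on the data
What does a coordinator node do?
- Routes requests to other nodes
- Merges the query results and returns them to the user
Distributed storage in an ES cluster
Distributed queries in an ES cluster
How is the shard chosen for a distributed write?
- The coordinating node hashes the id and takes the result modulo the number of shards; the remainder is the target shard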
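The routing rule amounts to a hash followed by a modulo. The sketch below uses `String.hashCode()` purely as a stand-in hash (Elasticsearch applies its own Murmur3-based hash to the routing value, which defaults to the document id), and the class and method names are invented for the example.

```java
final class ShardRoutingSketch {
    // shard = hash(routing) mod number_of_shards.
    // Math.floorMod keeps the result non-negative even for negative hashes.
    static int shardFor(String routing, int numberOfShards) {
        return Math.floorMod(routing.hashCode(), numberOfShards);
    }

    public static void main(String[] args) {
        int shards = 3;
        for (String id : new String[] {"61083", "61084", "61085"}) {
            System.out.println(id + " -> shard " + shardFor(id, shards));
        }
    }
}
```

Because the result depends on the number of shards, the same id would land on a different shard if that number changed, which is why the number of primary shards is fixed when the index is created.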
Distributed query:
- Scatter phase: the coordinating node distributes the query to the different shards
- Gather phase: the results are collected on the coordinating node, which merges them and returns them to the user
Failover