Elasticsearch Full-Text Search and Analyzed Queries: Elasticsearch Article 4

Official documentation

https://www.elastic.co/guide/en/enterprise-search/current/start.html

Reference documentation

https://www.elastic.co/guide/en/elasticsearch/reference/7.17/query-dsl-match-query.html

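Full-text search overview

The many full-text query types only need to be grouped into three broad categories; once you do that, they become much easier to reason about as a system.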

Most of the APIs can be looked up when needed; we only need a rough idea of which features are supported.

match

Simple query

GET visit_log/_search
{
  "query": { "match": {
    "serverHostName": "wei"
  }},
  "sort": [
    { "_id": "asc" }
  ],
  "from": 0,
  "size": 10
}

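Elasticsearch executes the match query above in these steps:
1. Check the field type.
The serverHostName field is an analyzed full-text field (type text), which means the query string itself should also be analyzed.
2. Analyze the query string.
The query string wei is passed through the standard analyzer, which outputs a single term, wei. Because there is only one term, the match query executes a single low-level term query.
3. Find matching documents.
The term query looks up wei in the inverted index and retrieves the documents that contain it; in this example, the result is the single document with id 7, whose serverHostName is "liu wei".
4. Score each document.
The term query calculates a relevance _score for each document by combining term frequency (how often wei appears in the serverHostName field of that document), inverse document frequency (how often wei appears in the serverHostName field across all documents; the more common the term, the lower its weight), and field length (the shorter the field, the higher the relevance). Because this request sorts by _id, scores are not returned, which is why _score is null in the response.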
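
The analysis and lookup steps can be illustrated with a toy inverted index in Python. This is only a sketch: the analyze function (lowercase plus whitespace split) is a simplified stand-in for the standard analyzer, the helper names are hypothetical, and the field values are copied from the documents shown in this article. It does not reproduce how Elasticsearch stores or scores data.

```python
from collections import defaultdict

def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def build_inverted_index(docs, field):
    """Map each term to the set of document ids whose field contains it."""
    index = defaultdict(set)
    for doc_id, source in docs.items():
        for term in analyze(source[field]):
            index[term].add(doc_id)
    return index

# Field values taken from the sample documents in this article.
docs = {
    5: {"serverHostName": "wang cui"},
    6: {"serverHostName": "wang ting"},
    7: {"serverHostName": "liu wei"},
}
index = build_inverted_index(docs, "serverHostName")

# Steps 2 and 3: analyze the query string, then look up its single term.
query_terms = analyze("wei")
hits = sorted(index.get(query_terms[0], set()))
print(hits)  # [7]
```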

Query result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "visit_log",
        "_type" : "_doc",
        "_id" : "nUL9rokBpGsmR0pP0VSc",
        "_score" : null,
        "_source" : {
          "_class" : "org.lwd.microservice.boot.es.entity.VisitLog",
          "id" : 7,
          "tableName" : "VisitLog",
          "userLoginId" : 3,
          "serverIpAddress" : "127.0.0.1",
          "serverHostName" : "liu wei",
          "initialRequest" : "http://localhost:8023",
          "msgContent" : "test es add7",
          "createTime" : 1690446876000
        },
        "sort" : [
          "nUL9rokBpGsmR0pP0VSc"
        ]
      }
    ]
  }
}

match with multiple terms

Single field, multiple terms

The query string contains two terms, wei and cui:

GET visit_log/_search
{
  "query": { "match": {
    "serverHostName": "wei cui"
  }},
  "sort": [
    { "_id": "asc" }
  ],
  "from": 0,
  "size": 10
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : null,
    "hits" : [
      {
        "_index" : "visit_log",
        "_type" : "_doc",
        "_id" : "TEL9rokBpGsmR0pPXFMo",
        "_score" : null,
        "_source" : {
          "_class" : "org.lwd.microservice.boot.es.entity.VisitLog",
          "id" : 5,
          "tableName" : "VisitLog",
          "userLoginId" : 3,
          "serverIpAddress" : "127.0.0.1",
          "serverHostName" : "wang cui",
          "initialRequest" : "http://localhost:8023",
          "msgContent" : "test es add6",
          "createTime" : 1690446876000
        },
        "sort" : [
          "TEL9rokBpGsmR0pPXFMo"
        ]
      },
      {
        "_index" : "visit_log",
        "_type" : "_doc",
        "_id" : "nUL9rokBpGsmR0pP0VSc",
        "_score" : null,
        "_source" : {
          "_class" : "org.lwd.microservice.boot.es.entity.VisitLog",
          "id" : 7,
          "tableName" : "VisitLog",
          "userLoginId" : 3,
          "serverIpAddress" : "127.0.0.1",
          "serverHostName" : "liu wei",
          "initialRequest" : "http://localhost:8023",
          "msgContent" : "test es add7",
          "createTime" : 1690446876000
        },
        "sort" : [
          "nUL9rokBpGsmR0pP0VSc"
        ]
      }
    ]
  }
}

Because the match query has to look for two terms (["wei", "cui"]), it internally runs two term queries and merges their results into the final output. To do this, it wraps the two term queries in a bool query,
so the query above returns the same results as the following one:

GET /visit_log/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "serverHostName": "wei"
          }
        },
        {
          "term": {
            "serverHostName": "cui"
          }
        }
      ]
    }
  }
}
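
As a rough illustration of this rewrite, the following Python sketch builds the equivalent bool query body from a field name and a query string. The function name match_to_bool_should and the whitespace-based analyze helper are hypothetical; Elasticsearch performs this rewrite internally with the field's real analyzer.

```python
def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def match_to_bool_should(field, query_text):
    """Build the term/bool query that a default (operator=or) match resembles."""
    terms = analyze(query_text)
    if len(terms) == 1:
        # A single term needs no bool wrapper.
        return {"term": {field: terms[0]}}
    return {"bool": {"should": [{"term": {field: term}} for term in terms]}}

query = match_to_bool_should("serverHostName", "wei cui")
print(query)
```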

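Logic for matching multiple terms

The query above is equivalent to should (any one clause may match) because match has an operator parameter whose default value is or, which corresponds to should. Setting operator to and requires every term to match instead.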

GET /visit_log/_search
{
  "query": {
    "match": {
      "serverHostName": {
        "query": "wang cui",
        "operator": "or"
      }
    }
  }
}
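
To make the difference between the two operator values concrete, here is a small Python sketch that evaluates both against the serverHostName values from the sample documents. It assumes a simplified whitespace analyzer and ignores scoring; the match function is a hypothetical helper, not an Elasticsearch API.

```python
def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def match(field_value, query_text, operator="or"):
    """Return True if the field value satisfies the query under the operator."""
    doc_terms = set(analyze(field_value))
    query_terms = analyze(query_text)
    if operator == "and":
        return all(term in doc_terms for term in query_terms)
    return any(term in doc_terms for term in query_terms)

host_names = ["wang cui", "wang ting", "liu wei"]
or_hits = [name for name in host_names if match(name, "wang cui", "or")]
and_hits = [name for name in host_names if match(name, "wang cui", "and")]
print(or_hits)   # ['wang cui', 'wang ting']
print(and_hits)  # ['wang cui']
```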

Multiple fields, multiple terms

GET /visit_log/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "serverHostName": "cui wei" }},
        { "match": { "msgContent": "add3 add4" }}
      ]
    }
  }
}

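Controlling match precision

If a user supplies three query terms and wants documents that contain at least two of them, how should that be handled? Setting operator to either and or or is not suitable.
The match query supports the minimum_should_match parameter, which specifies how many terms must match for a document to be considered relevant. It can be set to a specific number, but it is more common to set it to a percentage, because we cannot control how many words a user types into a search: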

GET /visit_log/_search
{
  "query": {
    "match": {
      "serverHostName": {
        "query": "wang cui wangcui",
        "minimum_should_match": "75%"
      }
    }
  }
}

With three terms, 75% works out to 2.25, which is rounded down to 2, so the query above is equivalent to:

GET /visit_log/_search
{
  "query": {
    "bool": {
      "should": [
        { "match": { "serverHostName": "wang" }},
        { "match": { "serverHostName": "cui"   }},
        { "match": { "serverHostName": "wangcui"   }}
      ],
      "minimum_should_match": 2 
    }
  }
}
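
The rounding can be checked with a few lines of Python. This sketch covers only the two simplest forms of minimum_should_match, a non-negative integer and a non-negative percentage; negative values and combined forms are deliberately left out, and required_matches is a hypothetical helper name.

```python
def required_matches(num_clauses, spec):
    """Resolve a non-negative integer or percentage spec to a clause count."""
    if isinstance(spec, str) and spec.endswith("%"):
        percent = int(spec[:-1])
        # Integer division rounds the computed count down.
        return num_clauses * percent // 100
    return int(spec)

print(required_matches(3, "75%"))  # 2
print(required_matches(4, "75%"))  # 3
print(required_matches(3, 2))      # 2
```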

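match_phrase_prefix: phrase match with a prefix on the last term

Is there a way to match a phrase whose last word has only been partially typed, such as test es a? On top of match_phrase, Elasticsearch provides match_phrase_prefix, which treats the last term as a prefix, so a query for test es a can find values such as test es add6.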

GET /visit_log/_search
{
  "query": {
    "match_phrase_prefix": {
      "msgContent": {
        "query": "test es a"
      }
    }
  }
}

(Note: prefix here does not mean that the query must match the beginning of the whole text; it only means that the last term is matched as a term prefix.)
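
The following Python sketch mimics this behaviour on plain strings, assuming a simplified whitespace analyzer: all terms except the last must appear consecutively and in order, and the last one only needs to be a prefix of the next token. The function phrase_prefix_match is hypothetical, and details of the real query such as max_expansions are not modelled.

```python
def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def phrase_prefix_match(field_value, query_text):
    """Check an in-order phrase whose final term is treated as a prefix."""
    doc = analyze(field_value)
    query = analyze(query_text)
    if not query:
        return False
    *phrase, last = query
    size = len(query)
    # Slide a window over the document tokens and compare position by position.
    for start in range(len(doc) - size + 1):
        window = doc[start:start + size]
        if window[:-1] == phrase and window[-1].startswith(last):
            return True
    return False

print(phrase_prefix_match("test es add6", "test es a"))  # True
print(phrase_prefix_match("test es add6", "es a"))       # True
print(phrase_prefix_match("test es add6", "es test a"))  # False
```

The second call shows the point of the note above: the phrase does not have to start at the beginning of the field.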

match_bool_prefix

GET /visit_log/_search
{
  "query": {
    "match_bool_prefix": {
      "msgContent": {
        "query": "es test a"
      }
    }
  }
}

So in a match_bool_prefix query the terms (here es, test, and a) can appear in any order; only the last term is treated as a prefix.
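
One way to see why order does not matter is to write out the bool query that match_bool_prefix resembles: a term clause for every term except the last, plus a prefix clause for the last one. The Python sketch below builds that body; the helper name and the whitespace analyzer are assumptions for illustration, and a non-empty query string is assumed.

```python
def analyze(text):
    """Toy stand-in for an analyzer: lowercase, then split on whitespace."""
    return text.lower().split()

def match_bool_prefix_to_bool(field, query_text):
    """Build a bool/should body: term clauses plus a prefix clause for the last term."""
    *terms, last = analyze(query_text)  # assumes at least one term
    should = [{"term": {field: term}} for term in terms]
    should.append({"prefix": {field: last}})
    return {"bool": {"should": should}}

query = match_bool_prefix_to_bool("msgContent", "es test a")
print(query)
```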

multi_match: matching across multiple fields

GET /visit_log/_search
{
  "query": {
    "multi_match" : {
      "query":    "add7 wang",
      "fields": [ "msgContent", "*HostName" ] 
    }
  }
}

Result:

{
  "took" : 1,
  "timed_out" : false,
  "_shards" : {
    "total" : 1,
    "successful" : 1,
    "skipped" : 0,
    "failed" : 0
  },
  "hits" : {
    "total" : {
      "value" : 3,
      "relation" : "eq"
    },
    "max_score" : 1.7917595,
    "hits" : [
      {
        "_index" : "visit_log",
        "_type" : "_doc",
        "_id" : "nUL9rokBpGsmR0pP0VSc",
        "_score" : 1.7917595,
        "_source" : {
          "_class" : "org.lwd.microservice.boot.es.entity.VisitLog",
          "id" : 7,
          "tableName" : "VisitLog",
          "userLoginId" : 3,
          "serverIpAddress" : "127.0.0.1",
          "serverHostName" : "liu wei",
          "initialRequest" : "http://localhost:8023",
          "msgContent" : "test es add7",
          "createTime" : 1690446876000
        }
      },
      {
        "_index" : "visit_log",
        "_type" : "_doc",
        "_id" : "TEL9rokBpGsmR0pPXFMo",
        "_score" : 1.0800905,
        "_source" : {
          "_class" : "org.lwd.microservice.boot.es.entity.VisitLog",
          "id" : 5,
          "tableName" : "VisitLog",
          "userLoginId" : 3,
          "serverIpAddress" : "127.0.0.1",
          "serverHostName" : "wang cui",
          "initialRequest" : "http://localhost:8023",
          "msgContent" : "test es add6",
          "createTime" : 1690446876000
        }
      },
      {
        "_index" : "visit_log",
        "_type" : "_doc",
        "_id" : "6UL9rokBpGsmR0pPjVOS",
        "_score" : 1.0800905,
        "_source" : {
          "_class" : "org.lwd.microservice.boot.es.entity.VisitLog",
          "id" : 6,
          "tableName" : "VisitLog",
          "userLoginId" : 3,
          "serverIpAddress" : "127.0.0.1",
          "serverHostName" : "wang ting",
          "initialRequest" : "http://localhost:8023",
          "msgContent" : "test es add6",
          "createTime" : 1690446876000
        }
      }
    ]
  }
}

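The * is a wildcard in the field name: here *HostName matches every field whose name ends in HostName, such as serverHostName.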
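
As a sketch of what happens with the field list, the Python below first expands the wildcard patterns against a list of known field names and then builds one match query per field wrapped in dis_max, which is roughly what the default best_fields type of multi_match does. The field list is taken from the sample documents; the helper names are hypothetical, and fnmatchcase is only an approximation of Elasticsearch's field-name wildcard handling.

```python
from fnmatch import fnmatchcase

def expand_fields(patterns, known_fields):
    """Expand wildcard patterns such as *HostName into concrete field names."""
    expanded = []
    for pattern in patterns:
        for name in known_fields:
            if fnmatchcase(name, pattern) and name not in expanded:
                expanded.append(name)
    return expanded

def multi_match_to_dis_max(query_text, patterns, known_fields):
    """Build one match query per expanded field, wrapped in dis_max."""
    fields = expand_fields(patterns, known_fields)
    queries = [{"match": {field: query_text}} for field in fields]
    return {"dis_max": {"queries": queries}}

# Field names taken from the _source of the sample documents.
known_fields = ["tableName", "serverIpAddress", "serverHostName",
                "initialRequest", "msgContent"]
fields = expand_fields(["msgContent", "*HostName"], known_fields)
print(fields)  # ['msgContent', 'serverHostName']
print(multi_match_to_dis_max("add7 wang", ["msgContent", "*HostName"], known_fields))
```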

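query_string queries

This query uses a syntax to parse and split the provided query string based on operators such as AND or NOT. It then analyzes each split piece of text independently before returning matching documents.
You can use the query_string query to build complex searches that include wildcards, searches across multiple fields, and more. Although it is versatile, the query is strict and returns an error if the query string contains any invalid syntax.
For example: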


GET /visit_log/_search
{
  "query": {
    "query_string": {
      "query": "(wangcui) OR (add6)",
      "fields": [ "msgContent", "*HostName" ] 
    }
  }
}
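
Because invalid syntax causes an error, raw user input is usually escaped before it is placed into a query_string query. The Python sketch below is one possible approach, assuming the reserved-character list from the query_string documentation; it escapes each reserved character with a backslash and drops < and >, which cannot be escaped. The function name escape_query_string is hypothetical.

```python
# Reserved characters in query_string syntax (< and > are handled separately).
RESERVED = set('+-=&|!(){}[]^"~*?:\\/')

def escape_query_string(text):
    """Escape reserved characters; remove < and >, which cannot be escaped."""
    cleaned = text.replace("<", "").replace(">", "")
    return "".join("\\" + ch if ch in RESERVED else ch for ch in cleaned)

escaped = escape_query_string("http://localhost:8023")
print(escaped)  # http\:\/\/localhost\:8023

# The escaped text can then be used as the query value of the request body.
body = {
    "query": {
        "query_string": {
            "query": escaped,
            "fields": ["initialRequest"],
        }
    }
}
```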

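intervals queries

Despite the name, intervals here are not time intervals: an intervals query matches documents based on the order and proximity of matching terms. In essence, it applies several matching rules in sequence.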


GET /visit_log/_search
{
  "query": {
    "intervals" : {
      "msgContent" : {
        "all_of" : {
          "ordered" : true,
          "intervals" : [
            {
              "match" : {
                "query" : "liu",
                "max_gaps" : 0,
                "ordered" : true
              }
            },
            {
              "any_of" : {
                "intervals" : [
                  { "match" : { "query" : "es" } },
                  { "match" : { "query" : "add6" } }
                ]
              }
            }
          ]
        }
      }
    }
  }
}

Because intervals rules can be nested and combined with each other, these queries can become quite complex.
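
One way to keep such nested rules manageable is to compose them from small builder functions. The Python sketch below rebuilds the request body of the query above; match_rule, any_of, and all_of are hypothetical helpers that only assemble dictionaries and are not part of any Elasticsearch client.

```python
def match_rule(query, **options):
    """Build a match rule, e.g. with max_gaps or ordered options."""
    return {"match": {"query": query, **options}}

def any_of(*rules):
    """Build a rule that is satisfied by any one of its sub-rules."""
    return {"any_of": {"intervals": list(rules)}}

def all_of(*rules, ordered=False):
    """Build a rule that requires all sub-rules, optionally in order."""
    return {"all_of": {"ordered": ordered, "intervals": list(rules)}}

rule = all_of(
    match_rule("liu", max_gaps=0, ordered=True),
    any_of(match_rule("es"), match_rule("add6")),
    ordered=True,
)
body = {"query": {"intervals": {"msgContent": rule}}}
print(body)
```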

DSL queries: term-level queries in detail

See the official documentation; a detailed article may follow later.

Aggregations: bucket aggregations in detail

See the official documentation; a detailed article may follow later.

Aggregations: metric aggregations in detail

See the official documentation; a detailed article may follow later.

Aggregations: pipeline aggregations in detail

See the official documentation; a detailed article may follow later.
