4. A First Look at [ElasticSearch] Cluster Architecture and Search Techniques

Author: 叫我柒月 | Posted 2 years ago | Category: Toy blog

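I. A brief look at the Elasticsearch architecture

1. Elasticsearch node types

Nodes in an Elasticsearch cluster fall mainly into two categories: Master nodes and DataNodes.

1.1 Master node

When Elasticsearch starts, one Master node is elected. The Zen Discovery¹ mechanism elects the master, discovers the other nodes in the cluster, and establishes connections to them. An Elasticsearch cluster has only one active Master node at a time. ("One" applies to the cluster as a whole; it does not mean that a particular server is permanently the master. If the server hosting the master goes down, one of the other nodes can become the master.)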

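Main responsibilities of the Master node:

Managing the creation, deletion, and reallocation of indices and shards.
Monitoring node status and reallocating shards when necessary.
Coordinating data replication and synchronization between nodes.
Handling cluster-level operations such as creating or deleting indices and adding or removing nodes.
Maintaining cluster health and taking corrective action when the cluster runs into problems.
Maintaining metadata².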

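1.2 DataNode

Unlike the master, a cluster can have many DataNodes; how many depends on the number of nodes in the cluster. Since only one node acts as master at a time, the remaining nodes are DataNodes.
Main responsibilities of a DataNode:

Storing and indexing data: a DataNode keeps index shards on its local disk and answers query requests against them.
Replicating and synchronizing data: for reliability and high availability, Elasticsearch stores the replicas of each primary shard on different DataNodes and keeps them in sync with the primary.
Taking part in search and aggregation: when a client submits a search request, DataNodes use their local caches and shard data to run the search and the aggregations.
Performing data maintenance: for example, cleaning up expired data and compacting shards.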

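II. Shards and replicas

Both concepts were introduced in the first article of this series; here they are explained again in the context of a cluster.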

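2.1 Shards

Elasticsearch is a distributed search engine. An index can be split into one or more shards, and those shards are spread across different nodes. Elasticsearch manages shards automatically: when it finds that shards are unevenly distributed, it relocates them on its own.

2.2 Replicas

Every shard in Elasticsearch has one primary copy and may have several replica copies (the default is one primary shard with one replica). Replicas are also spread across different nodes.

2.3 Setting the number of shards and replicas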

PUT /test_index06
{
  "mappings": {
    "properties": {
      "name": {
        "type": "keyword",
        "index": true,
        "store": true
      },
      ................
    }
  },
  // 1 primary shard, 2 replicas
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 2
  }
}

2.4 Viewing shards, primary shards, and replica shards

GET /_cat/indices?v

III. How Elasticsearch works

3.1 How a document is written

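How does Elasticsearch decide which shard a newly inserted document is stored on?

shard = hash(routing) % number_of_primary_shards
routing is a variable value; by default it is the document's _id.
number_of_primary_shards is the number of primary shards of the index.
You can also route by a key of your own by passing the routing parameter when indexing the document (the value is recorded in the _routing metadata field).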
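
The routing formula can be tried out with a small Python sketch. It is only an illustration: zlib.crc32 stands in for the hash function Elasticsearch uses internally, and pick_shard is a hypothetical helper name, so the shard numbers it prints will not agree with a real cluster.

```python
import zlib


def pick_shard(routing: str, number_of_primary_shards: int) -> int:
    """Return the primary shard number for a routing value.

    zlib.crc32 is only a stand-in hash for this sketch; Elasticsearch
    uses its own hash function internally.
    """
    if number_of_primary_shards < 1:
        raise ValueError("an index needs at least one primary shard")
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


# The default routing value is the document _id.
shards = [pick_shard(doc_id, 3) for doc_id in ("1", "2", "3", "4")]
print(shards)
```

Because the result depends on number_of_primary_shards, the same _id would land on a different shard if that number changed, which is why the number of primary shards is fixed when an index is created.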

3.2 How a search is executed

The client sends a query request. Whichever DataNode receives it becomes the coordinating node for that request.
The coordinating node broadcasts the query to every data node. The shards on those nodes run the query locally: each shard collects its matching documents in a priority queue and returns their document IDs, together with node and shard information, to the coordinating node.
The coordinating node merges all the partial results and sorts them globally. It then sends get requests to the shards that hold the selected document IDs, those shards return the document data, and finally the coordinating node returns the data to the client.
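
The two phases described above (query, then fetch) can be sketched in Python. Everything here is hypothetical: SHARDS is an in-memory dictionary with made-up scores, and the function names are chosen for this illustration only.

```python
import heapq

# Hypothetical in-memory "shards": each maps doc_id -> (score, source).
SHARDS = {
    "shard-0": {"1": (0.21, {"remark": "java"}), "4": (0.10, {"remark": "go"})},
    "shard-1": {"2": (0.77, {"remark": "java developer"})},
    "shard-2": {"3": (0.43, {"remark": "java assistant"})},
}


def query_phase(shard_name, size):
    """Each shard returns only (score, doc_id, shard_name) for its local top hits."""
    docs = SHARDS[shard_name]
    local = [(score, doc_id, shard_name) for doc_id, (score, _) in docs.items()]
    return heapq.nlargest(size, local)


def fetch_phase(hits):
    """The coordinating node asks the owning shard for each document body."""
    return [
        {"_id": doc_id, "_score": score, "_source": SHARDS[shard][doc_id][1]}
        for score, doc_id, shard in hits
    ]


def search(size):
    # Scatter: collect the local top hits of every shard.
    candidates = []
    for shard_name in SHARDS:
        candidates.extend(query_phase(shard_name, size))
    # Gather: sort globally, keep the overall top `size`, then fetch the bodies.
    top = heapq.nlargest(size, candidates)
    return fetch_phase(top)


results = search(size=2)
print([hit["_id"] for hit in results])
```

Only IDs and scores travel in the first phase; the document bodies are fetched for the final top hits alone, which keeps the amount of data moved between nodes small.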

IV. Near-real-time indexing in Elasticsearch

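4.1 Refreshing to the filesystem cache

When data is written to an ES shard, it first goes into an in-memory buffer. A refresh then turns the buffer into a segment and writes it to the filesystem cache, at which point the data becomes searchable (note that it is not written straight to disk). By default ES refreshes once per second.

4.2 Writing the translog for fault tolerance

At the same time as the data is written to memory, the operation is recorded in the translog. If a failure occurs before the data has reached disk, ES recovers the data from the translog.
Once the segments in the filesystem cache have all been flushed to disk, the translog is cleared.

4.3 Flushing to disk

By default ES flushes the data in the filesystem cache to disk every 30 minutes.

4.4 Segment merging

When too many segments accumulate, ES periodically merges several of them into a larger segment, which reduces the I/O cost of searching the index. It is during this merge that documents deleted earlier are physically removed.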
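
The refresh, translog, flush, and merge steps of this section can be modelled with a toy Python class. This is a deliberately simplified sketch and not how Lucene stores data; ToyShard and all of its attributes are names invented for the illustration.

```python
class ToyShard:
    """Toy model of the write path; not how Lucene actually stores data."""

    def __init__(self):
        self.buffer = []    # in-memory indexing buffer (not searchable yet)
        self.translog = []  # operations not yet covered by a flush
        self.cached = []    # segments in the filesystem cache (searchable)
        self.on_disk = []   # segments committed to disk

    def index(self, doc):
        self.buffer.append(doc)
        self.translog.append(doc)

    def refresh(self):
        """Turn the buffer into a new searchable segment."""
        if self.buffer:
            self.cached.append(list(self.buffer))
            self.buffer.clear()

    def flush(self):
        """Persist the cached segments to disk, then clear the translog."""
        self.refresh()
        self.on_disk.extend(self.cached)
        self.cached.clear()
        self.translog.clear()

    def merge(self):
        """Combine all committed segments into one larger segment."""
        merged = [doc for segment in self.on_disk for doc in segment]
        self.on_disk = [merged] if merged else []

    def search(self):
        segments = self.cached + self.on_disk
        return [doc for segment in segments for doc in segment]


shard = ToyShard()
shard.index({"remark": "java"})
visible_before_refresh = shard.search()  # still only in the buffer
shard.refresh()
visible_after_refresh = shard.search()   # now in a searchable segment
print(visible_before_refresh, visible_after_refresh)
```

The gap between index() and refresh() is what makes the search near-real-time rather than real-time, and the translog is what protects the documents that sit only in memory or in the filesystem cache.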

V. Manually controlling search precision

5.1 Basic use of operator and minimum_should_match

① Find documents whose remark field contains the term java or the term developer.

GET /test_index05/_search
{
  "query": {
    "match": {
      "remark": "java developer"
    }
  }
}

Or, equivalently, with the operator spelled out:

GET /test_index05/_search
{
  "query": {
    "match": {
      "remark": {
        "query": "java developer",
        "operator": "or"
      }
    }
  }
}

Result

 "hits" : [
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "name" : "寶塔鎮(zhèn)河妖",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.21110919,
        "_source" : {
          "name" : "天王蓋地虎",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java"
        }
      }
    ]

② Find documents whose remark field contains both java and developer.

GET /test_index05/_search
{
  "query": {
    "match": {
      "remark": {
        "query": "java developer",
        "operator": "and"
      }
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "name" : "寶塔鎮(zhèn)河妖",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java developer"
        }
      }
    ]

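③ minimum_should_match accepts either a percentage or a fixed number. A percentage is the share of the query's terms that must match; when the product is not a whole number, it is rounded down. For example, with three terms in the query no percentage divides evenly: to require at least two terms you have to write 67%, because with 66% ES treats one matching term as sufficient. A fixed number states directly how many of the query's terms must match at least.
③-1 Percentage
Match documents in which at least 66% of the terms java, developer, and assistant appear; with three terms that means at least one of them.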

GET /test_index05/_search
{
  "query": {
    "match": {
      "remark": {
        "query": "java developer assistant",
        "minimum_should_match": "66%"
      }
    }
  }
}

"hits" : [
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.21110919,
        "_source" : {
          "name" : "天王蓋地虎",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java"
        }
      },
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.160443,
        "_source" : {
          "name" : "寶塔鎮(zhèn)河妖",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java developer"
        }
      }
    ]

Match documents in which at least 67% of the terms java, developer, and assistant appear; with three terms that means at least two of them.

GET /test_index05/_search
{
  "query": {
    "match": {
      "remark": {
        "query": "java developer assistant",
        "minimum_should_match": "67%"
      }
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "name" : "寶塔鎮(zhèn)河妖",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java developer"
        }
      }
    ]
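
The rounding behind the 66% and 67% examples above can be checked with a few lines of Python. This is a minimal sketch that handles only the two forms used in this article (a positive integer and a positive percentage); required_matches is a hypothetical helper, not an Elasticsearch API.

```python
def required_matches(term_count: int, minimum_should_match) -> int:
    """Number of optional terms that must match.

    Handles only the two forms used in this article: a positive integer
    and a positive percentage string such as "67%".
    """
    if isinstance(minimum_should_match, str) and minimum_should_match.endswith("%"):
        percent = int(minimum_should_match[:-1])
        # Integer arithmetic avoids float rounding; the result is rounded down.
        return (term_count * percent) // 100
    return int(minimum_should_match)


for setting in ("66%", "67%", 2):
    print(setting, required_matches(3, setting))
```

With three terms, 66% gives 1.98, which rounds down to 1, while 67% gives 2.01, which rounds down to 2.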

③-2 Fixed number
At least two of the three conditions below must hold, that is, at least two of the words java, developer, and assistant must appear together in the document.

GET /test_index05/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "remark": "java"
          }
        },
        {
          "match": {
            "remark": "developer"
          }
        },
        {
          "match": {
            "remark": "assistant"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 0.77041256,
        "_source" : {
          "name" : "寶塔鎮(zhèn)河妖",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java developer"
        }
      }
    ]

5.2 How match is translated under the hood

The query we write:

GET /test_index05/_search
{
  "query": {
    "match": {
      "remark": "java developer"
    }
  }
}

The query after translation:

GET /test_index05/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "remark": "java"
          }
        },
        {
          "term": {
            "remark": {
              "value": "developer"
            }
          }
        }
      ]
    }
  }
}

The query:

GET /test_index05/_search
{
  "query": {
    "match": {
      "remark": {
        "query": "java developer",
        "operator": "and"
      }
    }
  }
}

After translation:

GET /test_index05/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "term": {
            "remark": "java"
          }
        },
        {
          "term": {
            "remark": {
              "value": "developer"
            }
          }
        }
      ]
    }
  }
}

The query:

GET /test_index05/_search
{
  "query": {
    "match": {
      "remark": {
        "query": "java developer assistant",
        "minimum_should_match": "68%"
      }
    }
  }
}

After translation:

GET /test_index05/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "term": {
            "remark": "java"
          }
        },
        {
          "term": {
            "remark": "developer"
          }
        },
        {
          "term": {
            "remark": "assistant"
          }
        }
      ],
      "minimum_should_match": 2
    }
  }
}

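Running the search in the translated form directly is more efficient, because ES can skip the translation step. Keep in mind that a term query is not analyzed, so the terms have to be written exactly as they are stored in the index.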
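
The three translations shown in this section follow one pattern, which the Python sketch below reproduces as plain dictionaries. It rests on a loud assumption: lowercasing and splitting on whitespace stands in for the field's analyzer, which is only adequate for simple sample data like the remark values here. match_to_bool is a hypothetical helper, not part of any client library.

```python
def match_to_bool(field, text, operator="or", minimum_should_match=None):
    """Build the bool/term form of a simple match query.

    Assumption: lowercasing and splitting on whitespace stands in for
    the field's analyzer, which is good enough for the sample data here.
    """
    terms = text.lower().split()
    clauses = [{"term": {field: term}} for term in terms]
    if operator == "and":
        # Every term is required.
        return {"query": {"bool": {"must": clauses}}}
    bool_query = {"should": clauses}
    if minimum_should_match is not None:
        bool_query["minimum_should_match"] = minimum_should_match
    return {"query": {"bool": bool_query}}


or_query = match_to_bool("remark", "java developer")
and_query = match_to_bool("remark", "java developer", operator="and")
print(or_query)
print(and_query)
```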

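5.3 Controlling weight with boost

Search for documents whose remark field contains java. Among those, if remark also contains developer or assistant, the documents containing assistant should be shown first (in other words, the relevance score contributed by a match on assistant is increased).
Boost is generally used for relevance ranking at search time. A typical case is the composite ranking on an e-commerce site, which weighs a product's sales, ad placement, rating, stock, and unit price together; among these factors ad placement carries the highest weight and stock the lowest. Similarly, the first few results of a Baidu search are usually ads.

For example:
All the data in index test_index05: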

"hits" : [
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 1.0,
        "_source" : {
          "name" : "天王蓋地虎",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java"
        }
      },
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.0,
        "_source" : {
          "name" : "寶塔鎮(zhèn)河妖",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "name" : "鐵鍋燉大鵝",
          "sex" : 1,
          "age" : 19,
          "address" : "天津",
          "remark" : "java assistant"
        }
      }
    ]

Query (the higher the boost, the higher the weight and the earlier the document is shown):

GET /test_index05/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "remark": "java"
          }
        }
      ],
      "should": [
        {
          "match": {
            "remark": {
              "query": "developer",
              "boost": 1
            }
          }
        },
        {
          "match": {
            "remark": {
              "query": "assistant",
              "boost": 3
            }
          }
        }
      ]
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 2.3016074,
        "_source" : {
          "name" : "鐵鍋燉大鵝",
          "sex" : 1,
          "age" : 19,
          "address" : "天津",
          "remark" : "java assistant"
        }
      },
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 1.474477,
        "_source" : {
          "name" : "寶塔鎮(zhèn)河妖",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43250346,
        "_source" : {
          "name" : "天王蓋地虎",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java"
        }
      }
    ]

Query with developer boosted instead of assistant:

GET /test_index05/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "remark": "java"
          }
        }
      ],
      "should": [
        {
          "match": {
            "remark": {
              "query": "developer",
              "boost": 2
            }
          }
        },
        {
          "match": {
            "remark": {
              "query": "assistant",
              "boost": 1
            }
          }
        }
      ]
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "2",
        "_score" : 2.611973,
        "_source" : {
          "name" : "寶塔鎮(zhèn)河妖",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.9918565,
        "_source" : {
          "name" : "鐵鍋燉大鵝",
          "sex" : 1,
          "age" : 19,
          "address" : "天津",
          "remark" : "java assistant"
        }
      },
      {
        "_index" : "test_index05",
        "_type" : "_doc",
        "_id" : "1",
        "_score" : 0.43250346,
        "_source" : {
          "name" : "天王蓋地虎",
          "sex" : 1,
          "age" : 25,
          "address" : "上海",
          "remark" : "java"
        }
      }
    ]
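
The scores in the two results above are consistent with a simple rule: the score of the must clause plus each matching should clause multiplied by its boost. The Python sketch below checks that arithmetic. The three unboosted clause scores are not printed anywhere in the results; they are values solved from the four reported scores for this illustration and rounded, so treat them as derived assumptions.

```python
def bool_score(must_score, boosted_should):
    """Sum the must score and each matching should score times its boost."""
    return must_score + sum(boost * score for score, boost in boosted_should)


# Unboosted clause scores solved from the four results above (derived, rounded).
JAVA = 0.336981       # "java" in a two-word remark
DEVELOPER = 1.137496  # "developer"
ASSISTANT = 0.654875  # "assistant"

# First query: developer boost 1, assistant boost 3.
first_query_doc3 = bool_score(JAVA, [(ASSISTANT, 3)])
first_query_doc2 = bool_score(JAVA, [(DEVELOPER, 1)])
# Second query: developer boost 2, assistant boost 1.
second_query_doc2 = bool_score(JAVA, [(DEVELOPER, 2)])
second_query_doc3 = bool_score(JAVA, [(ASSISTANT, 1)])

print(first_query_doc3, first_query_doc2, second_query_doc2, second_query_doc3)
```

Document 1 scores 0.43250346 in both queries because it matches only the must clause, so neither boost touches it.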

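5.4 Multi-field search with the best fields strategy, based on dis_max

best_fields strategy: prefer documents in which a single field matches as much of the search input as possible.
most_fields strategy: the opposite of best fields; prefer documents in which as many fields as possible match the search input.

dis_max syntax: for each document, take the highest relevance score among the individual queries and rank by that score.

Example of the best fields strategy (the relevance score of '秀兒' in the name field is compared with the relevance score of 'java developer' in the remark field, and whichever is higher is used to rank the document):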

GET /test_index06/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "remark": "java developer"
          }
        },
        {
          "match": {
            "name": "秀兒"
          }
        }
      ]
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vkjNUIcB14FuHovqnIz1",
        "_score" : 2.3842063,
        "_source" : {
          "name" : "秀兒",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vEjNUIcB14FuHovqKIyb",
        "_score" : 1.781607,
        "_source" : {
          "name" : "rod",
          "sex" : 1,
          "age" : 25,
          "book" : "Spring",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vUjNUIcB14FuHovqaYzA",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "rods",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "python developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "v0jOUIcB14FuHovqTYw0",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "Tom",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "wEjOUIcB14FuHovqYYw0",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "Amy",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      }
    ]

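5.5 Tuning dis_max with the tie_breaker parameter

What if I don't want the ranking to depend only on the highest-scoring field, and want the other fields to contribute as well?

dis_max ranks by the highest relevance score among the query clauses and ignores the scores of the others. In some cases the other clauses should also influence the final ranking, and that is what the tie_breaker parameter is for: the relevance scores of the other clauses are multiplied by the parameter value and then added into the final score. If the parameter is not set it behaves as 0, which is why the other clauses are ignored by default.

tie_breaker takes a value of at most 1. It only affects the clauses other than the highest-scoring one: their scores are multiplied by this coefficient.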

GET /test_index06/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "remark": "java developer"
          }
        },
        {
          "match": {
            "name": "秀兒"
          }
        }
      ],
      "tie_breaker": 0.5
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vkjNUIcB14FuHovqnIz1",
        "_score" : 2.5047874,
        "_source" : {
          "name" : "秀兒",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vEjNUIcB14FuHovqKIyb",
        "_score" : 1.781607,
        "_source" : {
          "name" : "rod",
          "sex" : 1,
          "age" : 25,
          "book" : "Spring",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vUjNUIcB14FuHovqaYzA",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "rods",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "python developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "v0jOUIcB14FuHovqTYw0",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "Tom",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "wEjOUIcB14FuHovqYYw0",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "Amy",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      }
    ]
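
The effect of tie_breaker on the top hit can be reproduced by hand. The sketch below is a minimal Python version of the rule described in this section, fed with clause scores read off the results above: 2.3842063 for the name match and 0.24116206 for a remark that matches only developer. dis_max_score is a hypothetical helper written for this illustration.

```python
def dis_max_score(clause_scores, tie_breaker=0.0):
    """Best clause score plus tie_breaker times the other matching clauses."""
    if not clause_scores:
        return 0.0
    best = max(clause_scores)
    others = sum(clause_scores) - best
    return best + tie_breaker * others


# Clause scores of the top document, taken from the results above.
name_score, remark_score = 2.3842063, 0.24116206

without_tie_breaker = dis_max_score([name_score, remark_score])
with_tie_breaker = dis_max_score([name_score, remark_score], tie_breaker=0.5)
print(without_tie_breaker, with_tie_breaker)
```

The other four documents keep their scores because only one clause matches them, so there is nothing for tie_breaker to add.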

5.6 Using multi_match to simplify dis_max + tie_breaker (not commonly used)

In ES the same kind of search can be written with different syntaxes.

Query form 1

GET /test_index06/_search
{
  "query": {
    "dis_max": {
      "queries": [
        {
          "match": {
            "name": "Tom"
          }
        },
        {
          "match": {
            "remark": {
              "query": "java developer",
              "boost": 2,
              "minimum_should_match": 2
            }
          }
        }
      ],
      "tie_breaker": 0.5
    }
  }
}

Result

"hits" : {
    "total" : {
      "value" : 2,
      "relation" : "eq"
    },
    "max_score" : 3.563214,
    "hits" : [
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vEjNUIcB14FuHovqKIyb",
        "_score" : 3.563214,
        "_source" : {
          "name" : "rod",
          "sex" : 1,
          "age" : 25,
          "book" : "Spring",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "v0jOUIcB14FuHovqTYw0",
        "_score" : 1.6360589,
        "_source" : {
          "name" : "Tom",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      }
    ]

Query form 2 (the commonly used values of type are best_fields and most_fields; ^n sets a weight and is equivalent to "boost": n)

GET /test_index06/_search
{
  "query": {
    "multi_match": {
      "query": "Tom java developer",
      "fields": [
        "name",
        "remark^2"
      ],
      "type": "best_fields",
      "tie_breaker": 0.5,
      "minimum_should_match": "50%"
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vEjNUIcB14FuHovqKIyb",
        "_score" : 3.563214,
        "_source" : {
          "name" : "rod",
          "sex" : 1,
          "age" : 25,
          "book" : "Spring",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "v0jOUIcB14FuHovqTYw0",
        "_score" : 1.877221,
        "_source" : {
          "name" : "Tom",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vUjNUIcB14FuHovqaYzA",
        "_score" : 0.48232412,
        "_source" : {
          "name" : "rods",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "python developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vkjNUIcB14FuHovqnIz1",
        "_score" : 0.48232412,
        "_source" : {
          "name" : "秀兒",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "wEjOUIcB14FuHovqYYw0",
        "_score" : 0.48232412,
        "_source" : {
          "name" : "Amy",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      }
    ]

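5.7 cross_fields search

cross fields: a single logical identifier is spread over several fields, and searching by that identifier is called a cross fields search. A person's name can be split into family name and given name, and an address into province, city, district, street, and so on; searching documents by a name or an address is then a cross fields search. Such searches are usually implemented with the most fields strategy, because more than one field is involved. The cross fields strategy looks for the search terms across multiple fields. By default its matching logic is the same as most fields, while its relevance scoring is the same as best fields. A cross fields search normally carries an extra operator parameter that states how the terms have to match across the fields. ES also provides a cross fields search strategy.

For example (java must match in either name or remark, and developer must also match in either name or remark):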

GET /test_index06/_search
{
  "query": {
    "multi_match": {
      "query": "java developer",
      "fields": [
        "name",
        "remark"
      ],
      "type": "cross_fields",
      "operator": "and"
    }
  }
}

Result

"hits" : {
    "total" : {
      "value" : 1,
      "relation" : "eq"
    },
    "max_score" : 1.781607,
    "hits" : [
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vEjNUIcB14FuHovqKIyb",
        "_score" : 1.781607,
        "_source" : {
          "name" : "rod",
          "sex" : 1,
          "age" : 25,
          "book" : "Spring",
          "remark" : "java developer"
        }
      }
    ]

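The problem with most fields: because the strategy favours matching as many fields as possible, exact matches may not rank first. And because a cross fields search cannot use minimum_should_match to cut off the long tail of results, both most fields and cross fields have their own drawbacks. For that reason the best fields strategy is the one recommended for commercial projects.

5.8 Combining fields with copy_to

Scenario: on an e-commerce site, a user types "手機" (mobile phone) into the search box and clicks search. Which field should be matched: the category name, the product name, the selling points, or the description? If searching a single field is not appropriate, is searching _all appropriate? It is not either, because _all may also include fields such as images and prices.

Suppose there were one field whose content included (but was not limited to) the category name, the product name, and the selling points. Could the search be run against this special field? (My understanding is that several fields are fused into one, which can be seen as a summary of the product.)

Take the keyword field below as an example; it holds the content of the category_name, product_name, and sell_point fields.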

{
  "category_name" : "手機",
  "product_name" : "一加6T手機",
  "price" : 568800,
  "sell_point" : "國產Android手機",
  "tags": ["8G+128G", "256G可擴展"],
  "color" : "紅色",
  "keyword" : "手機 一加6T手機 國產Android手機"
}

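copy_to copies the values of several fields into one field, producing a combined field. It solves the cross fields search problem, and in commercial projects it is also used to provide the default search field. To use copy_to you have to specify the mapping by hand when you define the index.

For example: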

PUT /test_index07/_mapping
{
  "properties": {
    "provice": {
      "type": "text",
      "analyzer": "standard",
      "copy_to": "address"
    },
    "city": {
      "type": "text",
      "analyzer": "standard",
      "copy_to": "address"
    },
    "street": {
      "type": "text",
      "analyzer": "standard",
      "copy_to": "address"
    },
    "address": {
      "type": "text",
      "analyzer": "standard"
    }
  }
}

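The mapping above adds four fields: provice, city, street, and address (provice is spelled here as it is in the mapping). The values of provice, city, and street are copied automatically into address, which makes address a combined field. An address search can then match against the address field alone, avoiding the problems caused by the most fields strategy. When maintaining data, address needs no special care, because it is a combined field that ES maintains automatically. It is similar to a derived property in Java code: it may not exist in storage, but logically it always exists, since address is composed of the three physically stored properties provice, city, and street.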
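
What copy_to does at index time can be pictured with a short Python sketch. It is a conceptual model only: apply_copy_to is a hypothetical function, the COPY_TO dictionary mirrors the mapping above, and the sample address values are made up.

```python
def apply_copy_to(document, copy_to):
    """Return the values each target field would be indexed with.

    `copy_to` maps a source field name to its target field name,
    mirroring the mapping above. The document itself is left unchanged.
    """
    indexed = {}
    for source_field, target_field in copy_to.items():
        if source_field in document:
            indexed.setdefault(target_field, []).append(document[source_field])
    return indexed


COPY_TO = {"provice": "address", "city": "address", "street": "address"}
# Hypothetical sample values.
doc = {"provice": "Zhejiang", "city": "Hangzhou", "street": "Wensan Road"}

indexed = apply_copy_to(doc, COPY_TO)
print(indexed)
```

The submitted document is not modified: the combined values exist only in the index, which matches the "derived property" description above.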

5.9 Approximate matching

Given a phrase or a word, match content that contains all or part of it.

Example (no remark value in test_index06 contains go):

GET /test_index06/_search
{
  "query": {
    "match": {
      "remark": "developer go"
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vEjNUIcB14FuHovqKIyb",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "rod",
          "sex" : 1,
          "age" : 25,
          "book" : "Spring",
          "remark" : "java developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vUjNUIcB14FuHovqaYzA",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "rods",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "python developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vkjNUIcB14FuHovqnIz1",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "秀兒",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "v0jOUIcB14FuHovqTYw0",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "Tom",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      },
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "wEjOUIcB14FuHovqYYw0",
        "_score" : 0.24116206,
        "_source" : {
          "name" : "Amy",
          "sex" : 1,
          "age" : 26,
          "book" : "Spring",
          "remark" : "C developer"
        }
      }
    ]

Example

GET /test_index06/_search
{
  "query": {
    "match": {
      "remark": "developerAA"
    }
  }
}

Result

"hits" : [ ]

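Sometimes the results have to meet special requirements. For example, java developer must be treated as one complete, indivisible phrase; or the field must contain both java and developer, and the closer the two words are to each other, the higher the relevance score. Searches with requirements of this kind are approximate searches. Searching for javb and still finding documents that contain java developer, or offering search suggestions for the single letter j, are also forms of approximate search. The match syntax cannot satisfy requirements like these.

5.10 match_phrase

Phrase search: the search input is treated as one indivisible phrase rather than as independent terms.

Example (matches only content in which java and developer appear together and consecutively, that is, java developer as a whole):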

GET /test_index06/_search
{
  "query": {
    "match_phrase": {
      "remark": "java developer"
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index06",
        "_type" : "_doc",
        "_id" : "vEjNUIcB14FuHovqKIyb",
        "_score" : 1.7816072,
        "_source" : {
          "name" : "rod",
          "sex" : 1,
          "age" : 25,
          "book" : "Spring",
          "remark" : "java developer"
        }
      }
    ]

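5.10.1 How match phrase works: term position

How does ES implement match phrase? Like match, a match phrase search first analyzes the search input and splits it into terms, for example hello and world. If the input is still split into terms before searching, how can ES match a phrase?
The answer lies in how the inverted index is built. When building the inverted index, ES first tokenizes the document data, for example:

See how the following sentence is tokenized: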

GET _analyze
{
  "text": "hello world, java spark",
  "analyzer": "standard"
}

Result

{
  "tokens" : [
    {
      "token" : "hello",
      "start_offset" : 0,
      "end_offset" : 5,
      "type" : "<ALPHANUM>",
      "position" : 0
    },
    {
      "token" : "world",
      "start_offset" : 6,
      "end_offset" : 11,
      "type" : "<ALPHANUM>",
      "position" : 1
    },
    {
      "token" : "java",
      "start_offset" : 13,
      "end_offset" : 17,
      "type" : "<ALPHANUM>",
      "position" : 2
    },
    {
      "token" : "spark",
      "start_offset" : 18,
      "end_offset" : 23,
      "type" : "<ALPHANUM>",
      "position" : 3
    }
  ]
}

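The result shows that when ES tokenizes text it not only splits the data but also keeps a position for every token, which is the index of that token within the whole text. When ES runs a match phrase search, it first splits the search input hello world into hello and world and looks them up in the inverted index. If both hello and world occur in the same field of a document, ES checks whether the positions of the two matched terms are consecutive: if they are, the document matches; if not, it does not.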
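
The position check can be sketched in Python. This is a simplified model and not the Lucene implementation: tokenize is a stand-in for the standard analyzer (lowercase, drop punctuation, split on whitespace), and all function names are hypothetical.

```python
def tokenize(text):
    """Stand-in for the standard analyzer: lowercase, strip punctuation, split."""
    cleaned = "".join(ch.lower() if ch.isalnum() else " " for ch in text)
    return cleaned.split()


def positions(tokens):
    """Map each term to the list of positions where it occurs."""
    index = {}
    for position, token in enumerate(tokens):
        index.setdefault(token, []).append(position)
    return index


def phrase_match(text, phrase):
    """True if the phrase terms occur at consecutive positions, in order."""
    index = positions(tokenize(text))
    terms = tokenize(phrase)
    if not terms:
        return False
    for start in index.get(terms[0], []):
        if all(start + offset in index.get(term, [])
               for offset, term in enumerate(terms)):
            return True
    return False


doc = "hello world, java spark"
print(positions(tokenize(doc)))
print(phrase_match(doc, "hello world"), phrase_match(doc, "hello spark"))
```

hello world matches because the two terms sit at positions 0 and 1, while hello spark does not because the terms sit at positions 0 and 3.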

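5.10.2 The slop parameter of match phrase

Scenario: the search input is hello spark, while the data stored in ES is hello world, java spark. A match phrase search will not find it. A match search would, but suppose there is an extra requirement: the closer hello and spark are to each other, the higher the document should rank in the result set. A plain match will not necessarily deliver that.

For this, match phrase offers the slop parameter. slop is the maximum number of position moves the terms may make for the phrase to match. Among all matching documents, the closer the terms are to each other, the higher the relevance score and the rank. A match phrase search with slop is called a proximity search.

In other words, slop is the maximum distance allowed between the terms of the phrase. It relaxes the adjacency requirement of a phrase search; it does not correct misspelled terms, which is the job of fuzzy matching.
When a query carries a slop parameter, Elasticsearch looks for the occurrences of the query terms in the document that lie closest together. If the number of position moves needed to line them up as the phrase is less than or equal to slop, the document matches.
Example:
Suppose we have the following three documents: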

{
  "id": 1,
  "title": "quick brown fox"
}
{
  "id": 2,
  "title": "quick red fox"
}
{
  "id": 3,
  "title": "slow brown dog"
}

We want the documents that contain "quick" and "fox" with a maximum distance of 1 between them. The following query does that:

{
  "query": {
    "match_phrase": {
      "title": {
        "query": "quick fox",
        "slop": 1
      }
    }
  }
}

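The query returns documents 1 and 2: in both, one word sits between quick and fox, so one move is enough. Document 3 is not returned because it contains neither quick nor fox. (For how the distance is measured, look at the position values described under "How match phrase works: term position".)

Note that the larger the slop, the more results match but the lower the precision, so the value has to be chosen as a trade-off for the case at hand.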
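
The slop rule can be tried on the three sample titles with a small Python sketch. It is a simplified model, assuming whitespace tokenization and a phrase without repeated terms: each term position is shifted by the term's offset in the phrase, and the spread of the shifted positions is taken as the number of moves needed. min_slop and sloppy_match are hypothetical names.

```python
from itertools import product


def min_slop(doc_tokens, phrase_tokens):
    """Smallest slop at which the phrase matches, or None if a term is missing.

    Each term position is shifted by the term's offset in the phrase; the
    spread of the shifted positions is the number of moves needed.
    """
    if not phrase_tokens:
        return None
    candidates = []
    for offset, term in enumerate(phrase_tokens):
        shifted = [pos - offset for pos, tok in enumerate(doc_tokens) if tok == term]
        if not shifted:
            return None
        candidates.append(shifted)
    return min(max(combo) - min(combo) for combo in product(*candidates))


def sloppy_match(text, phrase, slop):
    needed = min_slop(text.lower().split(), phrase.lower().split())
    return needed is not None and needed <= slop


titles = {1: "quick brown fox", 2: "quick red fox", 3: "slow brown dog"}
matched = [doc_id for doc_id, title in titles.items()
           if sloppy_match(title, "quick fox", slop=1)]
print(matched)
```

With slop set to 0 the same function behaves like a plain phrase match, so none of the three titles would match quick fox.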

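5.11 Balancing recall and precision with match and proximity search

Recall: the share of documents a search returns. For example, index A holds 100 documents; how many of them a search returns determines the recall.
Precision: how accurate the results are. For example, with the search input hello java, ranking the documents that match the phrase, or in which hello and java are close together, as high as possible is a matter of precision.
Using only match phrase in a search leads to low recall, because every result has to contain the phrase (this also applies to proximity search).
Using only match leads to low precision, because the ranking depends only on the relevance score and ignores how close the terms are.
To get both recall and precision, combine match with proximity search.

All documents in index test_index08: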

"hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 1.0,
    "hits" : [
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "f" : "hello, java is very good, spark is also very good"
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "f" : "java and spark, development language "
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 1.0,
        "_source" : {
          "f" : "Java Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs."
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 1.0,
        "_source" : {
          "f" : "java spark and, development language "
        }
      }
    ]

Query 1

GET /test_index08/_search
{
  "query": {
    "match": {
      "f": "java spark"
    }
  }
}

Result

"hits" : {
    "total" : {
      "value" : 4,
      "relation" : "eq"
    },
    "max_score" : 0.28046143,
    "hits" : [
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.28046143,
        "_source" : {
          "f" : "java and spark, development language "
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.28046143,
        "_source" : {
          "f" : "java spark and, development language "
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.23111339,
        "_source" : {
          "f" : "hello, java is very good, spark is also very good"
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.16973917,
        "_source" : {
          "f" : "Java Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs."
        }
      }
    ]

Query 2

GET /test_index08/_search
{
  "query": {
    "bool": {
      "must": [
        {
          "match": {
            "f": "java spark"
          }
        }
      ],
      "should": [
        {
          "match_phrase": {
            "f": {
              "query": "java spark",
              "slop": 50
            }
          }
        }
      ]
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "6",
        "_score" : 0.56092286,
        "_source" : {
          "f" : "java spark and, development language "
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.4815065,
        "_source" : {
          "f" : "java and spark, development language "
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.32339638,
        "_source" : {
          "f" : "hello, java is very good, spark is also very good"
        }
      },
      {
        "_index" : "test_index08",
        "_type" : "_doc",
        "_id" : "5",
        "_score" : 0.30782324,
        "_source" : {
          "f" : "Java Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs."
        }
      }
    ]
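
Comparing the two result listings hints at how the should clause reorders the hits. The short sketch below assumes that a bool query adds up the scores of its matching must and should clauses, so subtracting the Query 1 score from the Query 2 score gives the share contributed by the match_phrase clause with slop. The variable names are illustrative only; the numbers are copied from the listings above.

```python
# Scores copied from the two result listings above, keyed by document _id.
match_only = {"4": 0.28046143, "6": 0.28046143, "3": 0.23111339, "5": 0.16973917}
with_phrase = {"6": 0.56092286, "4": 0.4815065, "3": 0.32339638, "5": 0.30782324}

# Assumption: bool score = must score + should score, so the difference
# is the part added by the match_phrase (slop) clause.
phrase_bonus = {
    doc_id: round(with_phrase[doc_id] - match_only[doc_id], 8)
    for doc_id in with_phrase
}

# Print the documents ordered by the size of the proximity bonus.
for doc_id, bonus in sorted(phrase_bonus.items(), key=lambda kv: kv[1], reverse=True):
    print(doc_id, bonus)
```

Document 6, where java and spark sit next to each other in a short field, receives the largest bonus, which is why it moves ahead of document 4 in the second result.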

5.12 Prefix search

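Prefix search matches documents whose field value starts with a given prefix. It is usually applied to keyword fields, that is, fields that are not analyzed.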

GET /test_a/_mapping

{
  "test_a" : {
    "mappings" : {
      "properties" : {
        "f" : {
          "type" : "text",
          "fields" : {
            "keyword" : {
              "type" : "keyword",
              "ignore_above" : 256
            }
          }
        }
      }
    }
  }
}

All documents in the index:

"hits" : [
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "f" : "hello, java is very good, spark is also very good"
        }
      },
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "f" : "java and spark, development language "
        }
      }
    ]

Query

GET /test_a/_search
{
  "query": {
    "prefix": {
      "f.keyword": {
        "value": "j"
      }
    }
  }
}

Result

    "hits" : [
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "f" : "java and spark, development language "
        }
      }
    ]

Query

GET /test_a/_search
{
  "query": {
    "prefix": {
      "f.keyword": {
        "value": "J"
      }
    }
  }
}

Result

"hits" : [ ]

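Prefix search here targets the keyword field, and keyword values are case-sensitive, which is why "j" matches document 4 while "J" matches nothing. Prefix search is relatively inefficient and does not compute a relevance score (every hit gets a constant score of 1.0). The shorter the prefix, the more terms it matches and the slower the query, because every term in the index that starts with the prefix has to be visited. If you use prefix search, prefer a long prefix.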
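
The behavior of the two queries above can be imitated in a few lines of Python. This is only a sketch of the matching rule, not of how Elasticsearch walks its term dictionary; prefix_search is a hypothetical helper, and the values are the whole, un-analyzed strings that the f.keyword field would hold.

```python
# Whole, un-analyzed values as a keyword field would store them (from the sample data).
keyword_values = {
    "3": "hello, java is very good, spark is also very good",
    "4": "java and spark, development language ",
}


def prefix_search(values, prefix):
    """Return the ids whose whole value starts with prefix (case-sensitive)."""
    return [doc_id for doc_id, value in values.items() if value.startswith(prefix)]


print(prefix_search(keyword_values, "j"))  # ['4']
print(prefix_search(keyword_values, "J"))  # []
```

Document 3 contains the word java too, but its keyword value starts with "hello", so it is not a prefix match.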

5.13 Wildcard search

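Wildcard queries can run against the inverted index of an analyzed field as well as against keyword fields. Performance is poor, because a large part of the term index may have to be scanned.
? : matches exactly one arbitrary character
* : matches zero to n arbitrary characters

Query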

GET /test_a/_search
{
  "query": {
    "wildcard": {
      "f.keyword": {
        "value": "?e*o*"
      }
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "f" : "hello, java is very good, spark is also very good"
        }
      }
    ]
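
To see why only document 3 matches, the pattern can be tried locally with Python's fnmatch module, which uses the same meaning for ? and * (it also gives [ a special meaning, which this pattern does not use). This is a rough stand-in for the wildcard query, under the assumption that the pattern has to cover the whole keyword value.

```python
from fnmatch import fnmatchcase

# Whole keyword values from the sample data.
keyword_values = {
    "3": "hello, java is very good, spark is also very good",
    "4": "java and spark, development language ",
}

pattern = "?e*o*"  # any one character, then "e", then an "o" somewhere later

# fnmatchcase is case-sensitive and matches the pattern against the full string.
matches = [doc_id for doc_id, value in keyword_values.items() if fnmatchcase(value, pattern)]
print(matches)  # ['3']
```

The second character of document 4 is "a", so the pattern fails there on the ?e part.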

5.14 Regexp search

In Elasticsearch, the regexp query searches with a regular expression. It matches terms in the specified field that satisfy the expression.

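Elasticsearch supports regular expressions against the inverted index of an analyzed field or against keyword fields.
For example, suppose we have an index that holds document titles and content, and we want every document whose title or content contains "Elastic" followed later by "search". A regexp query can express this: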

{
    "query": {
        "regexp": {
            "_all": ".*Elastic.*search.*"
        }
    }
}

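In this example, "_all" refers to the catch-all field that combines the values of every field, and ".*" matches any character zero or more times. Note that the _all field was deprecated in Elasticsearch 6.0 and removed in 7.0, so on 7.x the query has to name a concrete field instead. A regexp query is also matched against individual indexed terms, so a pattern that spans two words only matches on a field that is not analyzed, such as a keyword field.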

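Keep in mind that regexp queries are slow, because the pattern has to be tested against the terms stored for the field, and a pattern that starts with a wildcard such as ".*" forces a scan over all of them. If performance matters, avoid regexp queries where possible.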

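In addition, the regexp query accepts parameters such as case_insensitive (whether to ignore case, available from 7.10), max_determinized_states (the maximum number of automaton states the query may require), boost (a weight factor), and flags (which enable optional regular expression operators). These parameters make the query more precise and flexible.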

Another example:

GET /test_a/_search
{
  "query": {
    "regexp": {
      "f.keyword": "[A-Za-z].+"
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 1.0,
        "_source" : {
          "f" : "hello, java is very good, spark is also very good"
        }
      },
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 1.0,
        "_source" : {
          "f" : "java and spark, development language "
        }
      }
    ]

Performance is poor because the index has to be scanned, so avoid regexp queries on large indices where possible.
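
A regexp query has to match the whole term, not just part of it. The sketch below imitates that with Python's re.fullmatch on the two keyword values. Python's regular expression syntax is not identical to the Lucene syntax that Elasticsearch uses, so treat this as an approximation; regexp_search is a hypothetical helper. The last two lines show why a range written as [A-z] is broader than [A-Za-z].

```python
import re

# Whole keyword values from the sample data.
keyword_values = {
    "3": "hello, java is very good, spark is also very good",
    "4": "java and spark, development language ",
}


def regexp_search(values, pattern):
    """Return ids whose entire value matches pattern, like an anchored regexp query."""
    compiled = re.compile(pattern)
    return [doc_id for doc_id, value in values.items() if compiled.fullmatch(value)]


print(regexp_search(keyword_values, "[A-Za-z].+"))  # ['3', '4']

# [A-z] spans the ASCII range from 'A' to 'z', which also contains [ \ ] ^ _ `
print(bool(re.fullmatch("[A-z].+", "_tmp")))     # True
print(bool(re.fullmatch("[A-Za-z].+", "_tmp")))  # False
```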

5.15 Search suggestions

In Elasticsearch, the match_phrase_prefix query combines phrase matching with prefix matching. It matches phrases whose last word starts with the given prefix.

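Specifically, match_phrase_prefix first splits the query string into individual terms and then treats the last term as a prefix. It is typically suited to cases that need phrase matching but also want prefix support.
Search suggestions are also known as search as you type: if the index holds entries that start with "hello", related entries are suggested while the user is typing hello (similar to the Baidu search box).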

Query

GET /test_a/_search
{
  "query": {
    "match_phrase_prefix": {
      "f": {
        "query": "java s",
        "slop": 10,
        "max_expansions": 10
      }
    }
  }
}

Result

"hits" : [
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "4",
        "_score" : 0.28650534,
        "_source" : {
          "f" : "java and spark, development language "
        }
      },
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.11460209,
        "_source" : {
          "f" : "hello, java is very good, spark is also very good"
        }
      }
    ]

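The mechanism is similar to match_phrase: the leading terms (java) are matched first, and then, within the number of moves allowed by slop, the last term (s) is matched as a prefix. max_expansions caps how many terms the prefix may expand to; once that number is reached, no further terms are considered.
A limitation of this syntax is that only the last term is treated as a prefix.
Performance is poor, because the last term has to be checked against all terms of the inverted index that satisfy the slop requirement.
Because of this cost, always set max_expansions if you have to use this query.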
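
The role of max_expansions can be illustrated with a small sketch of the prefix expansion step only (phrase positions and slop are left out). The helpers analyze and expand_prefix are hypothetical, and analyze is a rough stand-in for the standard analyzer that just lowercases and splits on non-word characters.

```python
import re

# Field values from the sample data.
docs = {
    "3": "hello, java is very good, spark is also very good",
    "4": "java and spark, development language ",
}


def analyze(text):
    """Rough stand-in for the standard analyzer: lowercase word tokens."""
    return re.findall(r"\w+", text.lower())


def expand_prefix(term_dictionary, prefix, max_expansions):
    """Return at most max_expansions terms, in sorted order, that start with prefix."""
    candidates = [term for term in sorted(term_dictionary) if term.startswith(prefix)]
    return candidates[:max_expansions]


# Every distinct term across both documents.
term_dictionary = {term for text in docs.values() for term in analyze(text)}

print(expand_prefix(term_dictionary, "s", max_expansions=10))  # ['spark']
print(expand_prefix(term_dictionary, "a", max_expansions=1))   # ['also']
```

With the prefix "a" and a limit of 1, the term "and" is never considered even though it also starts with "a", which is the trade-off max_expansions makes between cost and completeness.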

5.16 Fuzzy search

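Fuzzy search in Elasticsearch is a retrieval technique based on edit distance (Levenshtein distance). It lets a query match words that are similar but not identical: when a fuzzy query runs, Elasticsearch looks in the index for terms within a given edit distance of the query term. If the query string contains a misspelling or a missing character, fuzzy search can still find the intended terms. Fuzziness can also be applied word by word to a multi-word search, for example through the fuzziness option of the match query. The fuzziness parameter sets the maximum edit distance, that is, the maximum number of differences allowed. It accepts 0, 1, 2, or AUTO; AUTO is the default and derives the distance from the term length, and 2 is the largest value allowed, so two words whose edit distance exceeds 2 never match. Adjust the parameter to make matching stricter or looser as needed.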

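When searching, the search text may be mistyped, for example hello world -> hello word. Such spelling mistakes are very common, and fuzzy search exists to tolerate them (it works well for English and is almost useless for Chinese). In the query below, fuzziness is the number of letters of the value word that may be changed to correct the misspelling (a change is a substitution, an insertion, or a deletion of a letter), and f is the name of the field to search.

Query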

GET /test_a/_search
{
  "query": {
    "fuzzy": {
      "f": {
        "value": "word",
        "fuzziness": 2
      }
    }
  }
}

Result

 "hits" : [
      {
        "_index" : "test_a",
        "_type" : "_doc",
        "_id" : "3",
        "_score" : 0.43569255,
        "_source" : {
          "f" : "hello, java is very good, spark is also very good"
        }
      }
    ]
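
Why does searching for word return document 3 but not document 4? The sketch below computes the plain Levenshtein distance between word and each term of the two documents. It ignores transpositions, which Elasticsearch counts as a single edit by default, and the term lists are an assumption about what the standard analyzer would produce for the sample data. The helpers levenshtein and fuzzy_terms are illustrative only.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]


def fuzzy_terms(terms, value, fuzziness):
    """Return the terms whose edit distance to value is at most fuzziness."""
    return [term for term in terms if levenshtein(value, term) <= fuzziness]


# Assumed analyzed terms of the two sample documents.
doc3_terms = ["hello", "java", "is", "very", "good", "spark", "also"]
doc4_terms = ["java", "and", "spark", "development", "language"]

print(fuzzy_terms(doc3_terms, "word", 2))  # ['good']
print(fuzzy_terms(doc4_terms, "word", 2))  # []
print(levenshtein("word", "world"))        # 1
```

The term good is two substitutions away from word (w to g, r to o), so document 3 is a hit, while no term of document 4 is within distance 2.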

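¹ Zen Discovery is the automatic discovery mechanism in Elasticsearch. It manages how nodes find and connect to each other in a distributed environment. Zen Discovery detects nodes joining and leaving, and the cluster reallocates data and rebalances when necessary.
Zen Discovery covers the following aspects:
1. Ping: each node periodically sends ping requests to the other nodes to check whether they are still running. A node that does not respond within a certain time is considered to have left the cluster.
2. Unicast discovery: nodes discover each other from a configured address list. A node needs to know the IP address and port of other nodes before it can join the cluster. On startup it sends a join request to the configured nodes, and it is added to the cluster if the request succeeds.
3. Multicast discovery: nodes discover each other through a multicast address. Each node publishes its IP address and port to a specific multicast address, and other nodes receive the information of all nodes from that address, thereby discovering new nodes. This lets nodes join and leave the cluster flexibly. (Multicast discovery only applies to old releases: it was moved out of the core into a plugin in 2.0 and later removed.)
4. Master election: Zen Discovery also includes the election of the Master node, which then coordinates the nodes of the cluster.
(In some cases you may want to pin the master role to particular nodes or keep a node out of the election. This is done with the node.master setting in elasticsearch.yml: false excludes the node from master election, true lets it take part. By default every node takes part in master election, so no manual configuration is needed. From 7.9 on, node.master is deprecated in favor of node.roles.)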
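² In an Elasticsearch cluster, the metadata maintained by the Master node includes the following:
1. Cluster state: the current state of the cluster, such as running state and health.
2. Index metadata: information about every index, such as field mappings, number of shards, number of replicas, and index aliases.
3. Node metadata: information about every node, such as IP address, node name, available disk space, and JVM information.
4. Shard allocation: which node each shard lives on and whether it is a primary shard.
5. Fault detection: the most recent heartbeat and the offline time of each node, used to detect failed nodes.
This metadata is held in the Master node's memory and synchronized to the other nodes, so that every node in the cluster has the same view of it. By maintaining this metadata, the Master node manages and coordinates the cluster and keeps the data highly available, consistent, and complete.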
