Elasticsearch provides a tokenizer called the N-gram tokenizer. The official documentation introduces it as follows:
N-gram tokenizer
The ngram tokenizer first breaks text down into words whenever it encounters one of a list of specified characters, then it emits N-grams of each word of the specified length.
N-grams are like a sliding window that moves across the word - a continuous sequence of characters of the specified length. They are useful for querying languages that don’t use spaces or that have long compound words, like German.
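For instance, sliding a window of length 2 across the word fox yields the 2-grams fo and ox, while a window of length 1 yields f, o, and x.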
Example output
With the default settings, the ngram tokenizer treats the initial text as a single token and produces N-grams with minimum length 1 and maximum length 2:
POST _analyze
{
  "tokenizer": "ngram",
  "text": "Quick Fox"
}
The above sentence would produce the following terms:

[ Q, Qu, u, ui, i, ic, c, ck, k, "k ", " ", " F", F, Fo, o, ox, x ]
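The defaults are rarely what you want in practice. As a minimal sketch of a more typical setup for the digits and English letters this article is concerned with (the index name my_ngram_index, the analyzer and tokenizer names, and the fixed trigram settings are illustrative assumptions, not from the original post), you can register a custom ngram tokenizer that keeps only letters and digits and emits 3-grams:

PUT my_ngram_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_analyzer": {
          "tokenizer": "my_tokenizer"
        }
      },
      "tokenizer": {
        "my_tokenizer": {
          "type": "ngram",
          "min_gram": 3,
          "max_gram": 3,
          "token_chars": [
            "letter",
            "digit"
          ]
        }
      }
    }
  }
}

POST my_ngram_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "2 Quick Foxes."
}

Because token_chars is limited to letter and digit, the space and the period act as token boundaries, so the text is split into 2, Quick, and Foxes before any N-grams are emitted; 2 is shorter than min_gram and so contributes no terms, leaving output such as Qui, uic, ick, Fox, oxe, and xes.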