
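Tokenizing a Passage of Chinese Text with ES

This article shows how to use ES (Elasticsearch) to tokenize a passage of Chinese text. I hope it is helpful. If you find mistakes or anything I have not fully considered, corrections are welcome.

The ES connection uses org.elasticsearch.client.RestHighLevelClient. The request is sent to the _analyze API through the underlying low-level client, with the ik_max_word analyzer (this requires the IK analysis plugin to be installed on the cluster). The code for obtaining the tokens is as follows: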


import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

@Service
public class BaseDataService {
    protected Logger logger = LoggerFactory.getLogger(this.getClass());

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    /**
     * Tokenizes the given text with the ik_max_word analyzer.
     *
     * @param text the text to tokenize
     * @return the tokens, in the order Elasticsearch returns them
     * @throws Exception if the request fails or the response cannot be parsed
     */
    public List<String> getAnalyze(String text) throws Exception {
        List<String> list = new ArrayList<>();

        // Build the _analyze request body: {"analyzer": "ik_max_word", "text": "..."}
        Request request = new Request("GET", "/_analyze");
        JSONObject entity = new JSONObject();
        entity.put("analyzer", "ik_max_word");
        entity.put("text", text);
        request.setJsonEntity(entity.toJSONString());

        // The high-level client has no helper used here, so go through the low-level client
        Response response = restHighLevelClient.getLowLevelClient().performRequest(request);
        String body = EntityUtils.toString(response.getEntity(), StandardCharsets.UTF_8);

        // The response has the shape {"tokens": [{"token": "...", ...}, ...]}
        JSONArray arrays = JSON.parseObject(body).getJSONArray("tokens");
        for (int i = 0; i < arrays.size(); i++) {
            list.add(arrays.getJSONObject(i).getString("token"));
        }
        return list;
    }

}
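
For readers who want to try the same call without Spring or fastjson, the request and response handling above can be sketched with only the JDK (Java 11+). This is a rough illustration, not the article's code: the class name AnalyzeSketch, the baseUrl parameter, and the regex-based token extraction are all assumptions made to keep the example dependency-free, and a real JSON parser is preferable in production.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the original article.
class AnalyzeSketch {
    // Matches "token":"..." pairs in an _analyze response body.
    // Escape sequences inside the value are returned as-is, not decoded.
    private static final Pattern TOKEN =
            Pattern.compile("\"token\"\\s*:\\s*\"((?:[^\"\\\\]|\\\\.)*)\"");

    /** Builds the request body {"analyzer": ..., "text": ...} with JSON escaping. */
    static String buildBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + escape(analyzer) + "\",\"text\":\"" + escape(text) + "\"}";
    }

    /** Escapes quotes, backslashes and control characters for use in a JSON string. */
    static String escape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.toString();
    }

    /** Collects the value of every "token" field, in response order. */
    static List<String> extractTokens(String responseJson) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(responseJson);
        while (m.find()) {
            tokens.add(m.group(1));
        }
        return tokens;
    }

    /** Sends the text to {baseUrl}/_analyze; baseUrl (e.g. http://localhost:9200) is an assumption. */
    static List<String> analyze(String baseUrl, String analyzer, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(analyzer, text)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return extractTokens(response.body());
    }
}
```

Passing "ik_smart" instead of "ik_max_word" as the analyzer selects the IK plugin's coarser-grained mode, which should return fewer, non-overlapping tokens for the same text.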

The unit test is as follows:

    @Test
    public void getAnalyze() throws Exception {
        String text = "點擊上方藍字關注我們!全體教職員工、家長朋友們:你們好!快樂而充實的暑期生活即將結束,新學期的各項工作即將開啟。鑒于目前國內、省內嚴峻復雜的疫情形勢,為進一步做好幼兒園疫情防控工作,為秋季開學創造良好條件,確保返園后正常的教育教學秩序,現溫馨提示如下:一、做好返安準備。廣大教職員工及幼兒根據開學時間以及疫情形勢變化,預留足夠時間,至少提前7天返安或返回居住地(即:全體教師于2022年8月20日零時前返安;全體幼兒于2022年8月24日零時前返安),并嚴格落實屬地(單位報備、社區報備)健康管理要求。二、做好健康監測。建議從外地返安的教職工、幼兒及家長自覺進行3天2次核酸檢測(至少間隔24小時),并做好7天自我健康監測。前3天原則上“兩點一線”,少聚集、少聚會。時刻關注自己和家人的身體狀況,如出現發熱、干咳、乏力、嗅(味)覺減退、鼻塞、流涕、咽痛、結膜炎、肌痛和腹瀉等癥狀,及時到附近的發熱門診進行排查和診療,就醫過程盡量避免乘坐公共交通工具。三、做好重點防控。近7日內有中、高風險區旅居或與相關人員有密切接觸的教師、幼兒,返安前 48 小時向目的地社區報備,在抵安后12小時內向目的地社區和幼兒園報告,并配合做好信息登記、核酸檢測、集中隔離或居家健康監測等管控措施。四、做好健康登記。如實填寫《漢濱區鐵路幼兒園疫情防控返園承諾書及返園前健康監測登記表》,并在開學當天上交紙質版給班級教師。(電子表格已發至班級群)新學期開學在即,讓我們一起做好返園前各項防控工作,確保全體教職工及幼兒安全返園。祝大家身體健康!暑假愉快!漢濱區鐵路幼兒園2022年8月19日掃碼關注分享給第一個想到的人";
        List<String> result = baseDataService.getAnalyze(text);
        System.out.println(JsonMapper.toJson(result));
    }

Output:

["點擊","上方","藍字","關注","我們","全體","教職員工","教職員","教職","職員","員工","家長","朋友們","朋友","們","你們","好","快樂","而","充實","的","暑期","生活","即將","結束","新學期","新學","學期","的","各項工作","各項","工作","即將","開啟","鑒于","目前國內","目前","國內","省內","嚴峻","復雜","的","疫情","情形","形勢","為","進一步","進一","一步","一","步","做好","幼兒園","幼兒","園","疫情","防","控","工作","為","秋季","開學","創造","良好條件","良好","條件","確保","返","園","后","正常","的","教育","教學秩序","教學","秩序","現","溫馨","提示","如下","一","做好","返","安","準備","廣大","教職員工","教職員","教職","職員","員工","及","幼兒","根據","開學","學時","時間","以及","疫情","情形","形勢","變化","預留","留足","足夠","時間","至少","少提","提前","7","天","返","安","或","返回","居住地","居住","住地","即","全體","教師","于","2022","年","8","月","20","日","零時","零","時","前","返","安","全體","幼兒","于","2022","年","8","月","24","日","零時","零","時","前","返","安","并","嚴格","落實","實屬","屬地","單位","報備","社區","報備","健康","管理","要求","二","做好","健康","監測","建議","從","外地","返","安","的","教職工","教職","職工","幼兒","及","家長","自覺","進行","3","天","2","次","核酸","檢測","至少","少間","間隔","24","小時","時","并","做好","7","天","自我","健康","監測","前","3","天","原則上","原則","上","兩點","兩","點","一線","一","線","少","聚集","少","聚會","時刻","關注","自己","和家人","家人","的","身體狀況","身體","狀況","如","出現","發熱","干咳","乏力","嗅","味","覺","減退","鼻塞","流涕","咽","痛","結膜炎","結膜","膜炎","肌","痛","和","腹瀉","等","癥狀","及時","到","附近","的","發熱","熱門","門診","進行","排查","和","診療","就醫","過程","盡量","避免","乘坐","公共交通","公共","交通工具","交通","工具","三","做好","重點","防","控","近","7","日內","日","內有","中","高風險","高風","風險","險區","旅居","或與","相關","關人","人員","有","密切接觸","密切","接觸","的","教師","幼兒","返","安","前","48","小時","向","目的地","目的","地","社區","報備","在","抵","安","后","12","小時內","小時","時","內向","目的地","目的","地","社區","和","幼兒園","幼兒","園","報告","并","配合","合做","做好","信息","登記","核酸","檢測","集中","中隔","隔離","或","居家","健康","監測","等","管","控","措施","四","做好","健康","登記","如實","填寫","漢濱區","鐵路","幼兒園","幼兒","園","疫情","防","控","返","園","承諾書","承諾","書","及","返","園","前","健康","監測","登記表","登記","表","并在","開學","當天","天上","上交","紙質","版","給","班級","教師","電子表格","電子表","電子","子表","表格","已","發至","班級","群","新學期","新學","學期","開學","在即","讓我們","我們","一起","一","起","做好","返","園","前","各項","防","控","工作","確保全","確保","保全","全體","教職工","教職","職工","及","幼兒","安全","返","園","祝","大家","身體健康","身體","健康","暑假","愉快","漢濱區","鐵路","幼兒園","幼兒","園","2022","年","8","月","19","日","掃","碼","關注","分享","給","第一個","第一","一個","一","個","想到","的人"]
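
The result above contains many repeated tokens (for example "做好" and "健康") as well as overlapping ones ("幼兒園", "幼兒", "園"), since ik_max_word emits every word it can find. If the list is meant to feed something like keyword extraction, one possible follow-up step is to count token frequencies. The sketch below is only an illustration of that idea; the TokenStats class and its method names are hypothetical and do not appear in the original code.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical post-processing helper; not part of the original article.
class TokenStats {
    /** Counts how often each token occurs, preserving first-seen order. */
    static Map<String, Integer> countTokens(List<String> tokens) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        for (String token : tokens) {
            counts.merge(token, 1, Integer::sum);
        }
        return counts;
    }

    /** Returns the n most frequent tokens; ties keep first-seen order because List.sort is stable. */
    static List<Map.Entry<String, Integer>> topN(List<String> tokens, int n) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<>(countTokens(tokens).entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return entries.subList(0, Math.min(n, entries.size()));
    }

    public static void main(String[] args) {
        // A short excerpt in the style of the output above, used as sample input.
        List<String> sample = List.of(
                "做好", "幼兒園", "幼兒", "園", "疫情", "做好", "健康", "監測", "做好", "健康");
        System.out.println(topN(sample, 2)); // prints [做好=3, 健康=2]
    }
}
```

In practice, single-character tokens and function words such as "的" would likely dominate the counts, so a stop-word filter before counting may be worth adding.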
