The Elasticsearch connection uses org.elasticsearch.client.RestHighLevelClient. The code for obtaining the tokens is as follows:
import com.alibaba.fastjson.JSON;
import com.alibaba.fastjson.JSONArray;
import com.alibaba.fastjson.JSONObject;
import org.apache.http.util.EntityUtils;
import org.elasticsearch.client.Request;
import org.elasticsearch.client.Response;
import org.elasticsearch.client.RestHighLevelClient;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;
import java.util.ArrayList;
import java.util.List;

@Service
public class BaseDataService {

    protected Logger logger = LoggerFactory.getLogger(this.getClass());

    @Autowired
    private RestHighLevelClient restHighLevelClient;

    /**
     * Tokenizes the given text with the ik_max_word analyzer.
     *
     * @param text the text to analyze
     * @return the tokens, in the order Elasticsearch returns them
     * @throws Exception if the request fails or the response cannot be parsed
     */
    public List<String> getAnalyze(String text) throws Exception {
        List<String> list = new ArrayList<String>();
        // Call the _analyze API through the low-level REST client.
        Request request = new Request("GET", "_analyze");
        JSONObject entity = new JSONObject();
        entity.put("analyzer", "ik_max_word");
        entity.put("text", text);
        request.setJsonEntity(entity.toJSONString());
        Response response = restHighLevelClient.getLowLevelClient().performRequest(request);
        // The response body is a JSON object with a "tokens" array.
        JSONObject tokens = JSONObject.parseObject(EntityUtils.toString(response.getEntity()));
        JSONArray arrays = tokens.getJSONArray("tokens");
        for (int i = 0; i < arrays.size(); i++) {
            // Each element describes one token; only its "token" field is kept.
            JSONObject obj = JSON.parseObject(arrays.getString(i));
            list.add(obj.getString("token"));
        }
        return list;
    }
}
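For readers who want to see what the service sends over the wire, here is a minimal sketch of the same _analyze request using only the JDK's java.net.http client (JDK 11+). It is an illustration rather than the code used in this post: the class name AnalyzeRequestSketch, the hand-rolled JSON escaping, and the base URL passed in by the caller (for example http://localhost:9200 with no authentication) are all assumptions. It uses POST, which the _analyze API accepts in addition to GET.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical helper, not part of the original service.
public class AnalyzeRequestSketch {

    /** Escapes characters that must not appear raw inside a JSON string. */
    static String jsonEscape(String s) {
        StringBuilder sb = new StringBuilder();
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"':  sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                case '\n': sb.append("\\n"); break;
                case '\r': sb.append("\\r"); break;
                case '\t': sb.append("\\t"); break;
                default:
                    if (c < 0x20) {
                        // Remaining control characters become 4-digit hex escapes.
                        sb.append(String.format("\\u%04x", (int) c));
                    } else {
                        sb.append(c);
                    }
            }
        }
        return sb.toString();
    }

    /** Builds the JSON body: an analyzer name plus the text to tokenize. */
    static String buildBody(String analyzer, String text) {
        return "{\"analyzer\":\"" + jsonEscape(analyzer)
                + "\",\"text\":\"" + jsonEscape(text) + "\"}";
    }

    /** Posts the body to baseUrl + "/_analyze" and returns the raw JSON response. */
    static String analyze(String baseUrl, String analyzer, String text) throws Exception {
        HttpRequest request = HttpRequest.newBuilder(URI.create(baseUrl + "/_analyze"))
                .header("Content-Type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(buildBody(analyzer, text)))
                .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        return response.body();
    }
}
```

Calling AnalyzeRequestSketch.analyze(baseUrl, "ik_max_word", text) against a cluster with the IK plugin installed should return the same JSON that the service above parses, with a "tokens" array in which each element carries a "token" field.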
The unit test code is as follows:
@Test
public void getAnalyze() throws Exception {
    // Sample input: a kindergarten's back-to-school epidemic-prevention notice.
    String text = "點擊上方藍字關注我們!全體教職員工、家長朋友們:你們好!快樂而充實的暑期生活即將結束,新學期的各項工作即將開啟。鑒于目前國內、省內嚴峻復雜的疫情形勢,為進一步做好幼兒園疫情防控工作,為秋季開學創造良好條件,確保返園后正常的教育教學秩序,現溫馨提示如下:一、做好返安準備。廣大教職員工及幼兒根據開學時間以及疫情形勢變化,預留足夠時間,至少提前7天返安或返回居住地(即:全體教師于2022年8月20日零時前返安;全體幼兒于2022年8月24日零時前返安),并嚴格落實屬地(單位報備、社區報備)健康管理要求。二、做好健康監測。建議從外地返安的教職工、幼兒及家長自覺進行3天2次核酸檢測(至少間隔24小時),并做好7天自我健康監測。前3天原則上“兩點一線”,少聚集、少聚會。時刻關注自己和家人的身體狀況,如出現發熱、干咳、乏力、嗅(味)覺減退、鼻塞、流涕、咽痛、結膜炎、肌痛和腹瀉等癥狀,及時到附近的發熱門診進行排查和診療,就醫過程盡量避免乘坐公共交通工具。三、做好重點防控。近7日內有中、高風險區旅居或與相關人員有密切接觸的教師、幼兒,返安前 48 小時向目的地社區報備,在抵安后12小時內向目的地社區和幼兒園報告,并配合做好信息登記、核酸檢測、集中隔離或居家健康監測等管控措施。四、做好健康登記。如實填寫《漢濱區鐵路幼兒園疫情防控返園承諾書及返園前健康監測登記表》,并在開學當天上交紙質版給班級教師。(電子表格已發至班級群)新學期開學在即,讓我們一起做好返園前各項防控工作,確保全體教職工及幼兒安全返園。祝大家身體健康!暑假愉快!漢濱區鐵路幼兒園2022年8月19日掃碼關注分享給第一個想到的人";
    // baseDataService is presumably the BaseDataService bean injected into the test class,
    // and JsonMapper appears to be a project-specific JSON helper; neither is shown in this post.
    List<String> result = baseDataService.getAnalyze(text);
    System.out.println(JsonMapper.toJson(result));
}
Execution result:
["點擊","上方","藍字","關注","我們","全體","教職員工","教職員","教職","職員","員工","家長","朋友們","朋友","們","你們","好","快樂","而","充實","的","暑期","生活","即將","結束","新學期","新學","學期","的","各項工作","各項","工作","即將","開啟","鑒于","目前國內","目前","國內","省內","嚴峻","復雜","的","疫情","情形","形勢","為","進一步","進一","一步","一","步","做好","幼兒園","幼兒","園","疫情","防","控","工作","為","秋季","開學","創造","良好條件","良好","條件","確保","返","園","后","正常","的","教育","教學秩序","教學","秩序","現","溫馨","提示","如下","一","做好","返","安","準備","廣大","教職員工","教職員","教職","職員","員工","及","幼兒","根據","開學","學時","時間","以及","疫情","情形","形勢","變化","預留","留足","足夠","時間","至少","少提","提前","7","天","返","安","或","返回","居住地","居住","住地","即","全體","教師","于","2022","年","8","月","20","日","零時","零","時","前","返","安","全體","幼兒","于","2022","年","8","月","24","日","零時","零","時","前","返","安","并","嚴格","落實","實屬","屬地","單位","報備","社區","報備","健康","管理","要求","二","做好","健康","監測","建議","從","外地","返","安","的","教職工","教職","職工","幼兒","及","家長","自覺","進行","3","天","2","次","核酸","檢測","至少","少間","間隔","24","小時","時","并","做好","7","天","自我","健康","監測","前","3","天","原則上","原則","上","兩點","兩","點","一線","一","線","少","聚集","少","聚會","時刻","關注","自己","和家人","家人","的","身體狀況","身體","狀況","如","出現","發熱","干咳","乏力","嗅","味","覺","減退","鼻塞","流涕","咽","痛","結膜炎","結膜","膜炎","肌","痛","和","腹瀉","等","癥狀","及時","到","附近","的","發熱","熱門","門診","進行","排查","和","診療","就醫","過程","盡量","避免","乘坐","公共交通","公共","交通工具","交通","工具","三","做好","重點","防","控","近","7","日內","日","內有","中","高風險","高風","風險","險區","旅居","或與","相關","關人","人員","有","密切接觸","密切","接觸","的","教師","幼兒","返","安","前","48","小時","向","目的地","目的","地","社區","報備","在","抵","安","后","12","小時內","小時","時","內向","目的地","目的","地","社區","和","幼兒園","幼兒","園","報告","并","配合","合做","做好","信息","登記","核酸","檢測","集中","中隔","隔離","或","居家","健康","監測","等","管","控","措施","四","做好","健康","登記","如實","填寫","漢濱區","鐵路","幼兒園","幼兒","園","疫情","防","控","返","園","承諾書","承諾","書","及","返","園","前","健康","監測","登記表","登記","表","并在","開學","當天","天上","上交","紙質","版","給","班級","教師","電子表格","電子表","電子","子表","表格","已","發至","班級","群","新學期","新學","學期","開學","在即","讓我們","我們","一起","一","起","做好","返","園","前","各項","防","控","工作","確保全","確保","保全","全體","教職工","教職","職工","及","幼兒","安全","返","園","祝","大家","身體健康","身體","健康","暑假","愉快","漢濱區","鐵路","幼兒園","幼兒","園","2022","年","8","月","19","日","掃","碼","關注","分享","給","第一個","第一","一個","一","個","想到","的人"]
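As the result shows, ik_max_word emits overlapping tokens and repeats a token every time it occurs (for example "做好" and "健康" appear many times). If the tokens are meant for something like keyword collection, one possible post-processing step is to drop very short tokens and remove duplicates while keeping first-seen order. The sketch below is not part of the original code; the class name TokenPostProcessor and the minLength parameter are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;

// Hypothetical helper, not part of the original service.
public class TokenPostProcessor {

    /**
     * Returns the distinct tokens in first-seen order, skipping tokens
     * that contain fewer than minLength code points.
     */
    static List<String> distinctTokens(List<String> tokens, int minLength) {
        // LinkedHashSet drops duplicates but remembers insertion order.
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (String token : tokens) {
            if (token.codePointCount(0, token.length()) >= minLength) {
                seen.add(token);
            }
        }
        return new ArrayList<>(seen);
    }

    public static void main(String[] args) {
        // A few tokens taken from the result above.
        List<String> tokens = List.of("做好", "健康", "監測", "的", "做好", "健康", "登記");
        System.out.println(distinctTokens(tokens, 2)); // prints [做好, 健康, 監測, 登記]
    }
}
```

Whether single-character tokens should be discarded depends on the use case; passing 1 as minLength keeps them and only removes duplicates.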