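Preface
In many search scenarios we want to find items that match the search term and also items that match its synonyms. In product search, for example, a search for “瓠瓜” should also return “西葫蘆”, but products named “西葫蘆” do not contain the text “瓠瓜”, so they are never matched.
What we need is for “瓠瓜” to be analyzed into both “瓠瓜” and “西葫蘆”. ES's synonym and synonym_graph token filters provide exactly this: they turn a term into its synonyms, which are then tokenized.
The request below declares an analyzer that treats “瓠瓜” and “西葫蘆” as synonyms (analysis settings can only be changed while the index is closed, or set when the index is created):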
// Define a custom analyzer
PUT info_goods_v1/_settings
{
"analysis": {
"filter": {
"my_synonyms": {
"type": "synonym_graph",
"synonyms": [
"瓠瓜,西葫蘆"
]
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "ik_max_word",
"filter": [
"lowercase",
"my_synonyms"
]
}
}
}
}
// Analyze “瓠瓜”
GET info_goods_v1/_analyze
{
"analyzer": "my_analyzer",
"text": "瓠瓜"
}
// Result:
{
"tokens" : [
{
"token" : "西葫蘆",
"start_offset" : 0,
"end_offset" : 2,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "瓠",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0,
"positionLength" : 2
},
{
"token" : "葫蘆",
"start_offset" : 0,
"end_offset" : 2,
"type" : "SYNONYM",
"position" : 1,
"positionLength" : 2
},
{
"token" : "瓜",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 2
}
]
}
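As you can see, “瓠瓜” was analyzed into “西葫蘆”, “葫蘆”, “瓠” and “瓜”. That is because the custom analyzer defines “瓠瓜” and “西葫蘆” as synonyms (“瓠瓜 => 瓠瓜,西葫蘆”). In effect, “瓠瓜” is first turned into the set “瓠瓜” and “西葫蘆”, and then each member of that set is tokenized in turn to produce the result.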
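To make that expansion step concrete, here is a minimal Java sketch. It is only a toy model of an equivalence rule such as “瓠瓜,西葫蘆”, not how Lucene's synonym_graph filter is actually implemented, and the class and method names (SynonymExpansionSketch, parseRules, expand) are made up for illustration.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical illustration only: models "a,b" equivalence rules, not Lucene's real SynonymMap.
class SynonymExpansionSketch {

    // Parse rules such as "瓠瓜,西葫蘆": every term in a rule maps to the whole group.
    static Map<String, List<String>> parseRules(List<String> rules) {
        Map<String, List<String>> table = new HashMap<>();
        for (String rule : rules) {
            List<String> group = new ArrayList<>();
            for (String term : rule.split(",")) {
                String trimmed = term.trim();
                if (!trimmed.isEmpty()) {
                    group.add(trimmed);
                }
            }
            for (String term : group) {
                table.put(term, group);
            }
        }
        return table;
    }

    // Expand one input term into its synonym group; unknown terms expand to themselves.
    static List<String> expand(Map<String, List<String>> table, String term) {
        return table.getOrDefault(term, List.of(term));
    }

    public static void main(String[] args) {
        Map<String, List<String>> table = parseRules(List.of("瓠瓜,西葫蘆"));
        // Each member of the printed group would then be passed on for tokenization.
        System.out.println(expand(table, "瓠瓜"));
    }
}
```

In the real filter the expanded terms also carry positions and offsets, which is why the _analyze output above contains position and positionLength fields.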
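Dizzy from all the “瓠瓜” and “西葫蘆”? Take a breath and let's carry on...
What if the synonyms change; how do we update them? One option is to close the index, update its analyzer settings, and open it again. Another is the elasticsearch-analysis-dynamic-synonym plugin, which can refresh synonyms dynamically. The plugin supports dynamic updates from an HTTP endpoint and from a file, but not from a database. That is not a problem: a small modification gets us there, and that modification is the main subject of this article.
The process is as follows.
Modify the source to fetch synonyms from a database
Download elasticsearch-analysis-dynamic-synonym and open the project.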
1. Modify pom.xml
Add the dependency:
        <dependency>
            <groupId>mysql</groupId>
            <artifactId>mysql-connector-java</artifactId>
            <version>8.0.21</version>
        </dependency>
Change the project version so that it matches your ES version; mine, for example, is 7.17.7:
<version>7.17.7</version>
2. Modify main/assemblies/plugin.xml
Under the <dependencySets> tag, add:
        <dependencySet>
            <outputDirectory/>
            <useProjectArtifact>true</useProjectArtifact>
            <useTransitiveFiltering>true</useTransitiveFiltering>
            <includes>
                <include>mysql:mysql-connector-java</include>
            </includes>
        </dependencySet>
Under the <assembly> tag, add:
    <fileSets>
        <fileSet>
            <directory>${project.basedir}/config</directory>
            <outputDirectory>config</outputDirectory>
        </fileSet>
    </fileSets>
3. JDBC configuration file
Create the file config/jdbc.properties in the project root with the following content:
jdbc.driver=com.mysql.cj.jdbc.Driver
jdbc.url=jdbc:mysql://cckg.liulingjie.cn:3306/test?useUnicode=true&characterEncoding=utf8&autoReconnect=true&useSSL=false&serverTimezone=Asia/Shanghai
jdbc.username=<your username>
jdbc.password=<your password>
# SQL that selects the synonyms (the result column must be aliased as words)
synonym.word.sql=SELECT `keys` AS words FROM es_synonym WHERE ifdel = '0'
# SQL that returns the last update time of the synonyms, used to detect changes (the result column must be aliased as maxModitime)
synonym.lastModitime.sql=SELECT MAX(moditime) AS maxModitime FROM es_synonym
interval=10
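Before wiring this file into the plugin, it can help to check that it parses and that no key is missing. The following is a rough sketch using only java.util.Properties; the class JdbcPropertiesSketch and its methods are hypothetical helpers, and treating interval as optional is my own assumption.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.Properties;

// Hypothetical helper: loads the keys used in jdbc.properties and fails fast on missing ones.
class JdbcPropertiesSketch {

    static final String[] REQUIRED_KEYS = {
        "jdbc.driver", "jdbc.url", "jdbc.username", "jdbc.password",
        "synonym.word.sql", "synonym.lastModitime.sql"
    };

    static Properties load(Reader source) {
        Properties props = new Properties();
        try {
            props.load(source);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                throw new IllegalArgumentException("missing required property: " + key);
            }
        }
        return props;
    }

    // Assumption: "interval" is optional; fall back to a default when absent or not a number.
    static int interval(Properties props, int defaultValue) {
        try {
            return Integer.parseInt(props.getProperty("interval", "").trim());
        } catch (NumberFormatException e) {
            return defaultValue;
        }
    }

    public static void main(String[] args) {
        String sample = String.join("\n",
                "jdbc.driver=com.mysql.cj.jdbc.Driver",
                "jdbc.url=jdbc:mysql://localhost:3306/test",
                "jdbc.username=demo",
                "jdbc.password=demo",
                "synonym.word.sql=SELECT `keys` AS words FROM es_synonym WHERE ifdel = '0'",
                "synonym.lastModitime.sql=SELECT MAX(moditime) AS maxModitime FROM es_synonym");
        Properties props = load(new StringReader(sample));
        System.out.println(props.getProperty("synonym.word.sql"));
        System.out.println(interval(props, 10));
    }
}
```

Failing fast on a missing key gives a clearer error than the NumberFormatException or null SQL string you would otherwise run into later.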
4. Write the synonym-loading class
The package com.bellszhu.elasticsearch.plugin.synonym.analysis contains several classes that load synonyms. RemoteSynonymFile, for example, loads them from an HTTP endpoint.
In that package, create a class DynamicSynonymFromDb that implements the SynonymFile interface. It reads the synonyms from the database. The code is as follows:
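package com.bellszhu.elasticsearch.plugin.synonym.analysis;
// Assumed package of the plugin's main class; adjust if your checkout differs
import com.bellszhu.elasticsearch.plugin.DynamicSynonymPlugin;
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.synonym.SynonymMap;
// Assumed location of PathUtils for ES 7.17; older releases may keep it in org.elasticsearch.common.io
import org.elasticsearch.core.PathUtils;
import org.elasticsearch.env.Environment;
import java.io.File;
import java.io.FileInputStream;
import java.io.Reader;
import java.io.StringReader;
import java.nio.file.Path;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.sql.Timestamp;
import java.util.ArrayList;
import java.util.Properties;
/**
* @author liulingjie
* @date 2023/4/12 19:43
*/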
public class DynamicSynonymFromDb implements SynonymFile {
/**
* Name of the configuration file
*/
private final static String DB_PROPERTIES = "jdbc.properties";
private static Logger logger = LogManager.getLogger("dynamic-synonym");
private String format;
private boolean expand;
private boolean lenient;
private Analyzer analyzer;
private Environment env;
/**
* Dynamic configuration type
*/
private String location;
/**
* Group the synonyms apply to
*/
private String group;
private long lastModified;
private Path conf_dir;
private JdbcConfig jdbcConfig;
DynamicSynonymFromDb(Environment env, Analyzer analyzer,
boolean expand, boolean lenient, String format, String location, String group) {
this.analyzer = analyzer;
this.expand = expand;
this.lenient = lenient;
this.format = format;
this.env = env;
this.location = location;
this.group = group;
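// Read the configuration file
setJdbcConfig();
// Load the JDBC driver
try {
Class.forName(jdbcConfig.getDriver());
} catch (ClassNotFoundException e) {
logger.error("JDBC driver class not found: {}", jdbcConfig.getDriver(), e);
}
// Check whether a reload is needed (this also initializes lastModified)
isNeedReloadSynonymMap();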
}
/**
* Read the configuration file
*/
private void setJdbcConfig() {
// Resolve the config directory next to the plugin jar
Path filePath = PathUtils.get(new File(DynamicSynonymPlugin.class.getProtectionDomain().getCodeSource()
.getLocation().getPath())
.getParent(), "config")
.toAbsolutePath();
this.conf_dir = filePath.resolve(DB_PROPERTIES);
File file = conf_dir.toFile();
Properties properties = new Properties();
// try-with-resources closes the stream even if loading fails
try (FileInputStream in = new FileInputStream(file)) {
properties.load(in);
} catch (Exception e) {
logger.error("load jdbc.properties failed", e);
// Fail fast: without the file every later lookup would return null
throw new IllegalStateException("could not load " + conf_dir, e);
}
jdbcConfig = new JdbcConfig(
properties.getProperty("jdbc.driver"),
properties.getProperty("jdbc.url"),
properties.getProperty("jdbc.username"),
properties.getProperty("jdbc.password"),
properties.getProperty("synonym.word.sql"),
properties.getProperty("synonym.lastModitime.sql"),
// Default to 10 when interval is missing, to avoid a NumberFormatException
Integer.valueOf(properties.getProperty("interval", "10").trim())
);
}
/**
* Load the synonym dictionary into a SynonymMap
* @return SynonymMap
*/
@Override
public SynonymMap reloadSynonymMap() {
try {
logger.info("start reload local synonym from {}.", location);
Reader rulesReader = getReader();
SynonymMap.Builder parser = RemoteSynonymFile.getSynonymParser(rulesReader, format, expand, lenient, analyzer);
return parser.build();
} catch (Exception e) {
// Placeholder argument first, exception last, so the stack trace is logged
logger.error("reload local synonym {} error!", location, e);
throw new IllegalArgumentException(
"could not reload local synonyms file to build synonyms", e);
}
}
/**
* Check whether the synonyms need to be reloaded
* @return true or false
*/
@Override
public boolean isNeedReloadSynonymMap() {
try {
Long lastModify = getLastModify();
// getLastModify() returns null when the query fails or the table is empty
if (lastModify != null && lastModified < lastModify) {
lastModified = lastModify;
return true;
}
} catch (Exception e) {
logger.error(e);
}
return false;
}
/**
* Get the time the synonym table was last modified,
* used to decide whether the synonyms need to be reloaded
*
* @return last modification time in epoch milliseconds, or null if unavailable
*/
public Long getLastModify() {
Long last_modify_long = null;
// try-with-resources closes the result set, statement and connection in reverse order
try (Connection connection = DriverManager.getConnection(
jdbcConfig.getUrl(),
jdbcConfig.getUsername(),
jdbcConfig.getPassword());
Statement statement = connection.createStatement();
ResultSet resultSet = statement.executeQuery(jdbcConfig.getSynonymLastModitimeSql())) {
while (resultSet.next()) {
Timestamp last_modify_dt = resultSet.getTimestamp("maxModitime");
// MAX() yields NULL on an empty table
if (last_modify_dt != null) {
last_modify_long = last_modify_dt.getTime();
}
}
} catch (SQLException e) {
logger.error("failed to get the last modification time of the synonym table", e);
}
return last_modify_long;
}
/**
* Query the synonyms stored in the database
* @return one synonym rule per list entry
*/
public ArrayList<String> getDBData() {
ArrayList<String> arrayList = new ArrayList<>();
String sql = jdbcConfig.getSynonymWordSql();
boolean filterByGroup = group != null && !"".equals(group.trim());
if (filterByGroup) {
// Bind the group as a parameter instead of formatting it into the SQL string
sql = sql + " AND `key_group` = ?";
}
// try-with-resources closes the statement and connection automatically
try (Connection connection = DriverManager.getConnection(
jdbcConfig.getUrl(),
jdbcConfig.getUsername(),
jdbcConfig.getPassword());
PreparedStatement statement = connection.prepareStatement(sql)) {
if (filterByGroup) {
statement.setString(1, group);
}
try (ResultSet resultSet = statement.executeQuery()) {
while (resultSet.next()) {
String theWord = resultSet.getString("words");
arrayList.add(theWord);
}
}
} catch (SQLException e) {
logger.error("failed to query synonyms from the database", e);
}
return arrayList;
}
/**
* Load the synonyms as a Reader
* @return Reader
*/
@Override
public Reader getReader() {
// StringBuilder is enough here; no synchronization is needed
StringBuilder sb = new StringBuilder();
try {
// One synonym rule per line
for (String line : getDBData()) {
sb.append(line)
.append(System.getProperty("line.separator"));
}
logger.info("load the synonym from db");
} catch (Exception e) {
logger.error("reload synonym from db failed:", e);
}
return new StringReader(sb.toString());
}
}
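The reload decision in isNeedReloadSynonymMap boils down to "has the newest timestamp moved forward since I last looked?". Here is a small, self-contained sketch of just that logic, decoupled from JDBC so it can be run on its own. ChangeDetectorSketch and needsReload are hypothetical names, and the Supplier stands in for the MAX(moditime) query.

```java
import java.util.function.Supplier;

// Hypothetical sketch of timestamp-based change detection, decoupled from JDBC.
class ChangeDetectorSketch {
    // Stand-in for the MAX(moditime) query; may return null (empty table or failed query).
    private final Supplier<Long> latestModified;
    private long lastSeen;

    ChangeDetectorSketch(Supplier<Long> latestModified) {
        this.latestModified = latestModified;
    }

    // Returns true only when the source reports a strictly newer timestamp.
    boolean needsReload() {
        Long latest = latestModified.get();
        if (latest == null) {
            return false; // nothing to compare against: keep the current synonyms
        }
        if (latest > lastSeen) {
            lastSeen = latest;
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        long[] clock = {100L};
        ChangeDetectorSketch detector = new ChangeDetectorSketch(() -> clock[0]);
        System.out.println(detector.needsReload()); // first check sees a newer timestamp
        System.out.println(detector.needsReload()); // timestamp unchanged
        clock[0] = 200L;
        System.out.println(detector.needsReload()); // timestamp advanced
    }
}
```

One consequence of this design: a MAX(moditime) check cannot notice rows that are physically deleted, which is presumably why the table marks removals with the ifdel flag and an updated moditime instead.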
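/**
 * Configuration class I created to hold the JDBC settings
 *
 * @author liulingjie
 * @date 2022/11/30 16:03
 */
public class JdbcConfig {
    /** Driver class name */
    private String driver;
    /** Database URL */
    private String url;
    /** Database username */
    private String username;
    /** Database password */
    private String password;
    /** SQL that selects the synonyms; the result column must be aliased as words */
    private String synonymWordSql;
    /** SQL that returns the last update time of the synonyms */
    private String synonymLastModitimeSql;
    /** Interval; not used for now */
    private Integer interval;

    public JdbcConfig() {
    }

    public JdbcConfig(String driver, String url, String username, String password, String synonymWordSql, String synonymLastModitimeSql, Integer interval) {
        this.url = url;
        this.username = username;
        this.password = password;
        this.synonymWordSql = synonymWordSql;
        this.synonymLastModitimeSql = synonymLastModitimeSql;
        this.interval = interval;
        this.driver = driver;
    }

    // Getters called by DynamicSynonymFromDb
    public String getDriver() { return driver; }
    public String getUrl() { return url; }
    public String getUsername() { return username; }
    public String getPassword() { return password; }
    public String getSynonymWordSql() { return synonymWordSql; }
    public String getSynonymLastModitimeSql() { return synonymLastModitimeSql; }
    public Integer getInterval() { return interval; }
}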
Then add the following code to the getSynonymFile method of the DynamicSynonymTokenFilterFactory class.
Note that the group field is something I added myself; you can remove it or pass an empty value!
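The snippet referred to above is not included in this text, so what follows is only a guess at its shape, not the author's actual code. The idea is that getSynonymFile picks a loader based on the synonyms_path value, and the settings example later in this article uses "fromDB" for the database loader. The class name, the enum, and the http/https check for the remote loader are all assumptions made for this sketch.

```java
// Hypothetical sketch: chooses a synonym source from the synonyms_path setting.
class SynonymSourceDispatchSketch {

    enum SynonymSource { REMOTE, DATABASE, LOCAL_FILE }

    // Assumption: "fromDB" is the marker value that selects the database loader.
    static final String DB_MARKER = "fromDB";

    static SynonymSource resolve(String location) {
        if (location == null || location.trim().isEmpty()) {
            throw new IllegalArgumentException("synonyms_path must not be empty");
        }
        if (location.startsWith("http://") || location.startsWith("https://")) {
            return SynonymSource.REMOTE;     // would map to RemoteSynonymFile
        }
        if (DB_MARKER.equals(location)) {
            return SynonymSource.DATABASE;   // would map to DynamicSynonymFromDb
        }
        return SynonymSource.LOCAL_FILE;     // anything else is treated as a file path
    }

    public static void main(String[] args) {
        System.out.println(resolve("fromDB"));
        System.out.println(resolve("https://example.com/synonyms.txt"));
        System.out.println(resolve("synonyms.txt"));
    }
}
```

In the plugin itself the DATABASE branch would construct DynamicSynonymFromDb with the same arguments as the other loaders plus the extra group value.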
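5. Package
Finally, run the package goal to build the plugin.
The zip archive is written to ~\target\releases.
6. Install into ES
Under the ES installation path, create a dynamic-synonym folder inside \plugins, then unzip the archive above into that folder.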
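Because DynamicSynonymFromDb looks for jdbc.properties in a config folder next to the plugin jar, the unzipped layout matters. The path arithmetic can be sketched as follows; ConfigPathSketch is a hypothetical helper and the jar file name in main is only an example, not necessarily what your build produces.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical sketch of the path arithmetic: <jar directory>/config/jdbc.properties.
class ConfigPathSketch {

    static Path jdbcPropertiesFor(Path pluginJar) {
        Path jarDir = pluginJar.toAbsolutePath().getParent();
        if (jarDir == null) {
            throw new IllegalArgumentException("jar path has no parent directory: " + pluginJar);
        }
        return jarDir.resolve("config").resolve("jdbc.properties");
    }

    public static void main(String[] args) {
        // Example jar name only; use the name of the jar in your plugins/dynamic-synonym folder.
        Path jar = Paths.get("plugins", "dynamic-synonym", "example-plugin.jar");
        System.out.println(jdbcPropertiesFor(jar));
    }
}
```

So after unzipping, plugins\dynamic-synonym should contain the jars plus a config folder holding jdbc.properties.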
Finally, restart ES; you should see the following output.
7. Try it out
Next, we use the new filter type. A reference request looks like this:
POST info_goods/_close
PUT info_goods/_settings
{
"analysis": {
"filter": {
"my_synonyms": {
"type": "dynamic_synonym",
"synonyms_path": "fromDB",
"interval": 30 // refresh interval (seconds)
}
},
"analyzer": {
"my_analyzer": {
"type": "custom",
"tokenizer": "ik_max_word",
"filter": [
"lowercase",
"my_synonyms"
]
}
}
}
}
POST info_goods/_open
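If you would rather script the close, update, open sequence than type it into a console, the three requests can be assembled with the JDK's java.net.http API (Java 11+). This is only a sketch under assumptions: an ES node at http://localhost:9200 without authentication, and a hypothetical helper class ReopenWithSettingsSketch. It builds the requests but does not send them.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.List;

// Hypothetical sketch: builds (but does not send) the close / update settings / open requests.
class ReopenWithSettingsSketch {

    static List<HttpRequest> buildRequests(String baseUrl, String index, String settingsJson) {
        HttpRequest close = HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_close"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        HttpRequest update = HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_settings"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(settingsJson))
                .build();
        HttpRequest open = HttpRequest.newBuilder(URI.create(baseUrl + "/" + index + "/_open"))
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
        // Order matters: analysis settings can only be changed while the index is closed.
        return List.of(close, update, open);
    }

    public static void main(String[] args) {
        // Assumption: a local, unsecured node; the settings body here is a placeholder.
        for (HttpRequest request : buildRequests("http://localhost:9200", "info_goods", "{}")) {
            System.out.println(request.method() + " " + request.uri());
        }
    }
}
```

To actually execute them you would pass each request to HttpClient.send in order and check the response status before moving on to the next one.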
Let's give it a quick try:
# Analyze “瓠瓜”
GET info_goods/_analyze
{
"analyzer": "my_analyzer",
"text": "瓠瓜"
}
# Result
{
"tokens" : [
{
"token" : "西葫蘆",
"start_offset" : 0,
"end_offset" : 2,
"type" : "SYNONYM",
"position" : 0
},
{
"token" : "瓠",
"start_offset" : 0,
"end_offset" : 1,
"type" : "CN_CHAR",
"position" : 0,
"positionLength" : 2
},
{
"token" : "葫蘆",
"start_offset" : 0,
"end_offset" : 2,
"type" : "SYNONYM",
"position" : 1,
"positionLength" : 2
},
{
"token" : "瓜",
"start_offset" : 1,
"end_offset" : 2,
"type" : "CN_CHAR",
"position" : 2
}
]
}
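It works! Mission accomplished! Hehe ^_^
I know you would rather not build it yourselves, so the source code and the final plugin package have been uploaded; download them if you need them ^_^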
Troubleshooting
If the following error appears:
java.security.AccessControlException: access denied (java.net.SocketPermission 127.0.0.1:3306 connect,resolve)
create a policy file named socketPolicy.policy:
grant {
permission java.net.SocketPermission "cckg.liulingjie.cn:3306","connect,resolve";
permission java.net.SocketPermission "localhost:3306","connect,resolve";
};
Then edit the elasticsearch-7.17.7\config\jvm.options configuration file and point it at the socketPolicy.policy file:
-Djava.security.policy=D:\ProgramFiles\elasticsearch-7.17.7\plugins\ik\config\socketPolicy.policy
Restart ES and it should work. Article source: http://www.zghlxwxcb.cn/news/detail-490209.html
If ES is installed as a Windows service, remember to run the following commands to re-register the service:
elasticsearch-service.bat remove
elasticsearch-service.bat install
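A JVM-wide policy file is the approach used above. As far as I know, ES 7.x plugins can alternatively ship their own plugin-security.policy and wrap the sensitive call in AccessController.doPrivileged, so the permission is scoped to the plugin. I have not applied that to this plugin, so treat the following purely as a sketch of the wrapping pattern; PrivilegedCallSketch and callPrivileged are hypothetical names, and the lambda stands in for the JDBC connection call.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;
import java.util.function.Supplier;

// Hypothetical sketch of wrapping a permission-sensitive call in doPrivileged.
class PrivilegedCallSketch {

    // AccessController is deprecated for removal since Java 17 but is still what ES 7.x relies on.
    @SuppressWarnings("removal")
    static <T> T callPrivileged(Supplier<T> action) {
        // The cast selects the PrivilegedAction overload of doPrivileged.
        return AccessController.doPrivileged((PrivilegedAction<T>) action::get);
    }

    public static void main(String[] args) {
        // Stand-in for something like opening the database connection.
        String result = callPrivileged(() -> "connected");
        System.out.println(result);
    }
}
```

With this pattern the SocketPermission grant would live in the plugin's own policy file rather than in jvm.options, but verify this against the plugin documentation for your ES version before relying on it.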