This article is based on: https://blog.csdn.net/Q54665642ljf/article/details/127701719
This article applies to Elasticsearch.
I am a beginner who only recently started learning Elasticsearch, so corrections are very welcome.
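1. Preparation
1.1 Install the Elasticsearch text-extraction plugin
(1) Why is a text-extraction plugin needed?
Document files such as Word and PDF contain much more than plain text. A Word file, for example, also stores page setup, font sizes, colors, and other data unrelated to the text itself.
A text-extraction plugin strips out that unrelated data so that only the text is indexed.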
(2) How do I install the text-extraction plugin?
From the bin directory of your Elasticsearch installation, use elasticsearch-plugin to install the ingest-attachment text-extraction plugin.
# Windows (run from the bin directory):
elasticsearch-plugin install ingest-attachment
# Linux (run from the bin directory):
./elasticsearch-plugin install ingest-attachment
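To make the extracted text easier to search later, also install the IK analyzer plugin (official repository: https://github.com/medcl/elasticsearch-analysis-ik). The repository explains how to download it. Pick the release whose version matches your Elasticsearch version. For Elasticsearch 7.6.2, for example, run: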
# Windows (run from the bin directory):
elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
# Linux (run from the bin directory):
./elasticsearch-plugin install https://github.com/medcl/elasticsearch-analysis-ik/releases/download/v7.6.2/elasticsearch-analysis-ik-7.6.2.zip
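When the commands finish, both plugins appear under the plugins directory. Restart Elasticsearch so that the newly installed plugins are loaded.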
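1.2 Define the text-extraction pipeline
(1) What is a pipeline?
A pipeline, also called an ingest (preprocessing) pipeline, transforms the fields of a document before the document is stored.
Suppose you have a noisy string such as &*he@¥#ll%&o……¥%. Shown to a user unprocessed, it is still &*he@¥#ll%&o……¥%, and the useful information is buried.
Now suppose you have a string-processing machine: you feed in &*he@¥#ll%&o……¥% and out comes hello, which is the data the user actually wants.
A pipeline is that machine: it processes data on its way into the index.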
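The "processing machine" analogy can be sketched in a few lines of plain Java. This is only an illustration of the idea: the class and method names are made up for this example, and a real ingest pipeline runs inside Elasticsearch rather than in application code.

```java
// Analogy only: a toy "processing machine" that keeps ASCII letters and drops
// everything else. PipelineAnalogy and process are hypothetical names.
final class PipelineAnalogy {

    /** Removes every character that is not an ASCII letter. */
    static String process(String raw) {
        return raw.replaceAll("[^A-Za-z]", "");
    }

    public static void main(String[] args) {
        String raw = "&*he@¥#ll%&o……¥%";
        System.out.println(process(raw)); // prints "hello"
    }
}
```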
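(2) Define the text-extraction pipeline
In the Kibana console, create a pipeline named "attachment".
Kibana is a free and open front-end application that provides search and data visualization for data indexed in Elasticsearch.
(Installing Kibana is not covered here.)
The "attachment" pipeline is configured to process the content field, so the document content must be placed in the content field when writing to Elasticsearch.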
PUT /_ingest/pipeline/attachment
{
"description": "Extract attachment information",
"processors": [{
"attachment": {
"field": "content",
"ignore_missing": true
}
},
{
"remove": {
"field": "content"
}
}
]
}
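Note!
Once the pipeline is defined, all we have to do is convert a document file to Base64 and put the result in the content field. The pipeline extracts the plain text into attachment.content and then removes the original Base64 content field. The extracted text is analyzed by the IK analyzer at index time, because the index mapping below assigns that analyzer to attachment.content.
Next, we create the index and define these fields in it.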
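1.3 Create the index
(1) Index structure
- id: uniquely identifies a record.
- userId: ID of the user who owns the file; add as needed.
- docId: file ID; add as needed.
- docName: file name, analyzed with the ik_max_word Chinese analyzer (splits the text as finely as possible).
- docType: file type; add as needed.
- attachment.content: the key field! It holds the text that the pipeline extracts from the Base64 content, analyzed with the ik_smart Chinese analyzer (splits the text the way words are commonly used).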
PUT /docwrite
{
"mappings": {
"properties": {
"id": {
"type": "keyword"
},
"userId": {
"type": "keyword"
},
"docId":{
"type": "keyword"
},
"docName": {
"type": "text",
"analyzer": "ik_max_word"
},
"docType": {
"type": "keyword"
},
"attachment": {
"properties": {
"content": {
"type": "text",
"analyzer": "ik_smart"
}
}
}
}
}
}
2. Testing document indexing in Kibana
2.1 Convert the file to Base64
Use an online Base64 converter to turn a document file into a Base64 string.
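If you would rather not upload a file to a website, the conversion can also be done locally. The following is a minimal sketch using only the Java standard library; the class name and the sample.docx path are placeholders chosen for this example.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.Base64;

// FileToBase64 is a hypothetical helper name used only for this sketch.
final class FileToBase64 {

    /** Reads the whole file and returns its standard (non-URL-safe) Base64 encoding. */
    static String encode(Path file) throws IOException {
        byte[] bytes = Files.readAllBytes(file);
        return Base64.getEncoder().encodeToString(bytes);
    }

    public static void main(String[] args) throws IOException {
        // "sample.docx" is a placeholder; pass a real file path as the first argument.
        Path file = Paths.get(args.length > 0 ? args[0] : "sample.docx");
        System.out.println(encode(file));
    }
}
```

The whole file is read into memory, which is fine for typical office documents but not for very large files.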
Alternatively, use the Base64 content I provide below.
It comes from a Word document I created myself, with text copied from the Elasticsearch website. Converted to Base64, the document is:
UEsDBBQABgAIAAAAIQDfpNJsWgEAACAFAAATAAgCW0NvbnRlbnRfVHlwZXNdLnhtbCCiBAIooAACAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAC0lMtuwjAQRfeV+g+Rt1Vi6KKqKgKLPpYtUukHGHsCVv2Sx7z+vhMCUVUBkQpsIiUz994zVsaD0dqabAkRtXcl6xc9loGTXmk3K9nX5C1/ZBkm4ZQw3kHJNoBsNLy9GUw2ATAjtcOSzVMKT5yjnIMVWPgAjiqVj1Ykeo0zHoT8FjPg973eA5feJXApT7UHGw5eoBILk7LXNX1uSCIYZNlz01hnlUyEYLQUiep86dSflHyXUJBy24NzHfCOGhg/mFBXjgfsdB90NFEryMYipndhqYuvfFRcebmwpCxO2xzg9FWlJbT62i1ELwGRztyaoq1Yod2e/ygHpo0BvDxF49sdDymR4BoAO+dOhBVMP69G8cu8E6Si3ImYGrg8RmvdCZFoA6F59s/m2NqciqTOcfQBaaPjP8ber2ytzmngADHp039dm0jWZ88H9W2gQB3I5tv7bfgDAAD//wMAUEsDBBQABgAIAAAAIQAekRq37wAAAE4CAAALAAgCX3JlbHMvLnJlbHMgogQCKKAAAgrJLBasMwDEDvg/2D0b1R2sEYo04vY9DbGNkHCFtJTBPb2GrX/v082NgCXelhR8vS05PQenOcRnXglF3wGpZVDYq9Cdb5XsNb+7x4AJWFvKUxeNZw4gyb5vZm/cojSSnKg4tZFYrPGgaR+IiYzcAT5SpE9uWnC2kiKc/UYySzo55xVdf3mH4zoJkx1dZqSFt7B6o9Rb6GHbrOGX4KZj+xlzMtkI/C3rJdxFTqk7gyjWop9SwabDAvJZyRYqwKGvC80ep6o7+nxYmFLAmhCYkv+3xmXBJa/ueK5hk/Nu8hWbRf4W8bnF1B8wEAAP//AwBQSwMEFAAGAAgAAAAhAPd6kjMuCQAAGyQAABEAAAB3b3JkL2RvY3VtZW50LnhtbORaWVPbWBZ+n6r5Dy6/J7Zsea2GLvDSlanuHnroeaaELGxN25JLEhDmyQnYYQmYniYJDU6gISzTHZZMFoyX8F/SupJ5yl+Yc3XlJTakZafJMNUpyouk891zz/nOd8+9zmef304lbROcJPOi0GenbjrtNk5gxRgvxPvsf/82esNvt8kKI8SYpChwffYpTrZ/3v/nP302GYyJ7HiKExQbQAhycDLN9tkTipIOOhwym+BSjHwzxbOSKItjyk1WTDnEsTGe5RyTohRzuJyU0/iUlkSWk2UYL8QIE4xsN+HY29bQYhIzCcYYkHawCUZSuNtNDKprEI8j4PB3Ar
l6AIIZuqhOKHfXUF4H9qoDiO4JCLzqQPL0hnTB5Ly9Ibk6kXy9Ibk7kfy9IXXQKdVJcDHNCXBzTJRSjAJfpbgjxUjfjadvAHCaUfhRPskrU4Dp9NZhGF74rgePwKqBkHLHukbwOVJijEu6Y3UUsc8+LglB0/5Gwx67HiT25lvdQrIyf2ISNsXBmLlD4pIQC1GQE3y6UeGpXtHgZqIOMvGhSUykkvXnJtOUxXK5TJ7CJJRNQCvum/FPJYnnH0aknBYygiEaFlZceH/MuicpYGFz4J5C0xJcyqKA1AFcHQBelrMo+HUMv4nhYJsVinF4i6VRxyFZwTh8M7CURR1rd6YFQI4psURXKK56XB3YllGYBCM3iI4Rue6c8jTgplItMUrHP64QvpDE8XQTjf84tFtNWZvEDUYXWGZBtRa5/HHODCeYNKhdig3eiguixIwmwSMoDxsw3GZkAL8CUfCb8ZG7bVzHubZhjbH3Q2c0Ksam8Hsa7tHBNCMxt4CUbift9oZ90FDhq7CuKPiqz/wHV4PQhcX+1md3OiM+bzg02Lg0JF1wMcyNMeNJpfPOUMslw4shCb9J5G3UYbyGZHh3mFcdzYcuHLLFXIqKgiLDUwlegLE5RlYGZJ6xE9QLsCeDSr+2XNBfbr3NrMCfVsigjZJWeKavzaCTHZQ9wY8qxIC40hk6lyc06PEN9BA6y1FqjdEHJtk6LX3pGBX231Xua7Pfq+Vn6OQ/ejmPDnbVyppayqHdu2r1gVo901f235ui+WL0zUE5zbDAsbTEyZw0wdn7bZEkDMezMsdIbMKmVgtqsaQWM2rxZ/0wgxYfQNxsX4vD33xpO99e0jYrOI6zOVScRpW89vCe9tM2OlhFd/f1F2W9vPFr5q7pXf5ILe+g4xzBsv1l+K9f28jz2twKmj1uG/j83n2YmT7zGi3DKKfa4qHtVtimbWcgkwCqVnNtBnphQzvYRpUSerqnni6gpzMQHLW0RAZWy0vao83a3gx6swAeaw+OAVEtHmhzb9A9oMOcNpNXT+fglnr2k3bniJAFrN5mCgBInn+beQyOkiv1xx6DMxYY5KQjkQDloa6IQQbx3x+RjtIBn5emr5qzSn8bY4qLKP8zcHCE8OBdZVZWRODXu8rciJlxyCxmVSlXm67Wtn4h9QmhVU83MHmLJZMYJ2U0vzmC71UeAA4vxLjbJhBA1M5+BANt9QhjlTPq6SyhGuG8Xv5BezKD7j/EhG1hGzbMwPCLYAiUII/hvJcAcB3Nvia5tpZWj5MeCEdczk+XVq/b7fc6XVee1u6kiIQW6gnX1lEeChwdPNIOX51PV6HkIcUkh+j0tVpcIRWJ876xUzva6kaccOUXF9VyFtRNO3gKuYQiJ7pEwLRHr+Hi+4w0FcBghU2by0C1A5mYZBKIZMMkKq2b7hk+Y3KWd7TpebVUgumcZzZQabd2toLWnxhkKYATYIV5V93Ujou1swL2qbJWy0zXfvweBlErQKsNwiNU+gEtldGdNUwxkL7Sulpa0KazVmUjFKZdbu+n45eH8lFhl+uqhKpXfhkKAVm7kCz9JNE4neYyc2cN1ORdZc1CiClvwOeiI1dVUBeE2B31RqLhwWsWYlxIz8tqpQJrZjclGedAlSWbNvuwlsnamDiHS0rbPMF9gVFPanG+TZFJqUKdtY0DjbA4FpEk8FWZSsMwcppLJocVRlJIHE1X/yHywghsUDhL9hEh1mp9yTQuc1pb+hcq5bHQHD9H03noe+CZtplAXYOttYp2OZ0+j98zcM2Sn93HEm50V7ihJF2RIZXo/ul5dhFlX5yvHEJssMCTLu/0tXZ/Ts/uXVySI/p6EYy0zB5UrcQluQlGYDnSAxghhb6uTVStrLd02ENTvuj1ih5eXYAv0EBksihbgfYC4gY9qLU5UbQ74vdEXdeMEW9m0dM1oruwisNqhyoZ/fA5WnyBS3ltptHDoWwWNgAaFMPSajfC0da+A6+qBTS/jnGrm+eFTG33jr
6cq+1tG+FdUItz0FZou3fR2Stryu4OD3i8LiOEn0jZnaHBKO2P9kDP//st9uomOp65YIH+rdEuA/yKSad5IQ6CYUVUKWiJA55rtj/X8vna2bHZBhb2W/tilMMdLOzPyR7JwhSdFBUJhag/ILWwuDxZ/h2pNSAwySmZx/tIK/Ls8Qe8/tBVbfF65Ja5YIMqExo9nQGthI4Fn3w0tleGeIOwWpilx+2nPWH8I9EfjV7nWzm0saHPnaLDNdLO1Y4OavuZ35Fw34xz0pQtPPylRcb5olRg8H9xcnNAjs/0O6+0l2W8IrfEw4oMD4QjtM8ZuF6lQrozLb8MW8LG8YR+95R0NeQ4EiZdO3p8vrXaPE09zoFmtzPAQLykn9FOnqPsDik6iGVG/2WzdrSrl3Pnq8uwDqDDaYDF56GFZ/jWfP1g9Dinlhf08iI+iSjOgWONkwJyUE1anvqBZBYffhQP0fy/oUvCbt97ic/R9rbJwE1FONlRqwWAsrgtCXl80UEqfEWpu6BX8gz4Q75wpG0XTHvpAB11Yu63jOj0RLy4P/4Ea3YjJ2T3V8g0Tr1RdQtV8tAFo70FkrFu2GHW2MDQrcsSInOsMtTQmJZ5G7GLD/8Tbk2COlABQ6aDCXx24Xf7zeDGv2KwsSKm8ZaCxutVUOLjCZgs5XcaX0dFRRFTzdtJbqzlboJjYCPfZ/cbp9TBMVFU8NdAwIh8fFwxvjrJcKyYxME050m7PORyTGS/kPDPU+YGPMkLnGwMBR+GeIUFp92Uy4w6mbHxkfxk5Wj+r57+/wIAAP//AwBQSwMEFAAGAAgAAAAhANZks1H0AAAAMQMAABwACAF3b3JkL19yZWxzL2RvY3VtZW50LnhtbC5yZWxzIKIEASigAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAArJLLasMwEEX3hf6DmH0tO31QQuRsSiHb1v0ARR4/qCwJzfThv69ISevQYLrwcq6Yc8+ANtvPwYp3jNR7p6DIchDojK971yp4qR6v7kEQa1dr6x0qGJFgW15ebJ7Qak5L1PWBRKI4UtAxh7WUZDocNGU+oEsvjY+D5jTGVgZtXnWLcpXndzJOGVCeMMWuVhB39TWIagz4H7Zvmt7ggzdvAzo+UyE/cP+MzOk4SlgdW2QFkzBLRJDnRVZLitAfi2Myp1AsqsCjxanAYZ6rv12yntMu/rYfxu+wmHO4WdKh8Y4rvbcTj5/oKCFPPnr5BQAA//8DAFBLAwQUAAYACAAAACEAOgXMGeEGAADOIAAAFQAAAHdvcmQvdGhlbWUvdGhlbWUxLnhtbOxZW2sbRxR+L/Q/LPuu6Lari4kcpJUUN7ETEyspeRyvRrtjze6ImZEdEQIheSqFQiEteWig9KUPpTTQQEP70P9Sl4Q0/RGdmZW0O9IsTmIbQrFtrLl858w355w5c7R7+cq9CFuHkDJE4pZdvlSyLRj7ZIjioGXfHvQLDdtiHMRDgEkMW/YMMvvK5qefXAYbPIQRtIR8zDZAyw45n2wUi8wXw4BdIhMYi7kRoRHgokuD4pCCI6E3wsVKqVQrRgDFthWDSKi9ORohH1p/v/zjzQ9P/3r4pfizNxdr9LD4F3MmB3xM9+QKUBNU2OG4LD/YjHmYWocAt2yx3JAcDeA9blsYMC4mWnZJ/djFzcvFpRDmObIZub76mcvNBY
bjipKjwf5S0HFcp9Ze6lcAzNdxvXqv1qst9SkA8H2x04SLrrNe8Zw5NgNKmgbd3Xq3WtbwGf3VNXzblb8aXoGSprOG7/e91IYZUNJ01/Bup9np6voVKGnW1vD1Urvr1DW8AoUYxeM1dMmtVb3FbpeQEcFbRnjTdfr1yhyeooqZ6ErkY54XaxE4ILQvAMq5gKPY4rMJHAFf4F7//MXr3/+0tlEQiribgJgwMVqqlPqlqvgvfx3VUg4FGxBkhJMhn60NSToW8yma8JZ9TWi1M5BXL18eP3px/Oi348ePjx/9Ml97XW4LxEFW7u2PX//77KH1z6/fv33yjRnPsnhta0Y412h9+/z1i+evnn715qcnBnibgv0sfIAiyKwb8Mi6RSKxQcMCcJ++n8QgBCgr0Y4DBmIgZQzoHg819I0ZwMCA60DdjneoyBYm4NXpgUZ4L6RTjgzA62GkAXcIwR1CjXu6LtfKWmEaB+bF6TSLuwXAoWltb8XLvelEhD0yqfRCqNHcxcLlIIAx5JacI2MIDWJ3EdLsuoN8ShgZcesusjoAGU0yQPtaNKVCWygSfpmZCAp/a7bZuWN1CDap78JDHSnOBsAmlRBrZrwKphxERsYgwlnkNuChieTejPqawRkXng4gJlZvCBkzydykM43udSDSltHtO3gW6UjK0diE3AaEZJFdMvZCEE2MnFEcZrGfsbEIUWDtEm4kQfQTIvvCDyDOdfcdBDV3n3y2b4s0ZA4QOTOlpiMBiX4eZ3gEoEl5m0Zaim1TZIyOzjTQQnsbQgyOwBBC6/ZnJjyZaDZPSV8LRVbZgibbXAN6rMp+DBm0VG1jcCxiWsjuwYDk8NmZrSSeGYgjQPM03xjrIdPbp+IwmuIV+2MtlSIqD62ZxE0WafvL1bobAi2sZJ+Z43VGNf+9yxkTMgcfIAPfW0Yk9ne2zQBgbYE0YAYAWdumdCtENPenIvI4KbGpUW6kH9rUDcWVmidC8UkF0Erp455f6SMKjFffPTNgz6bcMQNPU+jk5ZLV8iYPt1rUeIQO0cdf03TBNN6F4hoxQC9KmouS5n9f0uSd54tC5qKQuShkzCLnUMiktYt6ALR4zKO0RLnPfEYI4z0+w3CbqaqHibM/7ItB1VFCy0dMk1A058tpuIAC1bYo4Z8jHu6FYCKWKasVAjZXHTBrQpgonNSwUbecwNNohwyT0XJ58VRTCACejovCazEuqjSejNbq6eO7pXrVC9Rj1gUBKfs+JDKL6SSqBhL1xeAJJNTOzoRF08CiIdXnslAfc6+Iy8kC8rm46ySMRLiJkB5KPyXyC++euafzjKlvu2LYXlNyPRtPayQy4aaTyIRhKC6P1eEz9nUzdalGT5pinUa9cR6+lklkJTfgWO9ZR+LMVV2hxgeTlj0S35hEM5oIfUxmKoCDuGX7fG7oD8ksE8p4F7AwgampZP8R4pBaGEUi1rNuwHHKrVypyz1+pOSapY/Pcuoj62Q4GkGf54ykXTGXKDHOnhIsO2QqSO+FwyNrH0/pLSAM5dbL0oBDxPjSmkNEM8GdWnElXc2Pova2JT2iAE9CML9Rssk8gav2kk5mH4rp6q70/nwz+4F00qlv3ZOF5EQmaeZcIPLWNOeP87vkM6zSvK+xSlL3aq5rLnJd3i1x+gshQy1dTKMmGRuopaM6tTMsCDLLLUMz744469tgNWrlBbGoK1Vv7bU22T8Qkd8V1eoUc6aoim8tFHiLF5JJJlCji+xyj1tTilr2/ZLbdryK6xVKDbdXcKpOqdBw29VC23Wr5Z5bLnU7lQfCKDyMym6ydl982cez+ct7Nb72Aj9alNqXfBIViaqDi0pYvcAvV0wv8Ady3raQsMz9WqXfrDY7tUKz2u4XnG6nUWh6tU6hW/Pq3X7XcxvN/gPbOlRgp131nFqvUaiVPa/g1EqSfqNZqDuVStuptxs9p/1gbmux88XnwryK1+Z/AAAA//8DAFBLAwQUAAYACAAAACEAMW9THM4EAA
BVDQAAEQAAAHdvcmQvc2V0dGluZ3MueG1stFfbUttIEH3fqv0Hl57X+IJlwBUnhW0cSEGSQrB5Hklta5a5qGZGNmZr/317bpYTNinIVl5g1Kf7dKunL/Kbd4+cdTagNJVimgyO+kkHRCFLKtbT5P5u2T1NOtoQURImBUyTHejk3dvff3uznWgwBtV0BymEnvBimlTG1JNeTxcVcKKPZA0CwZVUnBh8VOseJ+qhqbuF5DUxNKeMml1v2O+Pk0Ajp0mjxCRQdDktlNRyZazJRK5WtIDwL1qol/j1JgtZNByEcR57ChjGIIWuaK0jG/9ZNgSrSLL50UtsOIt620H/Ba+7larcW7wkPGtQK1mA1nhBnMUAqWgdj54R7X0foe/wio4KzQd9dzqMPH0dwfAZwbiAx9dxnAaOHloe8tDydTzjPQ9tEzsY/1wwBwS6NGX1KpZhzGvP2hJDKqL3VWQZ4XVBpXu6HW9zpNlLqsZD1zRXRPmeDCXDi8nVWkhFcobhYOl08PY7Ljr7F5No/7kjPDq5zUPyFmfEk5S8s53UoApsFBww/X7Ss0COYeLUWciP0mSNUrIR5SUQlH0XXkppAozFLVeZIQYDmOgaGHPzqmBAMN7tZK0Ix0kTJc6mhBVpmLkjeWZkjUobgmkZDUNApSJbJHmvaPknKEMLwrKaFCiKqoN0HFSprhnZXUpFn6QwhC1a2wucpbtoEam9fqT9nvbQaxcVUaTANw3u5+hCSRa17ORU2NifG1GYxs2vYOdGqj1pNISlVPfXPpeEEVFAhlwMZjuDc6vJ/ekLLU3lY7SZvgaygRkpHjTDMjy3E9+BDbtThLp8eIHTvniscS9kFV2ZWzA4xRxEyr8aba6pgEug68pciTtbN55Hw/LimuxkYw5CzvwewRcUhIN/w/1uuJEl2BttFH15J1iDcGWHufnWkcTs4yWACzAzO4ZJEyajT3Auyg/4FhQZfYZ/PoIfBQDCev6ErXi3q2EJBLOI+/XXOHN3tmS0vqHYT+pKlNiRv8wZXa1AoQOKPXqDbUeV3Lo8+yb/VX6xwr6gMk7CYyzZ4mEmjZH8cldXmOv/d5OumXuHfYYfTaWOh1ucTnvV/jw9OT9Z+kgt+hLkop9ejMMQ+AY5GS/ms+A/eOUT+8HxWcWTLd0O9xZzwnNFSefGfpL0rEauHmZURDwHnPtwiGRNHsFu1wOaE8aWmMQIuARwN80WsHJndkPUuuUNGuo/pTh/P+y57DYA9R7neu3RrSK1L8moMhiNgiUVOE94lOsmz6KVwE11AOGS+LRRLk9terYTg1fsWvuatNMcRPc+s5cLRJtzTck0eaq684+hupjKbGXADalrX2D5ejBNmB1pA2tm8KnEj1n3kK+HARs6bOgx90AK+7KoHQ6tbBhlB3rHUXbcykZRNmplaZSlrWwcZWMrq3CkKNyLD1jr8WjlK8mY3EJ52eLPRD4JuiI1LPzaxIqTXhD2qO5sJvCIKx1KavA3Qk1LTh7thh+6NRm0mZv2X+lazCrXXzPYr5/Q3b2vjF3VfxOLXecFxQrNdjxvl+ORD5xRjZOhxj1qpIrYHw4bpG7BGjcd8GJvYTUjGsqAlbK4sl8zqbf5ezRPF7P0vN9dLEdpd3RyPOyezk/H3f5penp2Njs7mR0v/wmNGX8Pvf0XAAD//wMAUEsDBBQABgAIAAAAIQDwgl4meQsAAARyAAAPAAAAd29yZC9zdHlsZXMueG1svJ1Nc9s4EobvW7X/gaXT7sGRv52kxplynGTt2jjjGTmbM0RCFsYgoQWp2J5fvwBISZCboNhgry+JRbEfgnjxNtAkJf3y61Muk59cl0IV56ODN/ujhBepykRxfz76fvdl7+0oKStWZEyqgp+Pnnk5+vXD3//2y+P7snqWvEwMoCjf5+n5aF5Vi/fjcZnOec7KN2rBC/PmTOmcVealvh/nTD8sF3upyhesElMhRfU8PtzfPx01GN2HomYzkfJPKl3mvKhc/FhzaY
iqKOdiUa5oj31oj0pnC61SXpbmpHNZ83ImijXm4BiAcpFqVapZ9cacTNMihzLhB/vur1xuACc4wCEAnKb8Ccd42zDGJtLniAzHOV1zROZx4hrjAcqsyuYoyuGqX8c2llVszsq5T+S4Rp2scc+57aM8fX99XyjNptKQjOqJES5xYPuvOX/7n/uTP7nt9hRGH4wXMpV+4jO2lFVpX+pb3bxsXrn/vqiiKpPH96xMhbgzDTRHyYU54NVFUYqReYezsrooBfPf/Nxss+/P7Y6tkWlZeZs/ikyMxvagD1wX5u2fTJ6PDutN5V/rDQerLZe2XfW2Zi/JivvVNl7sfZ/47Tsf/TXfu/xmN03Noc5HTO9NLmzguDnd+n+vExbrV/VeL3rMuNd4eVKnFPMun31V6QPPJpV543y0bw9lNn6/vtVCaZM2zkfv3jUbJzwXVyLLeOHtWMxFxn/MefG95Nlm++9fnPWbDalaFubvo7NTp6Iss89PKV/YRGLeLZjt0G82QNq9l2JzcBf+3xWs6cfW+DlnNpsmBy8RrvkoxKGNKL2zbWcuX5y72wt1oKPXOtDxax3o5LUOdPpaBzp7rQO9fa0DOcz/80CiyEzidvvDwwDqLk7AjWhOwGxoTsBLaE7AKmhOwAloTmCgozmBcYzmBIYpglOpNDQKvcF+FBjt3dzdc0Qcd/eUEMfdPQPEcXcn/Dju7vwex92dzuO4u7N3HHd3ssZz66VWcm1sVlSDXTZTqipUxZOKPw2nscKwXIlJw7OTHtckJ0mAqTNbMxEPpqXMvd49QpxJ4+fzylZqiZolM3G/1Lwc3HBe/ORSLXjCsszwCIGaV0sd6JGYMa35jGtepJxyYNNBpSh4UizzKcHYXLB7MhYvMuLuWxFJksJ6QLNlNbcmEQSDOmepVsObphhZfvgqyuF9ZSHJx6WUnIj1jWaIOdbw2sBhhpcGDjO8MnCY4YWBpxlVFzU0op5qaEQd1tCI+q0en1T91tCI+q2hEfVbQxveb3eiki7F+6uOg/7X7i6lsjcFBrdjIu4LZhYAw6eb5pppcss0u9dsMU/sVeV2rH/O2ON8VNlzckcxp61JVOt6N0QuzVmLYjm8Q7doVOZa84jsteYRGWzNG26xG7NMtgu0K5p6ZrKcVq2mdaRepp0wuawXtMPdxqrhI2xjgC9Cl2Q2aMcSjOBvdjlr5aTIfJtWDm/YhjXcVi+zEmnzGiRBK6VKH2jS8NXzgmtTlj0MJn1RUqpHntERJ5VW9VjzLX/oJOll+c/5Ys5K4WqlLUT/qX71OEFywxaDT+hWMlHQ6PZ5L2dCJnQriKu7m6/JnVrYMtN2DA3wo6oqlZMxmyuB//jBp/+kaeCFKYKLZ6KzvSC6PORgl4JgkqlJKiMimWWmKATJHOp4/+bPU8V0RkO71bx+gqfiRMQJyxf1ooPAWyYvPpr8Q7Aacrz/MC3sdaHBNO9KX7mc/snT4dnpm0pILub8tqzcJUO3OnXRdLjhM/sWbvisfueu8k2EHXIEJ7uFG36yWziqk72UrCxF8K5nNI/qdFc86vMdXq81PCWVni0lXQeugGQ9uAKSdaGSy7woKc/Y8QhP2PGoz5dwyDgewVU0x/uXFhmZGA5GpYSDUcngYFQaOBipAMMfqvFgw5+s8WDDH6+pYURLAA9GNc5Ip3+iGzMejGqcORjVOHMwqnHmYFTj7OhTwmczswimm2I8JNWY85B0E01R8XyhNNPPRMjPkt8zgmuaNe1Wq5n9NIYq6ueuCZD2srIkXGzXOCqRf/ApWdMsi+BaJpNSKaJLWJtJwkVuPyIWDruVLOVzJTOuA+0Ix5q6dLJgaXMFG9wJ63VF8Ku4n1fJZL6+EO5jTvd3Rq4K462w3Qds66fT1Ycy2sJueCaW+aqh8HMGp0f9g93I2Qo+3h28mbG3Ik96RsJjnu6O3KxGtyLPekbCY77tGemy8FZk1xj+xPRD60A46xo/61oqMPjOukbROrj1sF0DaR3ZNg
TPukbRllWSizS1F9KhOv08E47vZ55wPMZFYQrGTmFKb1+FEV0G+4P/FHYGxSRNd7z1gwUgV7vFaq/M+ftS1Ze0t+7F9P+807VZoBQlT1o5R/3v6WxlmXA/9k43YUTvvBNG9E5AYUSvTBQMR6WkMKV3bgojeiepMAKdreCMgMtWMB6XrWB8TLaClJhsNWAVEEb0Xg6EEWijQgTaqANWCmEEyqggPMqokII2KkSgjQoRaKPCBRjOqDAeZ1QYH2NUSIkxKqSgjQoRaKNCBNqoEIE2KkSgjRq5tg+GRxkVUtBGhQi0USECbVS3XhxgVBiPMyqMjzEqpMQYFVLQRoUItFEhAm1UiEAbFSLQRoUIlFFBeJRRIQVtVIhAGxUi0EatP4UXb1QYjzMqjI8xKqTEGBVS0EaFCLRRIQJtVIhAGxUi0EaFCJRRQXiUUSEFbVSIQBsVItBGdTflBhgVxuOMCuNjjAopMUaFFLRRIQJtVIhAGxUi0EaFCLRRIQJlVBAeZVRIQRsVItBGhYiu8dncCgw9gX6Av+oZfJi9/62rplF/+J9y9lFH/VGrVoVZ/R/T/6jUQ9L6mbwjV2/0g4ipFMpdog7cvva57tED1M3K3y67P/zi0wd+H1HzMQF3exTAj/tGgmsqx11D3o8ERd5x10j3I8Gq87gr+/qRYBo87kq6zperhz/MdASCu9KMF3wQCO/K1l447OKuHO0Fwh7uysxeIOzgrnzsBZ4kNjm/jD7p2U+n6+c4AaFrOHqEszCha1hCrVbpGBqjr2hhQl/1woS+MoYJKD2DGLywYRRa4TAqTmpoM6zU8UYNE7BSQ0KU1AATLzVERUsNUXFSw8SIlRoSsFLHJ+cwIUpqgImXGqKipYaoOKnhVIaVGhKwUkMCVuqBE3IQEy81REVLDVFxUsPFHVZqSMBKDQlYqSEhSmqAiZcaoqKlhqg4qUGVjJYaErBSQwJWakiIkhpg4qWGqGipIapLancVZUtqlMJeOG4R5gXiJmQvEJecvcCIasmLjqyWPEJktQS1WmmOq5Z80cKEvuqFCX1lDBNQegYxeGHDKLTCYVSc1LhqqU3qeKOGCVipcdVSUGpctdQpNa5a6pQaVy2FpcZVS21S46qlNqnjk3OYECU1rlrqlBpXLXVKjauWwlLjqqU2qXHVUpvUuGqpTeqBE3IQEy81rlrqlBpXLYWlxlVLbVLjqqU2qXHVUpvUuGopKDWuWuqUGlctdUqNq5bCUuOqpTapcdVSm9S4aqlNaly1FJQaVy11So2rljqlxlVLNyZEEHw70iRnukrovkrtipXzig3/3r7vhealkj95ltCe6lfUWY4ft34ZyrLdz86Z/SvTZ/bLwb2PK2X1l6M2QLfjtSEx9+NOthFJ84NWzW86ubY2d2rd34v6l7oeRaYe7cektZKrkGaI/pmuNkxVNW+a6MLGzRFhG9O5aWTafB9UqI37oJGBr3p1zdiM09XeTc9vurXeb6tT69YGWllZX3S18CDQjbWjQu1616SIXQ0zzZjKuvvNH9dFZgCPze9s1Q3MnliNMu9fcilvWL23WoR3lXxW1e8e7LsvDnjx/rT+2rpgvHZJPAgYbzemftk9GOovsm+eLgh19WFLV7vHXIb28qZdq7/KD/8DAAD//wMAUEsDBBQABgAIAAAAIQACD8w67wEAAEcIAAAUAAAAd29yZC93ZWJTZXR0aW5ncy54bWzsld9umzAUxu8n7R2Q7xsgCzRBTSplVadJ0zR13QMY2wRrtg+ynZD06Wc7kNJlF6FSd9UbfPjs78f5I4ub270U0Y5pw0EtUTpJUMQUAcrVZol+Pd5fzVFkLFYUC1BsiQ7MoNvVxw83bdGy8iez1p00kaMoU0iyRLW1TRHHhtRMYjOBhim3WYGW2LpXvYkl1r+3zRUB2WDLSy64PcTTJMlRh9GXUKCqOGF3QLaSKRv8sWbCEUGZmjemp7WX0FrQtNFAmDGuHimOPIm5OmHS2RlIcqLBQG
Unrpguo4By9jQJkRTPgGwcYHoGyAnbj2PMO0bsnEMOp+M4+YnD6YDzumQGAEMtrUdRpn1fY+/FFtfY1EMiG5dUdsIdpO+RJMXXjQKNS+FIbuqRG1wUwP7p6vdLCNk+6L4EtHIXgvKd6daoLXyL08UsT9NZli3CgRLo4S5s7rBwuyj2qrsP31hlezU5qQ98U/9DfoTmXFyDtSD/0l0ia6p9ZJ89yt1j5F7Mkz/ngwYT1sUEBLjrh7cWjggxyGycs3yR0TivHlY+xhoPi/bz+FxzQV8OZTrPsiS/nl2Hmbx3/z93P/uULZIkS967/3bdP4b92o/hItWnAI3lkj+xe9BrDa1hOmSGhYD2x/cvx28Nfv+rPwAAAP//AwBQSwMEFAAGAAgAAAAhAKMhTSYeAgAAkwYAABIAAAB3b3JkL2ZvbnRUYWJsZS54bWzck99umzAUxu8n7R2Q7xsMSSiLSqo2baRJUy+mTtqtYwxYwzaynX+PsIfZC+xmj9PX6LGBNGpaLexyIMB8x+d3fD7M1fVO1MGGacOVzFA0wihgkqqcyzJD3x6XFykKjCUyJ7WSLEN7ZtD1/OOHq+2sUNKaAPKlmQmaocraZhaGhlZMEDNSDZMQLJQWxMKrLkNB9I91c0GVaIjlK15zuw9jjBPUYfQ5FFUUnLI7RdeCSevzQ81qICppKt6YnrY9h7ZVOm+0oswY6FnULU8QLg+YaHICEpxqZVRhR9BMtyKPgvQI+5GoXwDTYYD4BJBQthvGSDtGCJnHHJ4P4yQHDs+POP+2mCOAyW1eDaLEva+hyyWWVMRUx0Q2bFHTA24vnEeCzj6XUmmyqoEEXz2ADxd4sLtD/+7hh2znddcCmne/QrCdSSIg8+nXz6fff7xOavsAGoQ2pM7QHZPld04kCl2wIVIZFvVB7HZOgjEew7M724m0ItowV8BPTJNWLojg9b5XydqqjsstrXp5QzR3DbUhw0sIrM0KZ+gGSuH4dolaJcrQOF0sLxfLm06JYU3+iJJOGfcKxk6hngMvE7g8h3rOYQ7UDFtzTkx65IKZ4IFtg69KgCNvGxKDIWM8hQJTGI/x5E1D2kqvDdGeO8SRe2fI/fLIkQUol+n09rUj+NNfHAHTWs75jrTbJvjCy8q+Y8f/vD+6gZk/AwAA//8DAFBLAwQUAAYACAAAACEAtY3V8GoBAADdAgAAEQAIAWRvY1Byb3BzL2NvcmUueG1sIKIEASigAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAjJJdT4MwFIbvTfwPpPdQCkYXAixRsyuXGJ2Z2V1tz7Y6KE3bje3fW2Awibvw7ny858npe5pOj2XhHUAbUckMkSBEHkhWcSE3GfpYzPwJ8oylktOikpChExg0zW9vUqYSVml41ZUCbQUYz5GkSZjK0NZalWBs2BZKagKnkK65rnRJrUv1BivKdnQDOArDe1yCpZxaihugrwYiOiM5G5Bqr4sWwBmGAkqQ1mASEHzRWtCluTrQdn4pS2FPCq5K++agPhoxCOu6Duq4lbr9Cf6cv7y3T/WFbLxigPKUs8QKW0Ce4kvoIrP/+gZmu/KQuJhpoLbS+XL11vb6vHF6B6e60ty4qVHmZBwM00JZd7+OOSo4dUGNnbuDrgXwx1OH/1tulBoOovkHedQqhjQ9m9qtBNxzZiSddX1nGT89L2
Yoj8Io9sMHn5AFuUuiSRKGq2ar0fwFWJ4X+DcxJmNiD+iMGX/I/AcAAP//AwBQSwMEFAAGAAgAAAAhAHOEmAR3AQAAywIAABAACAFkb2NQcm9wcy9hcHAueG1sIKIEASigAAEAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAnFLLTsMwELwj8Q9R7tQpqKVCWyPUCnHgUakBzpa9SSwc27JN1f49G9KGIG7ktDPrHc9sDLf71mQ7DFE7u8ynkyLP0EqntK2X+Wt5f7HIs5iEVcI4i8v8gDG/5ednsAnOY0gaY0YSNi7zJiV/w1iUDbYiTqhtqVO50IpEMNTMVZWWuHbys0Wb2GVRzBnuE1qF6sIPgnmveLNL/xVVTnb+4lt58KTHocTWG5GQP3eTZqJcaoENLJQuCVPqFvkV0QOAjagx8imwvoB3FxTh6QxYX8KqEUHIRBvk89kC2AjDnfdGS5Fot/xJy+Ciq1L28m046+aBjY8Ahdii/Aw6HXgBbAzhUVsyQPf2BTkLog7CN0d7A4KtFAZXFJ9XwkQE9kPAyrVeWJJjQ0V6H/HVl27dbeI48pschXzXqdl6IcnC9fXlOO6oA1tiUZH/wcJAwAP9kmA6fZq1NarTmb+NboFv/dvk0/mkoO97YyeOcg+Phn8BAAD//wMAUEsBAi0AFAAGAAgAAAAhAN+k0mxaAQAAIAUAABMAAAAAAAAAAAAAAAAAAAAAAFtDb250ZW50X1R5cGVzXS54bWxQSwECLQAUAAYACAAAACEAHpEat+8AAABOAgAACwAAAAAAAAAAAAAAAACTAwAAX3JlbHMvLnJlbHNQSwECLQAUAAYACAAAACEA93qSMy4JAAAbJAAAEQAAAAAAAAAAAAAAAACzBgAAd29yZC9kb2N1bWVudC54bWxQSwECLQAUAAYACAAAACEA1mSzUfQAAAAxAwAAHAAAAAAAAAAAAAAAAAAQEAAAd29yZC9fcmVscy9kb2N1bWVudC54bWwucmVsc1BLAQItABQABgAIAAAAIQA6BcwZ4QYAAM4gAAAVAAAAAAAAAAAAAAAAAEYSAAB3b3JkL3RoZW1lL3RoZW1lMS54bWxQSwECLQAUAAYACAAAACEAMW9THM4EAABVDQAAEQAAAAAAAAAAAAAAAABaGQAAd29yZC9zZXR0aW5ncy54bWxQSwECLQAUAAYACAAAACEA8IJeJnkLAAAEcgAADwAAAAAAAAAAAAAAAABXHgAAd29yZC9zdHlsZXMueG1sUEsBAi0AFAAGAAgAAAAhAAIPzDrvAQAARwgAABQAAAAAAAAAAAAAAAAA/SkAAHdvcmQvd2ViU2V0dGluZ3MueG1sUEsBAi0AFAAGAAgAAAAhAKMhTSYeAgAAkwYAABIAAAAAAAAAAAAAAAAAHiwAAHdvcmQvZm9udFRhYmxlLnhtbFBLAQItABQABgAIAAAAIQC1jdXwagEAAN0CAAARAAAAAAAAAAAAAAAAAGwuAABkb2NQcm9wcy9jb3JlLnhtbFBLAQItABQABgAIAAAAIQBzhJgEdwEAAMsCAAAQAAAAAAAAAAAAAAAAAA0xAABkb2NQcm9wcy9hcHAueG1sUEsFBgAAAAALAAsAwQIAALozAAAAAA==
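The sample string starts with UEsDB, which is what the ZIP signature (the bytes P, K, 3, 4) looks like in Base64. A .docx file is a ZIP container of XML parts, which is why the raw file holds far more than the visible text. A quick way to sanity-check a Base64 string before indexing it might look like this; the class and method names are made up for the example.

```java
import java.util.Base64;

// DocxMagicCheck is a hypothetical name; the check only inspects the first bytes.
final class DocxMagicCheck {

    /** Returns true if the Base64 text decodes to bytes that start with the ZIP signature. */
    static boolean looksLikeZip(String base64) {
        if (base64.length() < 8) {
            return false;
        }
        // 8 Base64 characters decode to exactly 6 bytes, enough for the 4-byte signature.
        byte[] head = Base64.getDecoder().decode(base64.substring(0, 8));
        return head[0] == 'P' && head[1] == 'K' && head[2] == 3 && head[3] == 4;
    }

    public static void main(String[] args) {
        String prefix = "UEsDBBQABgAI"; // first characters of the sample above
        System.out.println(looksLikeZip(prefix)); // prints "true"
    }
}
```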
2.2 Add a record to Elasticsearch
In the Kibana console, add a record and paste the Base64 content from above into the content field (remember the double quotes):
POST /docwrite/_doc?pipeline=attachment
{
"userId": 1001,
"docId": 10003,
"docName": "es.docx",
"docType": "docx",
"content": "[paste the Base64 content here]"
}
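The same request can also be sent without Kibana. The sketch below uses the JDK's built-in HTTP client (Java 11 or later) and builds the JSON body by hand. It assumes Elasticsearch is reachable at http://localhost:9200 without authentication and that es.docx is a placeholder for a real file; the class and method names are invented for this example.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// IndexDocumentViaRest is a hypothetical name used only for this sketch.
final class IndexDocumentViaRest {

    /** Minimal escaping: backslashes and double quotes only (no control characters). */
    static String jsonEscape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    /** Builds the same JSON body as the console request; Base64 text needs no escaping. */
    static String buildBody(long userId, long docId, String docName, String docType, byte[] fileBytes) {
        String base64 = Base64.getEncoder().encodeToString(fileBytes);
        return "{"
            + "\"userId\":" + userId + ","
            + "\"docId\":" + docId + ","
            + "\"docName\":\"" + jsonEscape(docName) + "\","
            + "\"docType\":\"" + jsonEscape(docType) + "\","
            + "\"content\":\"" + base64 + "\""
            + "}";
    }

    public static void main(String[] args) throws Exception {
        // Assumed: local, unsecured Elasticsearch and a file named es.docx in the working directory.
        byte[] fileBytes = Files.readAllBytes(Paths.get("es.docx"));
        String body = buildBody(1001, 10003, "es.docx", "docx", fileBytes);

        HttpRequest request = HttpRequest.newBuilder(
                URI.create("http://localhost:9200/docwrite/_doc?pipeline=attachment"))
            .header("Content-Type", "application/json")
            .POST(HttpRequest.BodyPublishers.ofString(body))
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```

The pipeline=attachment query parameter is what routes the document through the pipeline; without it the Base64 text would be stored unprocessed.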
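Run the following query to check whether the record was processed by the text-extraction pipeline.
GET /docwrite/_search
In the result, the Base64 content field is gone and the extracted text is stored in attachment.content, which is indexed with the IK analyzer.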
2.3 Test keyword search with highlighting
The end goal is to search by keyword and display the matching document text.
This is where highlighted search comes in.
For example, to search for the keyword "Elasticsearch", run:
GET /docwrite/_search
{
"query": {
"match": {
"attachment.content": {
"query": "Elasticsearch",
"analyzer": "ik_smart"
}
}
},
"highlight": {
"fields": {
"attachment.content": {
"pre_tags": "<strong>",
"post_tags": "</strong>"
}
}
}
}
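This returns the matching records. In each hit, the "highlight" field contains the text fragments that match the keyword, with the keyword wrapped in <strong> tags.
It works like searching for a keyword on Baidu: the results show text related to the keyword, and the keyword itself is highlighted (for example, in red).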
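On the client side, a highlight fragment is just a string containing the chosen tags. As a rough illustration, the snippet below pulls the highlighted terms out of such a fragment with a regular expression. The fragment text is invented for the example, and the class name is hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// HighlightTerms is a hypothetical helper; it assumes <strong> is the highlight tag.
final class HighlightTerms {

    private static final Pattern STRONG = Pattern.compile("<strong>(.*?)</strong>");

    /** Returns the text found between each <strong> and </strong> pair, in order. */
    static List<String> highlightedTerms(String fragment) {
        List<String> terms = new ArrayList<>();
        Matcher matcher = STRONG.matcher(fragment);
        while (matcher.find()) {
            terms.add(matcher.group(1));
        }
        return terms;
    }

    public static void main(String[] args) {
        // Made-up fragment in the shape of a highlight result.
        String fragment = "<strong>Elasticsearch</strong> is a search engine. "
            + "You can scale <strong>Elasticsearch</strong> horizontally.";
        System.out.println(highlightedTerms(fragment)); // prints "[Elasticsearch, Elasticsearch]"
    }
}
```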
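3. Spring Boot implementation
If the steps above went smoothly, the Spring Boot back end is straightforward.
3.1 Elasticsearch configuration
(1) pom.xml
Add the Elasticsearch, commons-io (for IOUtils), and Lombok dependencies. The utility class further down also calls JSON.toJSONString, which appears to come from fastjson and would need its own dependency.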
<!-- elasticsearch -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-elasticsearch</artifactId>
</dependency>
<!-- IOUtils -->
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.8.0</version>
</dependency>
<!-- lombok -->
<dependency>
<groupId>org.projectlombok</groupId>
<artifactId>lombok</artifactId>
<optional>true</optional>
</dependency>
(2) application.yml
Add the Elasticsearch server host and port.
# Custom parameters
my-config:
  # Custom Elasticsearch settings
  elasticsearch:
    url: localhost
    port: 9200
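(3) ElasticSearchConfig class
import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpHost;
import org.elasticsearch.client.RestClient;
import org.elasticsearch.client.RestHighLevelClient;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
@Slf4j
public class ElasticSearchConfig {

    @Value("${my-config.elasticsearch.url}")
    private String esHost;

    @Value("${my-config.elasticsearch.port}")
    private int esPort;

    /**
     * Creates the Elasticsearch client and registers it as a bean.
     * @return the Elasticsearch client
     */
    @Bean("myESClient")
    public RestHighLevelClient myElasticsearchClient() {
        return new RestHighLevelClient(RestClient.builder(
                new HttpHost(esHost, esPort, "http")
        ));
    }
}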
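(4) Elasticsearch utility class
import com.alibaba.fastjson.JSON; // assumed: JSON.toJSONString looks like fastjson; the original does not name the library
import java.util.UUID;
import lombok.extern.slf4j.Slf4j;
import org.elasticsearch.action.index.IndexRequest;
import org.elasticsearch.action.index.IndexResponse;
import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.common.unit.TimeValue;
import org.elasticsearch.common.xcontent.XContentType;
import org.elasticsearch.rest.RestStatus;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.beans.factory.annotation.Qualifier;
import org.springframework.stereotype.Component;

@Component
@Slf4j
public class ElasticSearchClient {

    // Request timeout. The original does not show this constant; 30 seconds is an assumed value.
    private static final TimeValue TIME_VALUE_SECONDS = TimeValue.timeValueSeconds(30);

    @Autowired
    @Qualifier("myESClient")
    private RestHighLevelClient restHighLevelClient;

    /**
     * Runs a keyword search.
     * @param index index to search
     * @param sourceBuilder query definition; may be null
     * @return the hits, or an empty array if the search fails
     */
    public SearchHit[] selectDocumentList(String index, SearchSourceBuilder sourceBuilder) {
        try {
            SearchRequest request = new SearchRequest(index);
            if (sourceBuilder != null) {
                // Report the exact total hit count
                sourceBuilder.trackTotalHits(true);
                request.source(sourceBuilder);
            }
            SearchResponse response = restHighLevelClient.search(request, RequestOptions.DEFAULT);
            if (response.getHits() != null) {
                return response.getHits().getHits();
            }
        } catch (Exception e) {
            log.error("[es] search failed", e);
        }
        // Return an empty array rather than null so callers can iterate safely
        return new SearchHit[0];
    }

    /**
     * Inserts a document. A new random id is generated on every call.
     * @param index index name
     * @param data document data
     * @return true if Elasticsearch reports CREATED or OK
     */
    public boolean insertDocument(String index, Object data) {
        try {
            String id = UUID.randomUUID().toString().replaceAll("-", "").toUpperCase();
            IndexRequest request = new IndexRequest(index);
            request.timeout(TIME_VALUE_SECONDS);
            request.id(id);
            // Important: the pipeline must be set, otherwise no text is extracted
            request.setPipeline("attachment");
            request.source(JSON.toJSONString(data), XContentType.JSON);
            IndexResponse response = restHighLevelClient.index(request, RequestOptions.DEFAULT);
            log.debug("[es] index response: status:{}, id:{}", response.status().getStatus(), response.getId());
            RestStatus status = response.status();
            if (status == RestStatus.CREATED || status == RestStatus.OK) {
                log.debug("[es] document inserted");
                return true;
            }
        } catch (Exception e) {
            log.error("[es] failed to insert document", e);
        }
        return false;
    }
}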
3.2 DocumentObj entity class
Holds the metadata and content of a document file.
import java.io.Serializable;
import lombok.Data;

@Data
public class DocumentObj implements Serializable {
    private static final long serialVersionUID = 1L;

    /** ID of the user who owns the file */
    private Long userId;
    /** File ID in MySQL */
    private Long docId;
    /** File name */
    private String docName;
    /** File type */
    private String docType;
    /** Base64 content of the file */
    private String content;

    public DocumentObj() {}
}
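3.3 Service interface
import java.util.List;

public interface ISearchService {
    /**
     * (Test) Searches documents by keyword.
     * @param keyword the keyword to search for
     * @return matching documents with highlighted text
     */
    List<DocumentObj> testSearch(String keyword);

    /**
     * (Test) Loads a local document into Elasticsearch.
     * @return true if the document was indexed
     */
    boolean testLoadDocument();
}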
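3.4 ServiceImpl implementation class
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Map;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.io.IOUtils;
import org.elasticsearch.common.text.Text;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;
import org.elasticsearch.search.builder.SearchSourceBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightBuilder;
import org.elasticsearch.search.fetch.subphase.highlight.HighlightField;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.stereotype.Service;

@Slf4j
@Service
public class SearchServiceImpl implements ISearchService {

    @Autowired
    private ElasticSearchClient esClient;

    @Override
    public List<DocumentObj> testSearch(String keyword) {
        // Highlighted query: matched keywords are wrapped in a red font tag
        HighlightBuilder highlightBuilder = new HighlightBuilder()
                .field("attachment.content")
                .preTags("<font color='red' font-weight='bold'>")
                .postTags("</font>");
        // Match query on the extracted text
        SearchSourceBuilder searchSourceBuilder =
                new SearchSourceBuilder()
                        .query(QueryBuilders.matchQuery("attachment.content", keyword).analyzer("ik_smart"))
                        .highlighter(highlightBuilder);
        SearchHit[] searchHits = esClient.selectDocumentList("docwrite", searchSourceBuilder);
        // Process each hit (one per document) and collect its highlighted text
        List<DocumentObj> results = new ArrayList<>();
        if (searchHits == null) {
            return results;
        }
        for (SearchHit hit : searchHits) {
            Map<String, Object> sourceAsMap = hit.getSourceAsMap();
            DocumentObj obj = new DocumentObj();
            // The JSON number may come back as Integer or Long, so go through Number
            obj.setDocId(((Number) sourceAsMap.get("docId")).longValue());
            obj.setDocName((String) sourceAsMap.get("docName"));
            HighlightField contentHighlightField = hit.getHighlightFields().get("attachment.content");
            // A document can yield several highlighted fragments; join at most the first two
            if (contentHighlightField != null) {
                Text[] fragments = contentHighlightField.fragments();
                StringBuilder highlightMessage = new StringBuilder();
                for (int i = 0; i < Math.min(2, fragments.length); i++) {
                    if (i > 0) {
                        highlightMessage.append(' ');
                    }
                    highlightMessage.append(fragments[i].toString());
                }
                obj.setContent(highlightMessage.toString());
            }
            results.add(obj);
        }
        return results;
    }

    @Override
    public boolean testLoadDocument() {
        // Test with a local document
        File file = new File("D:\\桌面文件\\es介紹.docx");
        // try-with-resources closes the stream even if reading fails
        try (InputStream fileInputStream = new FileInputStream(file)) {
            // Load the file and encode it as Base64
            byte[] bytes = IOUtils.toByteArray(fileInputStream);
            String base64 = Base64.getEncoder().encodeToString(bytes);
            // Add the document to Elasticsearch
            DocumentObj obj = new DocumentObj();
            obj.setUserId(1001L);
            obj.setDocId(666L);
            obj.setDocName("es介紹.docx");
            obj.setDocType("docx");
            obj.setContent(base64);
            return esClient.insertDocument("docwrite", obj);
        } catch (IOException e) {
            log.error("[es] failed to read the local document", e);
        }
        return false;
    }
}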
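3.5 Controller layer
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.http.ResponseEntity;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RestController;

@RestController
@RequestMapping("/test")
public class TestController {

    // The original snippet uses searchService without declaring it
    @Autowired
    private ISearchService searchService;

    @RequestMapping("/es/search")
    public ResponseEntity<?> testSearch(String keyword) {
        return ResponseEntity.ok( searchService.testSearch(keyword) );
    }

    @RequestMapping("/es/addone")
    public ResponseEntity<?> testAddone() {
        return ResponseEntity.ok( searchService.testLoadDocument() );
    }
}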
3.6 Testing
(1) Load the file
http://localhost:8002/test/es/addone
(2) Keyword search
The search keyword is "Elasticsearch".
http://localhost:8002/test/es/search?keyword=Elasticsearch
When the front end passes a keyword to this endpoint, the endpoint returns the matching records from the Elasticsearch server.
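A front end or test client that builds this URL itself should URL-encode the keyword, since Chinese text and spaces are not valid in a raw query string. A minimal sketch (Java 10 or later), assuming the service runs at http://localhost:8002 as in the test URLs above; the class and method names are made up for the example:

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// SearchUrlBuilder is a hypothetical helper used only for this sketch.
final class SearchUrlBuilder {

    /** Appends the URL-encoded keyword to the search endpoint path. */
    static String searchUrl(String baseUrl, String keyword) {
        return baseUrl + "/test/es/search?keyword="
            + URLEncoder.encode(keyword, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Plain ASCII keywords pass through unchanged.
        System.out.println(searchUrl("http://localhost:8002", "Elasticsearch"));
        // Spaces become '+', and non-ASCII characters are percent-encoded as UTF-8.
        System.out.println(searchUrl("http://localhost:8002", "full text"));
    }
}
```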