This post compares three similarity measures (L1, L2, and cosine) when text is represented by 3-dimensional vectors. It assumes Elasticsearch and Kibana are already set up.
1. Create the index
PUT my-index-000002
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status": {
        "type": "keyword"
      }
    }
  }
}
Response on success:
{
  "acknowledged": true,
  "shards_acknowledged": true,
  "index": "my-index-000002"
}
2. Index the documents
PUT my-index-000002/_doc/1
{
  "my_dense_vector": [1, 0, 0],
  "status": "published"
}

PUT my-index-000002/_doc/2
{
  "my_dense_vector": [0, 1, 0],
  "status": "published"
}

PUT my-index-000002/_doc/3
{
  "my_dense_vector": [0, 0, 1],
  "status": "published"
}
Each request returns a response similar to the following:
{
  "_index": "my-index-000002",
  "_id": "3",
  "_version": 1,
  "result": "created",
  "_shards": {
    "total": 2,
    "successful": 1,
    "failed": 0
  },
  "_seq_no": 2,
  "_primary_term": 1
}
3. View all documents
GET my-index-000002/_search
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my-index-000002",
        "_id": "1",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            1,
            0,
            0
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "2",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            0,
            1,
            0
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "3",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            0,
            0,
            1
          ],
          "status": "published"
        }
      }
    ]
  }
}
4. Query with the L1 method
GET my-index-000002/_search
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "term": {
              "status": "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [0, 0, 1]
        }
      }
    }
  }
}
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my-index-000002",
        "_id": "3",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            0,
            0,
            1
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "1",
        "_score": 0.33333334,
        "_source": {
          "my_dense_vector": [
            1,
            0,
            0
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "2",
        "_score": 0.33333334,
        "_source": {
          "my_dense_vector": [
            0,
            1,
            0
          ],
          "status": "published"
        }
      }
    ]
  }
}
In the results, documents 1 and 2 receive the same score, even though they are different vectors in the text vector space.
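To see where these scores come from, the script's formula can be reproduced outside Elasticsearch. The following is a minimal Python sketch, assuming `l1norm` returns the Manhattan distance (the sum of absolute component differences) between the query vector and the stored vector. The helper names `l1_distance` and `l1_score` are made up for this illustration and are not part of any Elasticsearch API.

```python
def l1_distance(a, b):
    """Manhattan distance: sum of absolute component differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def l1_score(query, doc):
    """Same formula as the script in the query: 1 / (1 + L1 distance)."""
    return 1 / (1 + l1_distance(query, doc))


# The three documents indexed in step 2, keyed by _id.
docs = {"1": [1, 0, 0], "2": [0, 1, 0], "3": [0, 0, 1]}
query = [0, 0, 1]

l1_scores = {doc_id: l1_score(query, vec) for doc_id, vec in docs.items()}
for doc_id, score in l1_scores.items():
    print(doc_id, score)
```

Documents 1 and 2 each differ from the query vector by 1 in two components, so both have an L1 distance of 2 and a score of 1 / 3. Elasticsearch shows this as 0.33333334, presumably because scores are stored as single-precision floats.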
5. Query with L2
GET my-index-000002/_search
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "term": {
              "status": "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [0, 0, 1]
        }
      }
    }
  }
}
Result:
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my-index-000002",
        "_id": "3",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            0,
            0,
            1
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "1",
        "_score": 0.41421357,
        "_source": {
          "my_dense_vector": [
            1,
            0,
            0
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "2",
        "_score": 0.41421357,
        "_source": {
          "my_dense_vector": [
            0,
            1,
            0
          ],
          "status": "published"
        }
      }
    ]
  }
}
The same thing happens here: with L2, just as with L1, documents 1 and 2 end up with identical scores.
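The L2 scores can be checked by hand in the same way. This is a small Python sketch, assuming `l2norm` returns the Euclidean distance between the query vector and the stored vector. The helper names `l2_distance` and `l2_score` are hypothetical and exist only for this example.

```python
import math


def l2_distance(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def l2_score(query, doc):
    """Same formula as the script in the query: 1 / (1 + L2 distance)."""
    return 1 / (1 + l2_distance(query, doc))


# The three documents indexed in step 2, keyed by _id.
docs = {"1": [1, 0, 0], "2": [0, 1, 0], "3": [0, 0, 1]}
query = [0, 0, 1]

l2_scores = {doc_id: l2_score(query, vec) for doc_id, vec in docs.items()}
for doc_id, score in l2_scores.items():
    print(doc_id, score)
```

Documents 1 and 2 are both at distance sqrt(2) from the query vector, so each scores 1 / (1 + sqrt(2)), roughly 0.4142, which agrees with the 0.41421357 returned by Elasticsearch up to float rounding.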
6. Query with cosine similarity
GET my-index-000002/_search
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "term": {
              "status": "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
        "params": {
          "query_vector": [0, 0, 1]
        }
      }
    }
  }
}
Result:
{
  "took": 1,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 2,
    "hits": [
      {
        "_index": "my-index-000002",
        "_id": "3",
        "_score": 2,
        "_source": {
          "my_dense_vector": [
            0,
            0,
            1
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "1",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            1,
            0,
            0
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "2",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            0,
            1,
            0
          ],
          "status": "published"
        }
      }
    ]
  }
}
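The cosine scores above can also be reproduced by hand. Below is a rough Python sketch of the `cosineSimilarity(...) + 1.0` formula, assuming the standard definition of cosine similarity (dot product divided by the product of the vector lengths). The function name `cosine_similarity` is an illustrative helper, not the Elasticsearch implementation.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# The three documents indexed in step 2, keyed by _id.
docs = {"1": [1, 0, 0], "2": [0, 1, 0], "3": [0, 0, 1]}
query = [0, 0, 1]

# The + 1.0 shifts the range from [-1, 1] to [0, 2], as in the query script.
cos_scores = {
    doc_id: cosine_similarity(query, vec) + 1.0 for doc_id, vec in docs.items()
}
for doc_id, score in cos_scores.items():
    print(doc_id, score)
```

Document 3 points in the same direction as the query vector (similarity 1, score 2), while documents 1 and 2 are both orthogonal to it (similarity 0, score 1), so they tie again.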
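All three methods can give different vectors the same score. The next query repeats the cosine search with the query vector scaled to [0, 0, 100]: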
GET my-index-000002/_search
{
  "query": {
    "script_score": {
      "query": {
        "bool": {
          "filter": {
            "term": {
              "status": "published"
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0",
        "params": {
          "query_vector": [0, 0, 100]
        }
      }
    }
  }
}
Result: (article source: http://www.zghlxwxcb.cn/news/detail-784637.html)
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 3,
      "relation": "eq"
    },
    "max_score": 2,
    "hits": [
      {
        "_index": "my-index-000002",
        "_id": "3",
        "_score": 2,
        "_source": {
          "my_dense_vector": [
            0,
            0,
            1
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "1",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            1,
            0,
            0
          ],
          "status": "published"
        }
      },
      {
        "_index": "my-index-000002",
        "_id": "2",
        "_score": 1,
        "_source": {
          "my_dense_vector": [
            0,
            1,
            0
          ],
          "status": "published"
        }
      }
    ]
  }
}
With all three methods, vectors at different positions in space can end up at the same distance from the query vector.
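The query with [0, 0, 100] returns exactly the same scores as the one with [0, 0, 1] because cosine similarity depends only on the direction of the vectors, not on their length. The sketch below illustrates this in Python and, for contrast, shows that a distance-based score does change when the query vector is scaled. All helper names (`cosine_scores`, `l2_score`, and so on) are hypothetical and assume the standard definitions of cosine similarity and Euclidean distance.

```python
import math


def cosine_similarity(a, b):
    """Dot product divided by the product of the two vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def l2_score(query, doc):
    """1 / (1 + Euclidean distance), the formula used in the L2 query."""
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(query, doc)))
    return 1 / (1 + distance)


# The three documents indexed in step 2, keyed by _id.
docs = {"1": [1, 0, 0], "2": [0, 1, 0], "3": [0, 0, 1]}


def cosine_scores(query):
    """cosineSimilarity + 1.0 for every document."""
    return {i: cosine_similarity(query, v) + 1.0 for i, v in docs.items()}


unit_scores = cosine_scores([0, 0, 1])
scaled_scores = cosine_scores([0, 0, 100])
print(unit_scores == scaled_scores)  # cosine ignores the query's length

# The L2-based score for document 3 drops once the query is scaled.
l2_unit = l2_score([0, 0, 1], docs["3"])
l2_scaled = l2_score([0, 0, 100], docs["3"])
print(l2_unit, l2_scaled)
```

In this sketch the cosine scores are identical for both query vectors, while the L2-based score for document 3 falls from 1 to about 0.01, because its distance to the scaled query grows from 0 to 99.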