Symptom
All data nodes in the cluster repeatedly crashed with a StackOverflowError, and crashed again after being restarted. The StackOverflowError stack trace is as follows:
[2023-12-22T16:03:44,057][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [xr-data-hdp-dn-rtyarn0725] fatal error in thread [elasticsearch[xr-data-hdp-dn-rtyarn0725][write][T#6]], exiting
java.lang.StackOverflowError: null
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:283) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
...
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parse(ObjectMapper.java:210) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseProperties(ObjectMapper.java:319) ~[elasticsearch-7.9.1.jar:7.9.1]
at org.elasticsearch.index.mapper.ObjectMapper$TypeParser.parseObjectOrDocumentTypeProperties(ObjectMapper.java:237) ~[elasticsearch-7.9.1.jar:7.9.1]
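Handling
The stack trace shows that the StackOverflowError occurred in the write thread pool [write], most likely while a mapping was being parsed. Because every frame is in the ObjectMapper class, we inferred that writing object-type data caused it. We therefore pulled the mappings of all indices in the cluster and looked for an index whose mapping had object-type fields, but found none.
In the end, because this cluster has only a few indices, we fell back to a simple brute-force method: we bisected the write jobs, stopping half of them at a time and watching the cluster state, until we had isolated the problem index.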
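The bisection can be sketched as follows. This is only an illustration of the procedure: `crashesWith` is a hypothetical predicate standing in for the manual step "leave only these jobs running and observe whether the data nodes crash", the index names are made up, and the sketch assumes exactly one index is at fault.

```java
import java.util.List;
import java.util.function.Predicate;

final class CulpritBisect {
    /**
     * Narrows a list of candidate indices down to the one that triggers the crash.
     * Assumes exactly one culprit, and that crashesWith(subset) is true
     * if and only if the subset contains it.
     */
    static String findCulprit(List<String> candidates, Predicate<List<String>> crashesWith) {
        if (candidates.isEmpty()) {
            throw new IllegalArgumentException("no candidates");
        }
        List<String> remaining = candidates;
        while (remaining.size() > 1) {
            int mid = remaining.size() / 2;
            List<String> firstHalf = remaining.subList(0, mid);
            // Keep only the first half's jobs running; a crash means the culprit is in that half.
            remaining = crashesWith.test(firstHalf)
                ? firstHalf
                : remaining.subList(mid, remaining.size());
        }
        return remaining.get(0);
    }

    public static void main(String[] args) {
        // Hypothetical index names; "logs-d" plays the problem index.
        List<String> indices = List.of("logs-a", "logs-b", "logs-c", "logs-d", "logs-e");
        String culprit = findCulprit(indices, subset -> subset.contains("logs-d"));
        System.out.println(culprit); // logs-d
    }
}
```

With n candidate indices this needs about log2(n) stop-and-observe rounds, which is why the approach was practical on a cluster with few indices.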
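Troubleshooting
Question 1
Why did the stack overflow occur?
The overflowing stack belongs to the ES server handling a client write request. With dynamic mapping enabled, when the written data contains new fields, ES has to parse the configuration for those fields. The parser walks the corresponding JSON recursively, and when a field is a nested structure (object/nested), the recursion depth is determined by the nesting depth of the user's data. The data in the problem index was nested too deeply, so the recursion went too deep and overflowed the stack.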
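The relationship between nesting depth and recursion depth can be seen in a minimal sketch. This is not Elasticsearch's `ObjectMapper` code; `RecursiveDepthDemo` is a hypothetical stand-in that only shares its shape: one call per nested object, so one stack frame per nesting level.

```java
import java.util.Map;

final class RecursiveDepthDemo {
    /** Recursive descent over a nested object: each nested level adds one stack frame. */
    static int depth(Map<String, Object> node) {
        int max = 0;
        for (Object value : node.values()) {
            if (value instanceof Map) {
                @SuppressWarnings("unchecked")
                Map<String, Object> child = (Map<String, Object>) value;
                max = Math.max(max, depth(child));
            }
        }
        return max + 1;
    }

    /** Builds an object with the given number of nested levels, using a loop rather than recursion. */
    static Map<String, Object> nested(int levels) {
        Map<String, Object> inner = Map.of("leaf", "x");
        for (int i = 1; i < levels; i++) {
            inner = Map.of("f", inner);
        }
        return inner;
    }

    public static void main(String[] args) {
        System.out.println(depth(nested(10))); // 10
        try {
            // With a default-sized thread stack this is expected to overflow.
            depth(nested(1_000_000));
        } catch (StackOverflowError e) {
            System.out.println("StackOverflowError while walking the nested object");
        }
    }
}
```

The depth at which the overflow happens depends on the thread stack size (`-Xss`) and the frame size of the recursive method, so the sketch makes no claim about the exact threshold in Elasticsearch.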
Verification:
We test-wrote a document with several levels of nesting. The resulting code stack matched the stack from the StackOverflowError in the symptom, with the same repeated recursion:
{
"o1":{
"a":{
"b":{
"c":{
"d":{
"e":{
"f":{
"g":{
"h":{
"j":"ddd"
}
}
}
}
}
}
}
}
}
}
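Code stack: (screenshot not preserved)
Checking the problem index confirmed that dynamic mapping was enabled and that the raw logs did contain data with heavily nested structures.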
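To reproduce the problem with a deeper document than the hand-written one, the test payload can be generated. The following is a minimal sketch; the class name, the field names `f0`, `f1`, ... and the leaf value are arbitrary choices for illustration, and how many levels are needed to overflow a given node depends on its thread stack size.

```java
final class NestedDocBuilder {
    /** Returns a JSON document made of `levels` nested objects wrapped around one string leaf. */
    static String nestedJson(int levels) {
        if (levels < 1) {
            throw new IllegalArgumentException("levels must be >= 1");
        }
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < levels; i++) {
            sb.append("{\"f").append(i).append("\":");
        }
        sb.append("\"ddd\"");
        for (int i = 0; i < levels; i++) {
            sb.append('}');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(nestedJson(3)); // {"f0":{"f1":{"f2":"ddd"}}}
    }
}
```

Since the failure mode is a crash of the data node, such a document should only be sent to a disposable test cluster.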
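Question 2
Why does the mapping of the problem index contain no object-type fields?
The exception is thrown while the mapping is being parsed during the write, before the new mapping has been applied to the index. Because the stack overflow during parsing crashed the ES process, the index mapping was never updated, so the mapping of the problem index naturally contains no object-type fields.
Question 3
ES has a limit on field nesting depth (index.mapping.depth.limit). Why did it not reject this document?
That check runs after the field configuration has been parsed, and the stack overflow happened during parsing. See the code below: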
private synchronized Map<String, DocumentMapper> internalMerge(Map<String, CompressedXContent> mappings, MergeReason reason) {
    // ... unrelated code omitted ...
            try {
                documentMapper =
                    documentParser.parse(type, entry.getValue(), applyDefault ? defaultMappingSourceOrLastStored : null); // the mapping is parsed here
            } catch (Exception e) {
                throw new MapperParsingException("Failed to parse mapping [{}]: {}", e, entry.getKey(), e.getMessage());
            }
        }
    return internalMerge(defaultMapper, defaultMappingSource, documentMapper, reason); // the mapping is checked here
}
private synchronized Map<String, DocumentMapper> internalMerge(@Nullable DocumentMapper defaultMapper,
                                                               @Nullable String defaultMappingSource, DocumentMapper mapper,
                                                               MergeReason reason) {
    // ... unrelated code omitted ...
    boolean hasNested = this.hasNested;
    Map<String, ObjectMapper> fullPathObjectMappers = this.fullPathObjectMappers;
    Map<String, DocumentMapper> results = new LinkedHashMap<>(2);
    if (defaultMapper != null) {
        if (indexSettings.getIndexVersionCreated().onOrAfter(Version.V_7_0_0)) {
            throw new IllegalArgumentException(DEFAULT_MAPPING_ERROR_MESSAGE);
        } else if (reason == MergeReason.MAPPING_UPDATE) { // only log in case of explicit mapping updates
            deprecationLogger.deprecatedAndMaybeLog("default_mapping_not_allowed", DEFAULT_MAPPING_ERROR_MESSAGE);
        }
        assert defaultMapper.type().equals(DEFAULT_MAPPING);
        results.put(DEFAULT_MAPPING, defaultMapper);
    }
    for (ObjectMapper objectMapper : objectMappers) {
        if (reason != MergeReason.MAPPING_RECOVERY) {
            checkTotalFieldsLimit(objectMappers.size() + fieldMappers.size() - metadataMappers.length
                + fieldAliasMappers.size());
            checkFieldNameSoftLimit(objectMappers, fieldMappers, fieldAliasMappers);
            checkNestedFieldsLimit(fullPathObjectMappers);
            checkDepthLimit(fullPathObjectMappers.keySet()); // checks whether the mapping depth exceeds the limit; throws IllegalArgumentException if it does
        }
        results.put(newMapper.type(), newMapper);
    }
    return results;
}
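For a depth limit to protect against this failure, it has to be enforced before or during the descent, and without itself recursing to unbounded depth. The following is a minimal sketch of such a guard using an explicit, heap-allocated stack. It is not the upstream fix and not Elasticsearch code: `IterativeDepthGuard` and `checkDepth` are hypothetical names, and the limit of 20 in `main` is meant to mirror the documented default of `index.mapping.depth.limit`.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;

final class IterativeDepthGuard {
    /** A pending object together with its nesting level. */
    private static final class Frame {
        final Map<?, ?> node;
        final int level;

        Frame(Map<?, ?> node, int level) {
            this.node = node;
            this.level = level;
        }
    }

    /** Throws if the object nests deeper than limit; uses a heap-allocated stack instead of recursion. */
    static void checkDepth(Map<?, ?> root, int limit) {
        Deque<Frame> pending = new ArrayDeque<>();
        pending.push(new Frame(root, 1));
        while (!pending.isEmpty()) {
            Frame frame = pending.pop();
            if (frame.level > limit) {
                // Fail as soon as the limit is crossed, without descending any further.
                throw new IllegalArgumentException("depth " + frame.level + " exceeds limit " + limit);
            }
            for (Object value : frame.node.values()) {
                if (value instanceof Map) {
                    pending.push(new Frame((Map<?, ?>) value, frame.level + 1));
                }
            }
        }
    }

    public static void main(String[] args) {
        Map<String, Object> doc = Map.of("a", Map.of("b", Map.of("c", "x")));
        checkDepth(doc, 20); // three levels, passes
        try {
            checkDepth(doc, 2);
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage()); // depth 3 exceeds limit 2
        }
    }
}
```

Because the traversal state lives on the heap, an over-deep input ends in an ordinary exception that can be returned to the client, rather than a StackOverflowError that takes down the node.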
Solution
The upstream community fixed this problem in v8.6 (https://github.com/elastic/elasticsearch/issues/52098). We are running ES 7, so we need to either upgrade or apply a patch to resolve it.
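Recommendations for production
- Preferably do not enable dynamic mapping. It not only hurts performance, but on older versions it can also trigger the problem described in this article.
- During incident handling, consider temporarily adding logging to aid diagnosis. In this case, logging the index name or field information in the mapping-parsing path would have pointed to the problem index and shortened the outage considerably.
- Keep up with community releases; many problems have already been fixed upstream.
- For this kind of problem, you can also enable the transport tracer to log write requests and inspect the index data being written just before the stack overflow.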