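1. Introduction
By default, Kubernetes uses cAdvisor to collect container metrics. This covers most needs, but a few metrics are missing, for example:
- OOM kills
- Container restart counts
- Container exit codes
The missing-container-metrics project fills these gaps in cAdvisor by adding the metrics above, which cluster administrators can use to locate certain failures quickly. For example, suppose a container runs several child processes and one of them is OOM-killed while the container itself keeps running. Without monitoring OOM kills, an administrator would have a hard time pinpointing the fault.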
2. Installation
The project provides a Helm chart for installation. First, add the Helm repository:
helm repo add missing-container-metrics https://draganm.github.io/missing-container-metrics
Then pull the chart locally, because we need to modify its values.yaml file:
[root@master-01 addons]# helm pull missing-container-metrics/missing-container-metrics
[root@master-01 addons]# ls
blackbox dingtalk harbor_exporter mysql-exporter prometheusalert rules servicemonitor victoriametrics
blackbox-probe etcd missing-container-metrics-0.1.1.tgz process-exporter redis-exporter scheduler-controller-svc.yaml ssl-exporter
[root@master-01 addons]# tar xf missing-container-metrics-0.1.1.tgz
Configurable parameters
Parameter | Description | Default |
---|---|---|
image.repository | Image name | dmilhdef/missing-container-metrics |
image.pullPolicy | Image pull policy | IfNotPresent |
image.tag | Image tag | v0.21.0 |
imagePullSecrets | Secrets used to pull the image | [] |
nameOverride | Overrides the generated chart name. Defaults to .Chart.Name. | |
fullnameOverride | Overrides the generated release name. Defaults to .Release.Name. | |
podAnnotations | Annotations for the pod | {"prometheus.io/scrape": "true", "prometheus.io/port": "3001"} |
podSecurityContext | Security context for the pod | |
securityContext | Security context for the containers in the pod | |
resources | CPU/memory resource requests/limits | {} |
useDocker | Read container information from Docker; set to true if the container runtime is Docker | false |
useContainerd | Read container information from containerd; set to true if the container runtime is containerd | true |
Here we edit missing-container-metrics/values.yaml and set `useDocker` to `true`, then install the chart:
[root@master-01 addons]# kubectl create namespace missing-container-metrics
namespace/missing-container-metrics created
[root@master-01 addons]# helm install missing-container-metrics missing-container-metrics -n missing-container-metrics
NAME: missing-container-metrics
LAST DEPLOYED: Tue Jul 6 10:47:35 2021
NAMESPACE: missing-container-metrics
STATUS: deployed
REVISION: 1
TEST SUITE: None
[root@master-01 addons]# helm -n missing-container-metrics list
NAME NAMESPACE REVISION UPDATED STATUS CHART APP VERSION
missing-container-metrics missing-container-metrics 1 2021-07-06 10:47:35.261058822 +0800 CST deployed missing-container-metrics-0.1.1 0.21.0
## There is only one node here, so the DaemonSet runs only one pod
[root@master-01 addons]# kubectl get pod -n missing-container-metrics
NAME READY STATUS RESTARTS AGE
missing-container-metrics-s9cgk 1/1 Running 0 115s
We can query port 3001 of the service to view the metrics, for example:
[root@master-01 addons]# curl 100.67.79.150:3001/metrics
# HELP container_last_exit_code Last exit code of the container
# TYPE container_last_exit_code gauge
container_last_exit_code{container_id="docker://0133fb5d739ba98b3985bdc7766fa200334bbbf29de9a61f98a463ec00de53de",container_short_id="0133fb5d739b",docker_container_id="0133fb5d739ba98b3985bdc7766fa200334bbbf29de9a61f98a463ec00de53de",image_id="docker-pullable://k8s.gcr.io/pause:3.2",name="k8s_POD_dns-autoscaler-565bf94d6c-dc6v4_kube-system_96437fe8-200c-4845-a7cc-a27790c6c5a7_0",namespace="kube-system",pod="dns-autoscaler-565bf94d6c-dc6v4"} 0
container_last_exit_code{container_id="docker://0388ba15b0181fead17cfc3606a57aeef0a9b8b73cf3f97eb901565c8aa1702c",container_short_id="0388ba15b018",docker_container_id="0388ba15b0181fead17cfc3606a57aeef0a9b8b73cf3f97eb901565c8aa1702c",image_id="docker-pullable://sha256:e20d2ec0d0ed8ffd693b435af9f2943095a608440e3b845331d6d00344025455",name="k8s_victoriametrics_victoriametrics-0_kube-system_7b381d2c-791b-4e38-8cbb-43485afcb285_0",namespace="kube-system",pod="victoriametrics-0"} 0
container_last_exit_code{container_id="docker://0400f7e29dab47304f97669cb52b5c7c9310fbb5c156c07d0dc9bfca6b8ee14d",container_short_id="0400f7e29dab",docker_container_id="0400f7e29dab47304f97669cb52b5c7c9310fbb5c156c07d0dc9bfca6b8ee14d",image_id="docker-pullable://k8s.gcr.io/pause:3.2",name="k8s_POD_csi-resizer-f6d66495f-s4vkv_longhorn-system_282278da-2638-4e26-8411-802bf57c1ed8_0",namespace="longhorn-system",pod="csi-resizer-f6d66495f-s4vkv"} 0
container_last_exit_code{container_id="docker://04e2c60777ce277c62c7137f1d7b40d9c1523bb3edf9127efd357590f39ba79c",container_short_id="04e2c60777ce",docker_container_id="04e2c60777ce277c62c7137f1d7b40d9c1523bb3edf9127efd357590f39ba79c",image_id="docker-pullable://k8s.gcr.io/pause:3.2",name="k8s_POD_kube-state-metrics-859b6bf99-q8tdf_monitoring_529aa188-f7a0-4b5c-9608-cd8fc473ac8c_2",namespace="monitoring",pod="kube-state-metrics-859b6bf99-q8tdf"} 0
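The text returned by curl can also be filtered programmatically. The following is a minimal Python sketch, assuming the `name{labels} value` line format shown in the output above. The sample lines are shortened, the second sample (pod `demo-pod`, exit code 137) is made up for illustration, and the function names `parse_sample` and `nonzero_exit_codes` are hypothetical helpers, not part of missing-container-metrics.

```python
import re

# Sample lines in the format returned by /metrics. Labels are shortened, and the
# second container with exit code 137 is invented for this illustration.
SAMPLE = '''\
# HELP container_last_exit_code Last exit code of the container
# TYPE container_last_exit_code gauge
container_last_exit_code{container_short_id="0133fb5d739b",namespace="kube-system",pod="dns-autoscaler-565bf94d6c-dc6v4"} 0
container_last_exit_code{container_short_id="aaaaaaaaaaaa",namespace="demo",pod="demo-pod"} 137
'''

# Matches 'metric_name{label="value",...} number'. Samples without labels are
# not handled by this sketch.
LINE_RE = re.compile(r'^(?P<name>[a-zA-Z_:][a-zA-Z0-9_:]*)\{(?P<labels>.*)\}\s+(?P<value>\S+)$')
LABEL_RE = re.compile(r'(\w+)="((?:[^"\\]|\\.)*)"')


def parse_sample(line):
    """Parse one labeled sample line; return None for comments, blanks, or other lines."""
    line = line.strip()
    if not line or line.startswith('#'):
        return None
    match = LINE_RE.match(line)
    if match is None:
        return None
    labels = dict(LABEL_RE.findall(match.group('labels')))
    return match.group('name'), labels, float(match.group('value'))


def nonzero_exit_codes(text):
    """Return (namespace, pod, exit_code) for containers whose last exit code is not 0."""
    found = []
    for line in text.splitlines():
        parsed = parse_sample(line)
        if parsed is None:
            continue
        name, labels, value = parsed
        if name == 'container_last_exit_code' and value != 0:
            found.append((labels.get('namespace'), labels.get('pod'), int(value)))
    return found


print(nonzero_exit_codes(SAMPLE))
```

In practice the text would come from the `/metrics` endpoint rather than an embedded string; for anything beyond a quick check, a maintained Prometheus client library is likely a better choice than a hand-written regular expression.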
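The service exposes the following metrics:
- `container_restarts`: number of container restarts.
- `container_ooms`: number of OOM kills for the container. This covers OOM kills of any process in the container's cgroup.
- `container_last_exit_code`: last exit code of the container.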
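Each metric carries the following labels:
- `docker_container_id`: full ID of the container.
- `container_short_id`: first 6 bytes of the Docker container ID (12 hexadecimal characters, as in the sample output above).
- `container_id`: container ID in the same format used by Kubernetes pod metrics, that is, prefixed with `docker://` or `containerd://` depending on the container runtime. This makes it easy to join with the `kube_pod_container_info` metric in Prometheus.
- `name`: name of the container.
- `image_id`: image ID in the same format used by Kubernetes pod metrics. This also makes it easy to join with the `kube_pod_container_info` metric in Prometheus.
- `pod`: if the `io.kubernetes.pod.name` label is set on the container, its value is used as the `pod` label of the metric.
- `namespace`: if the `io.kubernetes.pod.namespace` label is set on the container, its value is used as the `namespace` label of the metric.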
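To make the relationship between the three ID labels concrete, here is a small Python sketch. It only illustrates the label formats described above and is not the exporter's actual implementation; the function name `build_id_labels` is hypothetical. The container ID is taken from the sample output earlier in this article.

```python
def build_id_labels(runtime, full_id):
    """Derive the ID-related labels from a runtime name and a full container ID."""
    if runtime not in ("docker", "containerd"):
        raise ValueError(f"unsupported runtime: {runtime}")
    return {
        "docker_container_id": full_id,
        # 6 bytes of the ID correspond to 12 hexadecimal characters.
        "container_short_id": full_id[:12],
        # Runtime-prefixed form, matching the container_id label format
        # described for kube_pod_container_info.
        "container_id": f"{runtime}://{full_id}",
    }


full_id = "0133fb5d739ba98b3985bdc7766fa200334bbbf29de9a61f98a463ec00de53de"
labels = build_id_labels("docker", full_id)
print(labels["container_short_id"])  # 0133fb5d739b
print(labels["container_id"])
```

Because `container_id` has the same runtime-prefixed form on both sides, it can serve as the join key between these metrics and `kube_pod_container_info`.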
3. Adding a PodMonitor and a PrometheusRule (for Prometheus Operator)
Create the file podmonitor.yaml in the chart's templates directory:
{{ if .Values.prometheusOperator.podMonitor.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PodMonitor
metadata:
  name: {{ include "missing-container-metrics.fullname" . }}
  {{- with .Values.prometheusOperator.podMonitor.namespace }}
  namespace: {{ . }}
  {{- end }}
  labels:
    {{- include "missing-container-metrics.labels" . | nindent 4 }}
    {{- with .Values.prometheusOperator.podMonitor.selector }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  selector:
    matchLabels:
      {{- include "missing-container-metrics.selectorLabels" . | nindent 6 }}
  podMetricsEndpoints:
  - port: http
  namespaceSelector:
    matchNames:
      - {{ .Release.Namespace }}
{{ end }}
Create the file prometheusrule.yaml in the chart's templates directory:
{{ if .Values.prometheusOperator.prometheusRule.enabled }}
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: {{ include "missing-container-metrics.fullname" . }}
  {{- with .Values.prometheusOperator.prometheusRule.namespace }}
  namespace: {{ . }}
  {{- end }}
  labels:
    {{- include "missing-container-metrics.labels" . | nindent 4 }}
    {{- with .Values.prometheusOperator.prometheusRule.selector }}
    {{- toYaml . | nindent 4 }}
    {{- end }}
spec:
  groups:
  - name: {{ include "missing-container-metrics.fullname" . }}
    rules:
      {{- toYaml .Values.prometheusOperator.prometheusRule.rules | nindent 6 }}
{{ end }}
Edit values.yaml and add the following:
useDocker: true
useContainerd: false
### Added
prometheusOperator:
  podMonitor:
    # Create a Prometheus Operator PodMonitor resource
    enabled: true
    # Namespace defaults to the Release namespace but can be overridden
    namespace: ""
    # Additional labels to add to the PodMonitor so it matches the Operator's podMonitorSelector
    selector:
      app.kubernetes.io/name: missing-container-metrics
  prometheusRule:
    # Create a Prometheus Operator PrometheusRule resource
    enabled: true
    # Namespace defaults to the Release namespace but can be overridden
    namespace: ""
    # Additional labels to add to the PrometheusRule so it matches the Operator's ruleSelector
    selector:
      prometheus: k8s
      role: alert-rules
    # The rules can be set here. An example is defined here but can be overridden.
    rules:
      - alert: ContainerOOMObserved
        annotations:
          message: A process in this Pod has been OOMKilled due to exceeding the Kubernetes memory limit at least twice in the last 15 minutes. Look at the metrics to determine if a memory limit increase is required.
        expr: sum(increase(container_ooms[15m])) by (exported_namespace, exported_pod) > 2
        labels:
          severity: warning
      - alert: ContainerOOMObserved
        annotations:
          message: A process in this Pod has been OOMKilled due to exceeding the Kubernetes memory limit at least ten times in the last 15 minutes. Look at the metrics to determine if a memory limit increase is required.
        expr: sum(increase(container_ooms[15m])) by (exported_namespace, exported_pod) > 10
        labels:
          severity: critical
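To get a feel for what the two example rules compare, the sketch below mimics the idea in Python for a single counter series. It is a deliberate simplification and not how Prometheus evaluates the expression: the real `increase()` also extrapolates to the window boundaries, and the rules sum over all series of a pod. The function names `simple_increase` and `severity` are hypothetical.

```python
def simple_increase(samples):
    """Total growth of a counter over a window, treating any drop as a counter reset.

    Simplification: unlike Prometheus's increase(), no extrapolation to the
    window boundaries is performed.
    """
    total = 0.0
    for previous, current in zip(samples, samples[1:]):
        if current >= previous:
            total += current - previous
        else:
            # After a reset the counter restarts from 0, so the new value is all growth.
            total += current
    return total


def severity(oom_increase, warning_above=2, critical_above=10):
    """Return the highest severity whose threshold is strictly exceeded, else None.

    In Prometheus both rules are evaluated independently, so an increase above
    10 fires the warning rule and the critical rule at the same time.
    """
    if oom_increase > critical_above:
        return "critical"
    if oom_increase > warning_above:
        return "warning"
    return None


# Counter values of container_ooms sampled over a 15-minute window (made-up data).
window = [0, 1, 1, 3, 4]
print(simple_increase(window), severity(simple_increase(window)))
```

Note that the comparison is strict: with `> 2`, an increase of exactly 2 does not trigger the warning.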
Upgrade the release with the following command:
[root@master-01 addons]# helm upgrade missing-container-metrics -n missing-container-metrics missing-container-metrics/
Release "missing-container-metrics" has been upgraded. Happy Helming!
NAME: missing-container-metrics
LAST DEPLOYED: Tue Jul 6 11:36:02 2021
NAMESPACE: missing-container-metrics
STATUS: deployed
REVISION: 2
TEST SUITE: None
After the upgrade, the PodMonitor and PrometheusRule resources are created:
[root@master-01 addons]# kubectl get prometheusrules.monitoring.coreos.com -n missing-container-metrics
NAME AGE
missing-container-metrics 15s
[root@master-01 addons]# kubectl get podmonitors.monitoring.coreos.com -n missing-container-metrics
NAME AGE
missing-container-metrics 35s
We can now see the corresponding targets and rules in the Prometheus UI.