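Prerequisites: Prometheus is already installed, and a DingTalk group robot has been created. For the robot's security setting I use an IP address allowlist.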
Define the alert rules
Edit the Prometheus configuration file prometheus.yml and add the following:
rule_files:
  - /usr/local/prometheus/rules/*.rules
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093
In the directory /usr/local/prometheus/rules/, create the alert rule file hoststats-alert.rules with the following content:
groups:
  - name: hostStatsAlert
    rules:
      - alert: hostCpuUsageAlert
        expr: sum by (instance) (avg without (cpu) (irate(node_cpu_seconds_total{mode!="idle"}[5m]))) > 0.5
        for: 1m
        labels:
          # severity
          severity: warning
        annotations:
          title: High CPU usage
          summary: "Instance {{ $labels.instance }} CPU usage high"
          description: "{{ $labels.instance }} CPU usage above 50% (current value: {{ $value }})"
      - alert: hostMemUsageAlert
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes)/node_memory_MemTotal_bytes > 0.85
        for: 1m
        labels:
          severity: warning
        annotations:
          title: High memory usage
          summary: "Instance {{ $labels.instance }} MEM usage high"
          description: "{{ $labels.instance }} MEM usage above 85% (current value: {{ $value }})"
After restarting Prometheus, open http://127.0.0.1:9090/rules to see the rules that are currently loaded.
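The same list is also available from the Prometheus HTTP API at /api/v1/rules. The Go sketch below fetches that endpoint and prints each rule with its state. It assumes Prometheus listens on 127.0.0.1:9090, as above. The rulesResponse type models only a few fields of the JSON response, and the helper name loadedRules is made up for this illustration.

```go
package main

import (
	"encoding/json"
	"fmt"
	"io"
	"net/http"
)

// rulesResponse models only the parts of the /api/v1/rules response used here.
type rulesResponse struct {
	Status string `json:"status"`
	Data   struct {
		Groups []struct {
			Name  string `json:"name"`
			File  string `json:"file"`
			Rules []struct {
				Name   string `json:"name"`
				Type   string `json:"type"`  // "alerting" or "recording"
				State  string `json:"state"` // inactive, pending or firing (alerting rules)
				Health string `json:"health"`
			} `json:"rules"`
		} `json:"groups"`
	} `json:"data"`
}

// loadedRules turns a raw response body into "group/rule (state)" strings.
func loadedRules(body []byte) ([]string, error) {
	var r rulesResponse
	if err := json.Unmarshal(body, &r); err != nil {
		return nil, err
	}
	if r.Status != "success" {
		return nil, fmt.Errorf("unexpected status %q", r.Status)
	}
	var out []string
	for _, g := range r.Data.Groups {
		for _, rule := range g.Rules {
			out = append(out, fmt.Sprintf("%s/%s (%s)", g.Name, rule.Name, rule.State))
		}
	}
	return out, nil
}

func main() {
	resp, err := http.Get("http://127.0.0.1:9090/api/v1/rules")
	if err != nil {
		fmt.Println("request failed:", err)
		return
	}
	defer resp.Body.Close()
	body, err := io.ReadAll(resp.Body)
	if err != nil {
		fmt.Println("read failed:", err)
		return
	}
	rules, err := loadedRules(body)
	if err != nil {
		fmt.Println("parse failed:", err)
		return
	}
	for _, r := range rules {
		fmt.Println(r)
	}
}
```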
Install and configure prometheus-webhook-dingtalk
wget https://github.com/timonwong/prometheus-webhook-dingtalk/releases/download/v2.1.0/prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz
tar -zxvf prometheus-webhook-dingtalk-2.1.0.linux-amd64.tar.gz -C /usr/local
mv /usr/local/prometheus-webhook-dingtalk-2.1.0.linux-amd64 /usr/local/prometheus-webhook-dingtalk
cp /usr/local/prometheus-webhook-dingtalk/config.example.yml /usr/local/prometheus-webhook-dingtalk/config.yml
vim /usr/local/prometheus-webhook-dingtalk/config.yml  # change the configuration to the following
## Request timeout
# timeout: 5s

## Uncomment following line in order to write template from scratch (be careful!)
#no_builtin_template: true

## Customizable templates path
templates:
  - contrib/templates/mytemplate.tmpl  # point this at the template you create

## You can also override default template using `default_message`
## The following example to use the 'legacy' template from v0.3.0
#default_message:
#  title: '{{ template "legacy.title" . }}'
#  text: '{{ template "legacy.content" . }}'

## Targets, previously was known as "profiles"
targets:
  webhook1:
    # The DingTalk robot's webhook URL, copied from the robot settings
    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
    # secret for signature: the signing secret shown when the robot uses signature verification
    # secret: xxxxxxxxxxxxxxxxxxxxxxxxxxxxx
#  webhook2:
#    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
#  webhook_legacy:
#    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
#    # Customize template content
#    message:
#      # Use legacy template
#      title: '{{ template "legacy.title" . }}'
#      text: '{{ template "legacy.content" . }}'
#  webhook_mention_all:
#    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
#    mention:
#      all: true
#  webhook_mention_users:
#    url: https://oapi.dingtalk.com/robot/send?access_token=xxxxxxxxxxxx
#    mention:
#      mobiles: ['156xxxx8827', '189xxxx8325']
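# Create the template below. It expects the Prometheus alert rules to provide the annotations title and description and the label severity.
vim /usr/local/prometheus-webhook-dingtalk/contrib/templates/mytemplate.tmpl
cd /usr/local/prometheus-webhook-dingtalk/
./prometheus-webhook-dingtalk --config.file=config.yml >dingtalk.log 2>&1 &   # listens on port 8060 by default
Template file contents (save the file before starting the service):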
{{ define "__subject" }}
[{{ .Status | toUpper }}{{ if eq .Status "firing" }}:{{ .Alerts.Firing | len }}{{ end }}]
{{ end }}
{{ define "__alert_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}
**Alert name**: {{ index .Annotations "title" }}
**Severity**: {{ .Labels.severity }}
**Host**: {{ .Labels.instance }}
**Details**: {{ index .Annotations "description" }}
**Fired at**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
{{ define "__resolved_list" }}{{ range . }}
---
{{ if .Labels.owner }}@{{ .Labels.owner }}{{ end }}
**Alert name**: {{ index .Annotations "title" }}
**Severity**: {{ .Labels.severity }}
**Host**: {{ .Labels.instance }}
**Details**: {{ index .Annotations "description" }}
**Fired at**: {{ dateInZone "2006.01.02 15:04:05" (.StartsAt) "Asia/Shanghai" }}
**Resolved at**: {{ dateInZone "2006.01.02 15:04:05" (.EndsAt) "Asia/Shanghai" }}
{{ end }}{{ end }}
{{ define "default.title" }}
{{ template "__subject" . }}
{{ end }}
{{ define "default.content" }}
{{ if gt (len .Alerts.Firing) 0 }}
**==== {{ .Alerts.Firing | len }} fault(s) detected ====**
{{ template "__alert_list" .Alerts.Firing }}
---
{{ end }}
{{ if gt (len .Alerts.Resolved) 0 }}
**==== {{ .Alerts.Resolved | len }} fault(s) resolved ====**
{{ template "__resolved_list" .Alerts.Resolved }}
{{ end }}
{{ end }}
{{ define "ding.link.title" }}{{ template "default.title" . }}{{ end }}
{{ define "ding.link.content" }}{{ template "default.content" . }}{{ end }}
{{ template "default.title" . }}
{{ template "default.content" . }}
Install and configure prometheus-alertmanager
wget https://github.com/prometheus/alertmanager/releases/download/v0.25.0/alertmanager-0.25.0.linux-amd64.tar.gz
tar -zxvf alertmanager-0.25.0.linux-amd64.tar.gz
mv alertmanager-0.25.0.linux-amd64 /usr/local/alertmanager
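# Edit the Alertmanager configuration file as shown below
vim /usr/local/alertmanager/alertmanager.yml
cd /usr/local/alertmanager/
./alertmanager --config.file=alertmanager.yml >alertmanager.log 2>&1 &   # listens on port 9093 by default
Contents of alertmanager.yml (save the file before starting Alertmanager):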
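global:
  # An alert that stops being updated and carries no end time of its own is declared resolved after this long
  resolve_timeout: 5m
route:
  # Label(s) used to group alerts into one notification
  group_by: ['alertname']
  # After the first alert of a new group arrives, wait 10s so that other alerts of the same group are sent together
  group_wait: 10s
  # How long to wait before notifying about new alerts added to a group that has already been notified
  group_interval: 1m
  # How long to wait before re-sending a notification for alerts that are still firing; raise it to reduce repeated messages
  repeat_interval: 1m
  # Default receiver
  receiver: 'web.hook'
  routes:
    - receiver: 'dingding.webhook1'
      match_re:
        alertname: ".*"
receivers:
  - name: 'web.hook'
    webhook_configs:
      - url: 'http://127.0.0.1:5001/'
  - name: 'dingding.webhook1'
    webhook_configs:
      # "webhook1" must match the target name defined under targets in the prometheus-webhook-dingtalk config file
      - url: 'http://127.0.0.1:8060/dingtalk/webhook1/send'
        send_resolved: true
inhibit_rules:
  - source_match:
      severity: 'critical'
    target_match:
      severity: 'warning'
    equal: ['alertname', 'dev', 'instance']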
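Now we can verify the alerting flow by artificially raising the host's CPU usage. The rule averages usage over all CPUs, so a single busy process adds only about 1/N of the total on an N-core host. Start one busy process per core:
for i in $(seq "$(nproc)"); do cat /dev/zero > /dev/null & done
# when finished, stop them from the same shell with: kill $(jobs -p)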
When Prometheus first detects that the trigger condition is met, hostCpuUsageAlert shows one active alert. Because the rule sets a for duration of 1m, the alert is initially in the PENDING state, as shown in the figure below.
Once the alert state changes to FIRING, the DingTalk group robot sends the alert message.
Instrumenting a Spring Boot application is covered in the next article.