
Integrating Prometheus with Alertmanager to Send WeCom (Enterprise WeChat) Alerts (Kubernetes Deployment)



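Note: this walkthrough builds on the earlier article "Monitoring a K8S Cluster with Prometheus + Grafana (Kubernetes Deployment)". Complete that setup before following the steps here.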

I. Create a WeCom Bot

1. Create the WeCom bot

Log in to the WeCom web admin console:

App Management > Bot > Create App


Once the app has been created, click "View" on its detail page to obtain the Secret value.

2. Get the corp ID

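Before wiring these values into Alertmanager, it can be worth checking that WeCom accepts the corp ID and Secret. The sketch below assumes `curl` is installed and that the machine can reach `qyapi.weixin.qq.com`; the function names are illustrative and not part of the original setup. It calls WeCom's `gettoken` endpoint, which exchanges a corp ID and app Secret for an access token.

```shell
# Build the WeCom "gettoken" URL from a corp ID ($1) and an app Secret ($2).
wecom_gettoken_url() {
  printf 'https://qyapi.weixin.qq.com/cgi-bin/gettoken?corpid=%s&corpsecret=%s\n' "$1" "$2"
}

# Call the endpoint and print the raw JSON response.
# Nothing is sent until this function is invoked explicitly.
wecom_check_credentials() {
  curl -sS "$(wecom_gettoken_url "$1" "$2")"
}
```

Running `wecom_check_credentials <corp_id> <secret>` should return JSON with `"errcode":0` and an `access_token` when the pair is valid; any other `errcode` suggests the corp ID or Secret was copied incorrectly.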

II. Configure Alertmanager to Send Alerts to WeCom

1. Create the Alertmanager ConfigMap manifest

vim alertmanager-cm.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: prometheus
data:
  alertmanager.yml: |-
    templates:
      - '/alertmanager/template/WeChat.tmpl'
    global:
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.163.com:25'
      smtp_from: '18145536045@163.com'
      smtp_auth_username: '18145536045@163.com'
      smtp_auth_password: 'KCGZFUDCCKMNZMKB'
      smtp_require_tls: false
    route:
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: wechat-001
    receivers:
    - name: 'wechat-001'
      wechat_configs:
      - corp_id: wwfb8d55841e190c10 # corp ID
        to_user: '@all'             # send to everyone
        agent_id: 1000002           # agent ID of the app
        api_secret: wa6kWECFthSpvdhcF-RPgjrIBzUvm-SpqXXXXXXXXXX # app Secret

Apply the YAML manifest:

kubectl apply -f alertmanager-cm.yaml
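After applying the ConfigMap, it may help to confirm what was actually stored and to lint it. This is a sketch under assumptions: `kubectl` points at the cluster, `amtool` (distributed with Alertmanager releases) is installed on the machine running the check, and the helper names are hypothetical.

```shell
# Print the alertmanager.yml stored in the "alertmanager" ConfigMap.
am_config_dump() {
  kubectl -n prometheus get configmap alertmanager \
    -o jsonpath='{.data.alertmanager\.yml}'
}

# Write the stored config to a temporary file and lint it with amtool.
# Returns 0 only if both the dump and the lint succeed.
am_config_check() {
  tmp="$(mktemp)" || return 1
  if am_config_dump > "$tmp" && amtool check-config "$tmp"; then
    status=0
  else
    status=1
  fi
  rm -f "$tmp"
  return "$status"
}
```

Calling `am_config_check` right after `kubectl apply` catches indentation or key-name mistakes before the Pod is started with the new configuration.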

III. Connect Prometheus to Alertmanager

1. Create a new Prometheus ConfigMap manifest that adds alerting rules for the K8S cluster

vim prometheus-alertmanager-cfg.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: prometheus
data:
  prometheus.yml: |
    rule_files: 
    - /etc/prometheus/rules.yml   # location of the alerting rules
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"] # Alertmanager runs in the same Pod
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role:  node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name 
    - job_name: 'kubernetes-pods'    # Pods are discovered only after the prometheus.io/scrape annotation is added
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
    - job_name: 'kubernetes-etcd'   # scrape etcd
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/ca.crt
        cert_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/server.crt
        key_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/server.key
      scrape_interval: 5s
      static_configs:
      - targets: ['16.32.15.200:2379']
  rules.yml: |  # alerting rules for the K8S cluster
    groups:
    - name: example
      rules:
      - alert: apiserver CPU usage above 80%
        expr: rate(process_cpu_seconds_total{job=~"kubernetes-apiserver"}[1m]) * 100 > 80
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "CPU usage of component {{$labels.job}} on {{$labels.instance}} exceeds 80%"
      - alert: apiserver CPU usage above 90%
        expr: rate(process_cpu_seconds_total{job=~"kubernetes-apiserver"}[1m]) * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "CPU usage of component {{$labels.job}} on {{$labels.instance}} exceeds 90%"
      - alert: etcd CPU usage above 80%
        expr: rate(process_cpu_seconds_total{job=~"kubernetes-etcd"}[1m]) * 100 > 80
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "CPU usage of component {{$labels.job}} on {{$labels.instance}} exceeds 80%"
      - alert: etcd CPU usage above 90%
        expr: rate(process_cpu_seconds_total{job=~"kubernetes-etcd"}[1m]) * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "CPU usage of component {{$labels.job}} on {{$labels.instance}} exceeds 90%"
      - alert: kube-state-metrics CPU usage above 80%
        expr: rate(process_cpu_seconds_total{k8s_app=~"kube-state-metrics"}[1m]) * 100 > 80
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "CPU usage of component {{$labels.k8s_app}} on {{$labels.instance}} exceeds 80%"
          value: "{{ $value }}%"
          threshold: "80%"
      - alert: kube-state-metrics CPU usage above 90%
        expr: rate(process_cpu_seconds_total{k8s_app=~"kube-state-metrics"}[1m]) * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "CPU usage of component {{$labels.k8s_app}} on {{$labels.instance}} exceeds 90%"
          value: "{{ $value }}%"
          threshold: "90%"
      - alert: coredns CPU usage above 80%
        expr: rate(process_cpu_seconds_total{k8s_app=~"kube-dns"}[1m]) * 100 > 80
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "CPU usage of component {{$labels.k8s_app}} on {{$labels.instance}} exceeds 80%"
          value: "{{ $value }}%"
          threshold: "80%"
      - alert: coredns CPU usage above 90%
        expr: rate(process_cpu_seconds_total{k8s_app=~"kube-dns"}[1m]) * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "CPU usage of component {{$labels.k8s_app}} on {{$labels.instance}} exceeds 90%"
          value: "{{ $value }}%"
          threshold: "90%"
      - alert: kube-proxy open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-kube-proxy"} > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kube-proxy open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-kube-proxy"} > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-schedule open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-schedule"} > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-schedule open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-schedule"} > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-controller-manager open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-controller-manager"} > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-controller-manager open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-controller-manager"} > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-apiserver open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-apiserver"} > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-apiserver open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-apiserver"} > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-etcd open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-etcd"} > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-etcd open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-etcd"} > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: coredns
        expr: process_open_fds{k8s_app=~"kube-dns"} > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Add-on {{$labels.k8s_app}} ({{$labels.instance}}): more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: coredns
        expr: process_open_fds{k8s_app=~"kube-dns"} > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "Add-on {{$labels.k8s_app}} ({{$labels.instance}}): more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kube-proxy
        expr: process_virtual_memory_bytes{job=~"kubernetes-kube-proxy"} > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage exceeds 2G"
          value: "{{ $value }}"
      - alert: scheduler
        expr: process_virtual_memory_bytes{job=~"kubernetes-schedule"} > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage exceeds 2G"
          value: "{{ $value }}"
      - alert: kubernetes-controller-manager
        expr: process_virtual_memory_bytes{job=~"kubernetes-controller-manager"} > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage exceeds 2G"
          value: "{{ $value }}"
      - alert: kubernetes-apiserver
        expr: process_virtual_memory_bytes{job=~"kubernetes-apiserver"} > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage exceeds 2G"
          value: "{{ $value }}"
      - alert: kubernetes-etcd
        expr: process_virtual_memory_bytes{job=~"kubernetes-etcd"} > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage exceeds 2G"
          value: "{{ $value }}"
      - alert: kube-dns
        expr: process_virtual_memory_bytes{k8s_app=~"kube-dns"} > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Add-on {{$labels.k8s_app}} ({{$labels.instance}}): virtual memory usage exceeds 2G"
          value: "{{ $value }}"
      - alert: HttpRequestsAvg
        expr: sum(rate(rest_client_requests_total{job=~"kubernetes-kube-proxy|kubernetes-kubelet|kubernetes-schedule|kubernetes-controller-manager|kubernetes-apiserver"}[1m])) > 1000
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): TPS exceeds 1000"
          value: "{{ $value }}"
          threshold: "1000"
      - alert: Pod_restarts
        expr: kube_pod_container_status_restarts_total{namespace=~"kube-system|default|monitor-sa"} > 0
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "In namespace {{$labels.namespace}}, container {{$labels.container}} of pod {{$labels.pod}} was restarted; this metric was scraped from {{$labels.instance}}"
          value: "{{ $value }}"
          threshold: "0"
      - alert: Pod_waiting
        expr: kube_pod_container_status_waiting_reason{namespace=~"kube-system|default"} == 1
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Namespace {{$labels.namespace}} ({{$labels.instance}}): container {{$labels.container}} of pod {{$labels.pod}} is stuck waiting to start"
          value: "{{ $value }}"
          threshold: "1"
      - alert: Pod_terminated
        expr: kube_pod_container_status_terminated_reason{namespace=~"kube-system|default|monitor-sa"} == 1
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Namespace {{$labels.namespace}} ({{$labels.instance}}): container {{$labels.container}} of pod {{$labels.pod}} was terminated"
          value: "{{ $value }}"
          threshold: "1"
      - alert: Etcd_leader
        expr: etcd_server_has_leader{job="kubernetes-etcd"} == 0
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): there is currently no leader"
          value: "{{ $value }}"
          threshold: "0"
      - alert: Etcd_leader_changes
        expr: rate(etcd_server_leader_changes_seen_total{job="kubernetes-etcd"}[1m]) > 0
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): the leader has changed"
          value: "{{ $value }}"
          threshold: "0"
      - alert: Etcd_failed
        expr: rate(etcd_server_proposals_failed_total{job="kubernetes-etcd"}[1m]) > 0
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): service failure (proposals are failing)"
          value: "{{ $value }}"
          threshold: "0"
      - alert: Etcd_db_total_size
        expr: etcd_debugging_mvcc_db_total_size_in_bytes{job="kubernetes-etcd"} > 10000000000
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): db size exceeds 10G"
          value: "{{ $value }}"
          threshold: "10G"
      - alert: Endpoint_ready
        expr: kube_endpoint_address_not_ready{namespace=~"kube-system|default"} == 1
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Namespace {{$labels.namespace}} ({{$labels.instance}}): endpoint {{$labels.endpoint}} is not available"
          value: "{{ $value }}"
          threshold: "1"
    - name: Physical node status - monitoring alerts
      rules:
      - alert: Physical node CPU usage
        expr: 100-avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance)*100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} CPU usage is too high"
          description: "CPU usage of {{ $labels.instance }} exceeds 90%, current usage [{{ $value }}], needs investigation"
      - alert: Physical node memory usage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }} memory usage is too high"
          description: "Memory usage of {{ $labels.instance }} exceeds 90%, current usage [{{ $value }}], needs investigation"
      - alert: InstanceDown
        expr: up == 0
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }}: server is down"
          description: "{{ $labels.instance }}: scrape target is unreachable (up == 0)"
      - alert: Physical node disk I/O
        expr: avg(irate(node_disk_io_time_seconds_total[1m])) by (instance) * 100 > 60
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{$labels.instance}} disk I/O utilization is too high!"
          description: "{{$labels.instance}} disk I/O utilization is above 60% (current: {{$value}})"
      - alert: Inbound network bandwidth
        expr: ((sum(rate (node_network_receive_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo'}[5m])) by (instance)) / 100) > 102400
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{$labels.instance}} inbound network bandwidth is too high!"
          description: "{{$labels.instance}} inbound network bandwidth has stayed above 100M over 5 minutes. RX bandwidth usage: {{$value}}"
      - alert: Outbound network bandwidth
        expr: ((sum(rate (node_network_transmit_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo'}[5m])) by (instance)) / 100) > 102400
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{$labels.instance}} outbound network bandwidth is too high!"
          description: "{{$labels.instance}} outbound network bandwidth has stayed above 100M over 5 minutes. TX bandwidth usage: {{$value}}"
      - alert: TCP sessions
        expr: node_netstat_Tcp_CurrEstab > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{$labels.instance}} has too many TCP_ESTABLISHED connections!"
          description: "{{$labels.instance}} TCP_ESTABLISHED count is above 1000 (current: {{$value}})"
      - alert: Disk capacity
        expr: 100-(node_filesystem_free_bytes{fstype=~"ext4|xfs"}/node_filesystem_size_bytes {fstype=~"ext4|xfs"}*100) > 80
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{$labels.mountpoint}} disk partition usage is too high!"
          description: "{{$labels.mountpoint }} disk partition usage is above 80% (current: {{$value}}%)"

Apply the manifest:

kubectl apply -f prometheus-alertmanager-cfg.yaml
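The `relabel_configs` in the manifest above rewrite `__address__` with regular expressions. As a rough illustration only, the two rewrites can be reproduced with `sed -E`. Prometheus uses RE2 syntax and anchors the whole pattern; the POSIX ERE below is a hand translation (the non-capturing group becomes a capturing one, so `$2` turns into `\3`), and the function names and sample addresses are made up for the demo.

```shell
# kubernetes-node job: regex '(.*):10250' with replacement '${1}:9100'
# points the scrape at node_exporter instead of the kubelet port.
relabel_node_address() {
  printf '%s\n' "$1" | sed -E 's/^(.*):10250$/\1:9100/'
}

# kubernetes-service-endpoints and kubernetes-pods jobs: the source labels
# __address__ and the prometheus.io/port annotation are joined with ';',
# then '([^:]+)(?::\d+)?;(\d+)' -> '$1:$2' swaps in the annotated port.
relabel_annotated_port() {
  printf '%s\n' "$1" | sed -E 's/^([^:]+)(:[0-9]+)?;([0-9]+)$/\1:\3/'
}

relabel_node_address '16.32.15.200:10250'      # prints 16.32.15.200:9100
relabel_annotated_port '10.244.1.7:8080;9100'  # prints 10.244.1.7:9100
```

The second function also handles an address without a port, so `10.244.1.7;9100` becomes `10.244.1.7:9100` as well.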

2. Because an etcd scrape job was added to Prometheus, create an etcd-certs Secret; the Prometheus Deployment needs it

kubectl -n prometheus create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/server.key  --from-file=/etc/kubernetes/pki/etcd/server.crt --from-file=/etc/kubernetes/pki/etcd/ca.crt

IV. Deploy Prometheus + Alertmanager (in a Single Pod)

1. On node-1, create the /data/alertmanager directory to hold Alertmanager data

mkdir /data/alertmanager/template -p
chmod -R 777 /data/alertmanager

2. On node-1, create the WeChat alert template

vim /data/alertmanager/template/WeChat.tmpl

{{ define "wechat.default.message" }}
{{- if gt (len .Alerts.Firing) 0 -}}
{{- range $index, $alert := .Alerts.Firing -}}
{{- if eq $index 0 }}
========= xxx environment: monitoring alert =========
Alert status: {{ .Status }}
Severity: {{ .Labels.severity }}
Alert type: {{ $alert.Labels.alertname }}
Affected host: {{ $alert.Labels.instance }} {{ $alert.Labels.pod }}
Summary: {{ $alert.Annotations.summary }}
Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
Trigger value: {{ .Annotations.value }}
Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
========= = end =  =========
{{- end }}
{{- end }}
{{- end }}
{{- if gt (len .Alerts.Resolved) 0 -}}
{{- range $index, $alert := .Alerts.Resolved -}}
{{- if eq $index 0 }}
========= xxx environment: issue resolved =========
Alert type: {{ .Labels.alertname }}
Alert status: {{ .Status }}
Summary: {{ $alert.Annotations.summary }}
Details: {{ $alert.Annotations.message }}{{ $alert.Annotations.description}};
Started at: {{ ($alert.StartsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
Resolved at: {{ ($alert.EndsAt.Add 28800e9).Format "2006-01-02 15:04:05" }}
{{- if gt (len $alert.Labels.instance) 0 }}
Instance: {{ $alert.Labels.instance }}
{{- end }}
========= = end =  =========
{{- end }}
{{- end }}
{{- end }}
{{- end }}
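The template adds `28800e9` to each timestamp before formatting it. Go's `Time.Add` takes a duration in nanoseconds, so this is a fixed shift of eight hours, which moves UTC timestamps to China Standard Time. A quick arithmetic check (the helper name is hypothetical):

```shell
# Convert a duration given in nanoseconds to whole hours.
ns_to_hours() {
  echo $(( $1 / 1000000000 / 3600 ))
}

ns_to_hours 28800000000000   # 28800e9 ns, prints 8
```

For a different time zone, the same calculation run in reverse gives the constant to use: the offset in hours times 3600, followed by `e9`.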

3. Delete the old Prometheus Deployment

kubectl delete deploy prometheus-server -n prometheus

4. Create the Deployment

vim prometheus-alertmanager-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node-1 # schedule onto node-1
      serviceAccountName: prometheus # ServiceAccount the Pod runs as
      containers:
      - name: prometheus
        image: prom/prometheus:v2.33.5
        imagePullPolicy: IfNotPresent
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention.time=24h"
        - "--web.enable-lifecycle"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus
          name: prometheus-config
        - mountPath: /prometheus/
          name: prometheus-storage-volume
        - name: k8s-certs
          mountPath: /var/run/secrets/kubernetes.io/k8s-certs/etcd/
      - name: alertmanager
        #image: prom/alertmanager:v0.14.0
        image: prom/alertmanager:v0.23.0
        imagePullPolicy: IfNotPresent
        args:
        - "--config.file=/etc/alertmanager/alertmanager.yml"
        - "--log.level=debug"
        ports:
        - containerPort: 9093
          protocol: TCP
          name: alertmanager
        volumeMounts:
        - name: alertmanager-config
          mountPath: /etc/alertmanager
        - name: alertmanager-storage
          mountPath: /alertmanager
        - name: localtime
          mountPath: /etc/localtime
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: prometheus-storage-volume
          hostPath:
           path: /data
           type: Directory
        - name: k8s-certs
          secret:
           secretName: etcd-certs
        - name: alertmanager-config
          configMap:
            name: alertmanager
        - name: alertmanager-storage
          hostPath:
           path: /data/alertmanager
           type: DirectoryOrCreate
        - name: localtime
          hostPath:
           path: /usr/share/zoneinfo/Asia/Shanghai

Apply the YAML manifest:

kubectl apply -f prometheus-alertmanager-deploy.yaml

Check the status:

kubectl get pods -n prometheus
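Beyond the Pod status, a few extra checks can confirm that both containers picked up their configuration. This is a sketch that assumes the names used in the manifests above (namespace `prometheus`, Deployment `prometheus-server`, containers `prometheus` and `alertmanager`) and relies on the `promtool` binary shipped in the `prom/prometheus` image; the function name is hypothetical.

```shell
# Wait for the rollout, validate the mounted Prometheus config (which also
# loads the rule file it references), then show recent Alertmanager logs.
prom_stack_verify() {
  kubectl -n prometheus rollout status deployment/prometheus-server --timeout=120s &&
  kubectl -n prometheus exec deploy/prometheus-server -c prometheus -- \
    promtool check config /etc/prometheus/prometheus.yml &&
  kubectl -n prometheus logs deploy/prometheus-server -c alertmanager --tail=20
}
```

If `prom_stack_verify` stops at the second step, the error printed by `promtool` usually points at the offending line in `prometheus.yml` or `rules.yml`.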


5. Create the Alertmanager Service

vim alertmanager-svc.yaml 
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: prometheus
    kubernetes.io/cluster-service: 'true'
  name: alertmanager
  namespace: prometheus
spec:
  ports:
  - name: alertmanager
    nodePort: 30066
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort

Apply the YAML manifest:

kubectl apply -f alertmanager-svc.yaml 

Check the status:

kubectl get svc -n prometheus


V. Test Alerting

Open http://IP:30066 in a browser.

The Alertmanager page shows that Prometheus has delivered its alerts to Alertmanager. On receiving them, Alertmanager groups the alerts, waits for the configured group_wait period (10s in this setup), and only then sends the notification to WeCom.

At this point the alert has been delivered to WeCom.
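If no rule happens to be firing, the notification path can still be exercised by posting a synthetic alert to Alertmanager's v2 API. The sketch below assumes `curl` is available and that Alertmanager is reachable on the NodePort configured earlier; the function names, the `ManualTest` alert name, and the label values are placeholders rather than part of the original setup.

```shell
# Build a one-alert JSON payload; $1 is the alert name, $2 the severity.
test_alert_payload() {
  printf '[{"labels":{"alertname":"%s","severity":"%s","instance":"manual-test"},"annotations":{"summary":"manual test alert","description":"sent with curl"}}]\n' "$1" "$2"
}

# POST the payload; $1 is the Alertmanager base URL, e.g. http://IP:30066.
send_test_alert() {
  test_alert_payload "${2:-ManualTest}" "${3:-warning}" |
    curl -sS -X POST -H 'Content-Type: application/json' \
      --data-binary @- "$1/api/v2/alerts"
}
```

After `send_test_alert http://IP:30066`, the alert should appear in the Alertmanager UI, and the WeCom message should follow roughly `group_wait` (10s) later. Because the payload sets no `endsAt`, Alertmanager should treat the alert as resolved once `resolve_timeout` (1m here) passes without it being sent again.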
