
Connecting Prometheus to Alertmanager and Configuring Email Alerts (K8S Deployment)



This walkthrough builds on an existing K8S environment in which Prometheus is already deployed.

I. Configure Alertmanager to send alerts by email

1. Create the Alertmanager ConfigMap manifest

vim alertmanager-cm.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  name: alertmanager
  namespace: prometheus
data:
  alertmanager.yml: |-
    global:  
      resolve_timeout: 1m
      smtp_smarthost: 'smtp.qq.com:25'
      smtp_from: '1657310554@qq.com'  # address the alerts are sent from
      smtp_auth_username: '1657310554@qq.com'  # account used to authenticate with the SMTP server
      smtp_auth_password: 'rehtuhigsemwbbbe'   # mailbox authorization code; replace with your own
      smtp_require_tls: false
    route:   # routing configuration (decides which receiver gets each alert)
      group_by: [alertname]
      group_wait: 10s
      group_interval: 10s
      repeat_interval: 10m
      receiver: default-receiver   # send alerts to the default-receiver receiver
    receivers:
    - name: 'default-receiver'     # define the default-receiver receiver
      email_configs:
      - to: '1657310554@qq.com'   # address that receives the alerts
        send_resolved: true

Apply the YAML manifest:

kubectl apply -f alertmanager-cm.yaml

2. Key configuration settings

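  • group_by: [alertname]: the label used to group alerts.
  • group_wait: 10s: how long to wait before sending the first notification for a new group. After an alert fires, Alertmanager waits 10s so that other alerts in the same group can go out in the same notification.
  • group_interval: 10s: how long to wait before sending a notification about new alerts that join a group which has already been notified.
  • repeat_interval: 10m: how long to wait before re-sending a notification for alerts that are still firing, which reduces duplicate emails. The default is 4h.
  • receiver: default-receiver: which receiver gets the alerts.
  • smtp_smarthost: SMTP server address and port.
  • smtp_from: the address alerts are sent from.
  • smtp_auth_username: the mailbox account.
  • smtp_auth_password: the mailbox password (authorization code).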
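
The interplay of group_wait, group_interval and repeat_interval can be sketched with a small simulation. This is a simplified model under stated assumptions, not Alertmanager's actual code: it assumes a single group whose alerts keep firing unchanged, and the function name notification_times is made up for illustration.

```python
def notification_times(group_wait, group_interval, repeat_interval, horizon):
    """Return the seconds (counted from when the alert first fired) at which
    a notification would be sent for one group whose alerts never change.

    Simplified model: the group is first flushed after group_wait, then
    re-evaluated every group_interval; an unchanged group is only re-sent
    once repeat_interval has passed since the last notification.
    """
    sent = []
    flush = group_wait
    while flush <= horizon:
        # The first flush always notifies; later flushes notify only
        # when repeat_interval has elapsed since the previous notification.
        if not sent or flush - sent[-1] >= repeat_interval:
            sent.append(flush)
        flush += group_interval
    return sent


if __name__ == "__main__":
    # Values from the ConfigMap above: 10s, 10s, 10m, observed for 1300s.
    print(notification_times(10, 10, 600, 1300))
```

With the values from this ConfigMap, the model sends the first email 10 seconds after the alert fires and a repeat about every 10 minutes while the alert stays active.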

II. Connect Prometheus to Alertmanager

1. Create a new Prometheus ConfigMap manifest that adds alerting rules for the K8S cluster

vim prometheus-alertmanager-cfg.yaml
---
kind: ConfigMap
apiVersion: v1
metadata:
  labels:
    app: prometheus
  name: prometheus-config
  namespace: prometheus
data:
  prometheus.yml: |
    rule_files: 
    - /etc/prometheus/rules.yml   # location of the alerting rules
    alerting:
      alertmanagers:
      - static_configs:
        - targets: ["localhost:9093"] # Alertmanager endpoint (it runs in the same Pod as Prometheus)
    global:
      scrape_interval: 15s
      scrape_timeout: 10s
      evaluation_interval: 1m
    scrape_configs:
    - job_name: 'kubernetes-node'
      kubernetes_sd_configs:
      - role: node
      relabel_configs:
      - source_labels: [__address__]
        regex: '(.*):10250'
        replacement: '${1}:9100'
        target_label: __address__
        action: replace
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
    - job_name: 'kubernetes-node-cadvisor'
      kubernetes_sd_configs:
      - role:  node
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - action: labelmap
        regex: __meta_kubernetes_node_label_(.+)
      - target_label: __address__
        replacement: kubernetes.default.svc:443
      - source_labels: [__meta_kubernetes_node_name]
        regex: (.+)
        target_label: __metrics_path__
        replacement: /api/v1/nodes/${1}/proxy/metrics/cadvisor
    - job_name: 'kubernetes-apiserver'
      kubernetes_sd_configs:
      - role: endpoints
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/serviceaccount/ca.crt
      bearer_token_file: /var/run/secrets/kubernetes.io/serviceaccount/token
      relabel_configs:
      - source_labels: [__meta_kubernetes_namespace, __meta_kubernetes_service_name, __meta_kubernetes_endpoint_port_name]
        action: keep
        regex: default;kubernetes;https
    - job_name: 'kubernetes-service-endpoints'
      kubernetes_sd_configs:
      - role: endpoints
      relabel_configs:
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scrape]
        action: keep
        regex: true
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_scheme]
        action: replace
        target_label: __scheme__
        regex: (https?)
      - source_labels: [__meta_kubernetes_service_annotation_prometheus_io_path]
        action: replace
        target_label: __metrics_path__
        regex: (.+)
      - source_labels: [__address__, __meta_kubernetes_service_annotation_prometheus_io_port]
        action: replace
        target_label: __address__
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
      - action: labelmap
        regex: __meta_kubernetes_service_label_(.+)
      - source_labels: [__meta_kubernetes_namespace]
        action: replace
        target_label: kubernetes_namespace
      - source_labels: [__meta_kubernetes_service_name]
        action: replace
        target_label: kubernetes_name 
    - job_name: 'kubernetes-pods'    # Pod scrape config; a Pod is discovered only after the annotations are added
      kubernetes_sd_configs:
      - role: pod
      relabel_configs:
      - action: keep
        regex: true
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_scrape
      - action: replace
        regex: (.+)
        source_labels:
        - __meta_kubernetes_pod_annotation_prometheus_io_path
        target_label: __metrics_path__
      - action: replace
        regex: ([^:]+)(?::\d+)?;(\d+)
        replacement: $1:$2
        source_labels:
        - __address__
        - __meta_kubernetes_pod_annotation_prometheus_io_port
        target_label: __address__
      - action: labelmap
        regex: __meta_kubernetes_pod_label_(.+)
      - action: replace
        source_labels:
        - __meta_kubernetes_namespace
        target_label: kubernetes_namespace
      - action: replace
        source_labels:
        - __meta_kubernetes_pod_name
        target_label: kubernetes_pod_name
    - job_name: 'kubernetes-etcd'   # etcd scrape config
      scheme: https
      tls_config:
        ca_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/ca.crt
        cert_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/server.crt
        key_file: /var/run/secrets/kubernetes.io/k8s-certs/etcd/server.key
      scrape_interval: 5s
      static_configs:
      - targets: ['192.168.40.180:2379'] # IP address of master1
  rules.yml: |  # alerting rules for the K8S cluster
    groups:
    - name: example
      rules:
      - alert: apiserver CPU usage above 80%
        expr: rate(process_cpu_seconds_total{job=~"kubernetes-apiserver"}[1m]) * 100 > 80
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "CPU usage of the {{$labels.job}} component on {{$labels.instance}} is above 80%"
      - alert: apiserver CPU usage above 90%
        expr: rate(process_cpu_seconds_total{job=~"kubernetes-apiserver"}[1m]) * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "CPU usage of the {{$labels.job}} component on {{$labels.instance}} is above 90%"
      - alert: etcd CPU usage above 80%
        expr: rate(process_cpu_seconds_total{job=~"kubernetes-etcd"}[1m]) * 100 > 80
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "CPU usage of the {{$labels.job}} component on {{$labels.instance}} is above 80%"
      - alert: etcd CPU usage above 90%
        expr: rate(process_cpu_seconds_total{job=~"kubernetes-etcd"}[1m]) * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "CPU usage of the {{$labels.job}} component on {{$labels.instance}} is above 90%"
      - alert: kube-state-metrics CPU usage above 80%
        expr: rate(process_cpu_seconds_total{k8s_app=~"kube-state-metrics"}[1m]) * 100 > 80
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "CPU usage of the {{$labels.k8s_app}} component on {{$labels.instance}} is above 80%"
          value: "{{ $value }}%"
          threshold: "80%"
      - alert: kube-state-metrics CPU usage above 90%
        # the threshold here was "> 0", which contradicts the alert name and description
        expr: rate(process_cpu_seconds_total{k8s_app=~"kube-state-metrics"}[1m]) * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "CPU usage of the {{$labels.k8s_app}} component on {{$labels.instance}} is above 90%"
          value: "{{ $value }}%"
          threshold: "90%"
      - alert: coredns CPU usage above 80%
        expr: rate(process_cpu_seconds_total{k8s_app=~"kube-dns"}[1m]) * 100 > 80
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "CPU usage of the {{$labels.k8s_app}} component on {{$labels.instance}} is above 80%"
          value: "{{ $value }}%"
          threshold: "80%"
      - alert: coredns CPU usage above 90%
        expr: rate(process_cpu_seconds_total{k8s_app=~"kube-dns"}[1m]) * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "CPU usage of the {{$labels.k8s_app}} component on {{$labels.instance}} is above 90%"
          value: "{{ $value }}%"
          threshold: "90%"
      - alert: kube-proxy open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-kube-proxy"}  > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kube-proxy open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-kube-proxy"}  > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-schedule open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-schedule"}  > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-schedule open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-schedule"}  > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-controller-manager open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-controller-manager"}  > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-controller-manager open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-controller-manager"}  > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-apiserver open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-apiserver"}  > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-apiserver open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-apiserver"}  > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-etcd open file descriptors > 600
        expr: process_open_fds{job=~"kubernetes-etcd"}  > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: kubernetes-etcd open file descriptors > 1000
        expr: process_open_fds{job=~"kubernetes-etcd"}  > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "{{$labels.job}} on {{$labels.instance}} has more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: coredns
        expr: process_open_fds{k8s_app=~"kube-dns"}  > 600
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Add-on {{$labels.k8s_app}} ({{$labels.instance}}): more than 600 open file descriptors"
          value: "{{ $value }}"
      - alert: coredns
        expr: process_open_fds{k8s_app=~"kube-dns"}  > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          description: "Add-on {{$labels.k8s_app}} ({{$labels.instance}}): more than 1000 open file descriptors"
          value: "{{ $value }}"
      - alert: kube-proxy
        expr: process_virtual_memory_bytes{job=~"kubernetes-kube-proxy"}  > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage is above 2G"
          value: "{{ $value }}"
      - alert: scheduler
        expr: process_virtual_memory_bytes{job=~"kubernetes-schedule"}  > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage is above 2G"
          value: "{{ $value }}"
      - alert: kubernetes-controller-manager
        expr: process_virtual_memory_bytes{job=~"kubernetes-controller-manager"}  > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage is above 2G"
          value: "{{ $value }}"
      - alert: kubernetes-apiserver
        expr: process_virtual_memory_bytes{job=~"kubernetes-apiserver"}  > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage is above 2G"
          value: "{{ $value }}"
      - alert: kubernetes-etcd
        expr: process_virtual_memory_bytes{job=~"kubernetes-etcd"}  > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): virtual memory usage is above 2G"
          value: "{{ $value }}"
      - alert: kube-dns
        expr: process_virtual_memory_bytes{k8s_app=~"kube-dns"}  > 2000000000
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Add-on {{$labels.k8s_app}} ({{$labels.instance}}): virtual memory usage is above 2G"
          value: "{{ $value }}"
      - alert: HttpRequestsAvg
        # job names aligned with the ones used by the other rules in this file
        expr: sum(rate(rest_client_requests_total{job=~"kubernetes-kube-proxy|kubernetes-kubelet|kubernetes-schedule|kubernetes-controller-manager|kubernetes-apiserver"}[1m]))  > 1000
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): TPS is above 1000"
          value: "{{ $value }}"
          threshold: "1000"
      - alert: Pod_restarts
        expr: kube_pod_container_status_restarts_total{namespace=~"kube-system|default|monitor-sa"} > 0
        for: 2s
        labels:
          severity: warning
        annotations:
          description: "Container {{$labels.container}} of pod {{$labels.pod}} in namespace {{$labels.namespace}} has been restarted; this metric was scraped from {{$labels.instance}}"
          value: "{{ $value }}"
          threshold: "0"
      - alert: Pod_waiting
        expr: kube_pod_container_status_waiting_reason{namespace=~"kube-system|default"} == 1
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Namespace {{$labels.namespace}} ({{$labels.instance}}): container {{$labels.container}} of pod {{$labels.pod}} is stuck waiting during startup"
          value: "{{ $value }}"
          threshold: "1"
      - alert: Pod_terminated
        expr: kube_pod_container_status_terminated_reason{namespace=~"kube-system|default|monitor-sa"} == 1
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Namespace {{$labels.namespace}} ({{$labels.instance}}): container {{$labels.container}} of pod {{$labels.pod}} has terminated"
          value: "{{ $value }}"
          threshold: "1"
      - alert: Etcd_leader
        expr: etcd_server_has_leader{job="kubernetes-etcd"} == 0
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): there is currently no leader"
          value: "{{ $value }}"
          threshold: "0"
      - alert: Etcd_leader_changes
        expr: rate(etcd_server_leader_changes_seen_total{job="kubernetes-etcd"}[1m]) > 0
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): the leader has changed"
          value: "{{ $value }}"
          threshold: "0"
      - alert: Etcd_failed
        expr: rate(etcd_server_proposals_failed_total{job="kubernetes-etcd"}[1m]) > 0
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): service failure (failed proposals)"
          value: "{{ $value }}"
          threshold: "0"
      - alert: Etcd_db_total_size
        expr: etcd_debugging_mvcc_db_total_size_in_bytes{job="kubernetes-etcd"} > 10000000000
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Component {{$labels.job}} ({{$labels.instance}}): db size is above 10G"
          value: "{{ $value }}"
          threshold: "10G"
      - alert: Endpoint_ready
        expr: kube_endpoint_address_not_ready{namespace=~"kube-system|default"} == 1
        for: 2s
        labels:
          team: admin
        annotations:
          description: "Namespace {{$labels.namespace}} ({{$labels.instance}}): endpoint {{$labels.endpoint}} is not available"
          value: "{{ $value }}"
          threshold: "1"
    - name: physical node status alerts
      rules:
      - alert: Physical node CPU usage
        expr: 100-avg(irate(node_cpu_seconds_total{mode="idle"}[5m])) by(instance)*100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "CPU usage on {{ $labels.instance }} is too high"
          description: "CPU usage on {{ $labels.instance }} is above 90% (current value: {{ $value }}); please investigate"
      - alert: Physical node memory usage
        expr: (node_memory_MemTotal_bytes - (node_memory_MemFree_bytes + node_memory_Buffers_bytes + node_memory_Cached_bytes)) / node_memory_MemTotal_bytes * 100 > 90
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "Memory usage on {{ $labels.instance }} is too high"
          description: "Memory usage on {{ $labels.instance }} is above 90% (current value: {{ $value }}); please investigate"
      - alert: InstanceDown
        expr: up == 0
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "{{ $labels.instance }}: server is down"
          description: "{{ $labels.instance }}: scrape target has been unreachable for more than 2 seconds"
      - alert: Physical node disk IO performance
        # rewritten so the expression matches the description (utilization above 60%);
        # the aggregation keeps only the instance label, so annotations use instance
        expr: avg(irate(node_disk_io_time_seconds_total[1m])) by(instance) * 100 > 60
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "Disk IO utilization on {{$labels.instance}} is too high"
          description: "Disk IO utilization on {{$labels.instance}} is above 60% (current value: {{$value}})"
      - alert: Inbound network bandwidth
        expr: ((sum(rate (node_network_receive_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo'}[5m])) by (instance)) / 100) > 102400
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "Inbound network bandwidth on {{$labels.instance}} is too high"
          description: "Inbound network bandwidth on {{$labels.instance}} has been above 100M over the last 5 minutes. RX bandwidth usage: {{$value}}"
      - alert: Outbound network bandwidth
        expr: ((sum(rate (node_network_transmit_bytes_total{device!~'tap.*|veth.*|br.*|docker.*|virbr.*|lo'}[5m])) by (instance)) / 100) > 102400
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "Outbound network bandwidth on {{$labels.instance}} is too high"
          description: "Outbound network bandwidth on {{$labels.instance}} has been above 100M over the last 5 minutes. TX bandwidth usage: {{$value}}"
      - alert: TCP sessions
        expr: node_netstat_Tcp_CurrEstab > 1000
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "Too many TCP_ESTABLISHED connections on {{$labels.instance}}"
          description: "{{$labels.instance}} has more than 1000 TCP_ESTABLISHED connections (current value: {{$value}})"
      - alert: Disk capacity
        expr: 100-(node_filesystem_free_bytes{fstype=~"ext4|xfs"}/node_filesystem_size_bytes {fstype=~"ext4|xfs"}*100) > 80
        for: 2s
        labels:
          severity: critical
        annotations:
          summary: "Disk partition {{$labels.mountpoint}} usage is too high"
          description: "Disk partition {{$labels.mountpoint }} is more than 80% full (current value: {{$value}}%)"
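
The CPU rules above compare rate(process_cpu_seconds_total[1m]) * 100 against 80 and 90. As a rough sketch of what that number means: the counter holds CPU seconds consumed, so its per-second rate is the number of cores in use, and multiplying by 100 gives a percentage of one core rather than of the whole machine. The helper below is hypothetical and deliberately simpler than PromQL's rate(), which also extrapolates to the window edges and handles counter resets.

```python
def cpu_percent_of_one_core(samples):
    """Approximate rate(counter[window]) * 100 from raw samples.

    samples: list of (timestamp_seconds, cpu_seconds_total) in time order.
    Assumes the counter did not reset inside the window.
    """
    (t_first, v_first), (t_last, v_last) = samples[0], samples[-1]
    if t_last <= t_first:
        raise ValueError("need at least two samples with increasing timestamps")
    # CPU seconds consumed per wall-clock second, expressed as a percentage.
    return (v_last - v_first) * 100 / (t_last - t_first)


if __name__ == "__main__":
    # 51 CPU seconds consumed over 60 wall-clock seconds.
    window = [(0, 100.0), (30, 126.0), (60, 151.0)]
    print(cpu_percent_of_one_core(window))
```

In this example the process used 0.85 of a core, which would cross the 80% warning threshold but not the 90% critical one. A multi-threaded process can exceed 100 on this scale.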

Apply the manifest:

kubectl apply -f prometheus-alertmanager-cfg.yaml
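
The relabel_configs entries with action: replace in the scrape configuration can be hard to read. The sketch below mimics how such a rule behaves, assuming Prometheus semantics as commonly documented: source label values are joined with ";", the regex must match the whole string, and $1 or ${1} in the replacement refer to capture groups. The function relabel_replace is a made-up illustration in Python, not Prometheus code.

```python
import re


def relabel_replace(source_values, regex, replacement):
    """Mimic a relabel rule with action: replace.

    Returns the new target label value, or None when the regex does not
    match (in which case the target label would be left unchanged).
    """
    joined = ";".join(source_values)
    match = re.fullmatch(regex, joined)  # relabel regexes are fully anchored
    if match is None:
        return None
    # Convert $1 / ${1} style references into Python's \g<1> syntax.
    template = re.sub(r"\$\{?(\d+)\}?", r"\\g<\1>", replacement)
    return match.expand(template)


if __name__ == "__main__":
    # kubernetes-node job: point the kubelet address at the node exporter port.
    print(relabel_replace(["192.168.40.180:10250"], r"(.*):10250", "${1}:9100"))
    # kubernetes-pods job: use the port from the prometheus.io/port annotation.
    print(relabel_replace(["10.244.1.5:8080", "9102"],
                          r"([^:]+)(?::\d+)?;(\d+)", "$1:$2"))
```

The pod IP and annotation port in the second call are invented sample values. When a Pod has no prometheus.io/port annotation the second source value is empty, the regex does not match, and the original __address__ is kept.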

2. The new Prometheus configuration scrapes etcd, so create an etcd-certs Secret; the Prometheus Deployment needs it

kubectl -n prometheus create secret generic etcd-certs --from-file=/etc/kubernetes/pki/etcd/

III. Deploy Prometheus and Alertmanager (in the same Pod)

1. On node1, create the /data/alertmanager directory to hold Alertmanager data

mkdir /data/alertmanager -p
chmod -R 777 /data/alertmanager

2. Delete the old Prometheus Deployment

kubectl delete deploy prometheus-server -n prometheus

3. Create the Deployment

vim prometheus-alertmanager-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: prometheus
  labels:
    app: prometheus
spec:
  replicas: 1
  selector:
    matchLabels:
      app: prometheus
      component: server
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node1 # schedule the Pod onto node1
      serviceAccountName: prometheus # ServiceAccount used by the Pod
      containers:
      - name: prometheus
        image: prom/prometheus:v2.33.5
        imagePullPolicy: IfNotPresent
        command:
        - "/bin/prometheus"
        args:
        - "--config.file=/etc/prometheus/prometheus.yml"
        - "--storage.tsdb.path=/prometheus"
        - "--storage.tsdb.retention=24h"
        - "--web.enable-lifecycle"
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus
          name: prometheus-config
        - mountPath: /prometheus/
          name: prometheus-storage-volume
        - name: k8s-certs
          mountPath: /var/run/secrets/kubernetes.io/k8s-certs/etcd/
      - name: alertmanager
        image: prom/alertmanager:v0.23.0
        imagePullPolicy: IfNotPresent
        args:
        - "--config.file=/etc/alertmanager/alertmanager.yml"
        - "--log.level=debug"
        ports:
        - containerPort: 9093
          protocol: TCP
          name: alertmanager
        volumeMounts:
        - name: alertmanager-config
          mountPath: /etc/alertmanager
        - name: alertmanager-storage
          mountPath: /alertmanager
        - name: localtime
          mountPath: /etc/localtime
      volumes:
        - name: prometheus-config
          configMap:
            name: prometheus-config
        - name: prometheus-storage-volume
          hostPath:
           path: /data
           type: Directory
        - name: k8s-certs
          secret:
           secretName: etcd-certs
        - name: alertmanager-config
          configMap:
            name: alertmanager
        - name: alertmanager-storage
          hostPath:
           path: /data/alertmanager
           type: DirectoryOrCreate
        - name: localtime
          hostPath:
           path: /usr/share/zoneinfo/Asia/Shanghai

Apply the YAML manifest:

kubectl apply -f prometheus-alertmanager-deploy.yaml

Check the status:

kubectl get pods -n prometheus

4. Create the Alertmanager Service

vim alertmanager-svc.yaml 
---
apiVersion: v1
kind: Service
metadata:
  labels:
    name: prometheus
    kubernetes.io/cluster-service: 'true'
  name: alertmanager
  namespace: prometheus
spec:
  ports:
  - name: alertmanager
    nodePort: 30066
    port: 9093
    protocol: TCP
    targetPort: 9093
  selector:
    app: prometheus
  sessionAffinity: None
  type: NodePort

Apply the YAML manifest:

kubectl apply -f alertmanager-svc.yaml 

Check the status:

kubectl get svc -n prometheus


IV. Test alerting

Open http://IP:30066 in a browser.

The Alertmanager page shows that the alerts from Prometheus have reached Alertmanager. After receiving alert data, Alertmanager groups the alerts, waits for the configured group_wait period, and then sends the notification by email.
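
To exercise the email path without waiting for a real rule to fire, a test alert can be pushed straight to Alertmanager. The sketch below assumes the Alertmanager v2 HTTP API endpoint /api/v2/alerts is reachable through the NodePort created above; the alert name ManualTestAlert, the helper names, and the label values are all invented for illustration.

```python
import json
import urllib.request
from datetime import datetime, timedelta, timezone


def build_test_alert(name="ManualTestAlert", minutes=5):
    """Build a one-element alert list in the shape the v2 API accepts."""
    now = datetime.now(timezone.utc)
    return [{
        "labels": {"alertname": name, "severity": "warning"},
        "annotations": {"description": "manually injected test alert"},
        "startsAt": now.isoformat(),
        # The alert resolves on its own once endsAt has passed.
        "endsAt": (now + timedelta(minutes=minutes)).isoformat(),
    }]


def post_alerts(base_url, alerts):
    """POST the alerts to Alertmanager and return the HTTP status code."""
    request = urllib.request.Request(
        base_url.rstrip("/") + "/api/v2/alerts",
        data=json.dumps(alerts).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


# Usage (replace IP with a node address; this call needs network access):
#   post_alerts("http://IP:30066", build_test_alert())
```

If the call succeeds, the alert should appear on the Alertmanager page and an email should follow after group_wait has elapsed.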
Article source: http://www.zghlxwxcb.cn/news/detail-854957.html
