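Preface:
I recently ran into this problem while deploying Prometheus. It is a classic one, so it is worth writing down.
The symptom: after deploying the main Prometheus service, no pod showed up, only the Deployment. In a panic I assumed the cluster itself was broken and rebuilt it. The situation looked like this (k is a shell alias for kubectl):
Note: UP-TO-DATE 0 means no replica has been created from the current pod template, and AVAILABLE 0 means no pod is available.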
[root@node4 yaml]# k get deployments.apps -n monitor-sa
NAME                READY   UP-TO-DATE   AVAILABLE   AGE
prometheus-server   0/2     0            0           2m5s
[root@node4 yaml]# k get po -n monitor-sa
NAME                  READY   STATUS    RESTARTS   AGE
node-exporter-6ttbl   1/1     Running   0          23h
node-exporter-7ls5t   1/1     Running   0          23h
node-exporter-r287q   1/1     Running   0          23h
node-exporter-z85dm   1/1     Running   0          23h
The deployment file is as follows.
Pay attention: it references a ServiceAccount through serviceAccountName: monitor
[root@node4 yaml]# cat prometheus-deploy.yaml
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: prometheus-server
  namespace: monitor-sa
  labels:
    app: prometheus
spec:
  replicas: 2
  selector:
    matchLabels:
      app: prometheus
      component: server
    #matchExpressions:
    #- {key: app, operator: In, values: [prometheus]}
    #- {key: component, operator: In, values: [server]}
  template:
    metadata:
      labels:
        app: prometheus
        component: server
      annotations:
        prometheus.io/scrape: 'false'
    spec:
      nodeName: node4
      serviceAccountName: monitor
      containers:
      - name: prometheus
        image: prom/prometheus:v2.2.1
        imagePullPolicy: IfNotPresent
        command:
        - prometheus
        - --config.file=/etc/prometheus/prometheus.yml
        - --storage.tsdb.path=/prometheus
        - --storage.tsdb.retention=720h
        ports:
        - containerPort: 9090
          protocol: TCP
        volumeMounts:
        - mountPath: /etc/prometheus/prometheus.yml
          name: prometheus-config
          subPath: prometheus.yml
        - mountPath: /prometheus/
          name: prometheus-storage-volume
      volumes:
      - name: prometheus-config
        configMap:
          name: prometheus-config
          items:
          - key: prometheus.yml
            path: prometheus.yml
            mode: 0644
      - name: prometheus-storage-volume
        hostPath:
          path: /data
          type: Directory
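Solution:
So what should we do in this situation? First, don't panic. Second, a Deployment has a feature that is easy to overlook: opening it for editing also shows its current state in great detail, in the status field. (kubectl get deployment -n <namespace> <deployment-name> -o yaml prints the same status without opening an editor.)
The command is kubectl edit deployment -n <namespace> <deployment-name>. In this example the output looks like this:
... (omitted)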
            path: prometheus.yml
          name: prometheus-config
        name: prometheus-config
      - hostPath:
          path: /data
          type: Directory
        name: prometheus-storage-volume
status:
  conditions:
  - lastTransitionTime: "2023-11-22T15:21:06Z"
    lastUpdateTime: "2023-11-22T15:21:06Z"
    message: Deployment does not have minimum availability.
    reason: MinimumReplicasUnavailable
    status: "False"
    type: Available
  - lastTransitionTime: "2023-11-22T15:21:06Z"
    lastUpdateTime: "2023-11-22T15:21:06Z"
    message: 'pods "prometheus-server-78bbb77dd7-" is forbidden: error looking up
      service account monitor-sa/monitor: serviceaccount "monitor" not found'
    reason: FailedCreate
    status: "True"
    type: ReplicaFailure
  - lastTransitionTime: "2023-11-22T15:31:07Z"
    lastUpdateTime: "2023-11-22T15:31:07Z"
    message: ReplicaSet "prometheus-server-78bbb77dd7" has timed out progressing.
    reason: ProgressDeadlineExceeded
    status: "False"
    type: Progressing
  observedGeneration: 1
  unavailableReplicas: 2
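There are three messages. The first is the error quoted in the title; when something fails, the dashboard shows this one first. The second explains where the problem actually is: in this case, a ServiceAccount named monitor was not found in the monitor-sa namespace. The third says that the ReplicaSet controlled by this Deployment timed out while progressing, which is only a consequence and can be ignored. The second message is the important one, and it is the key to solving the problem.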
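Reading the conditions this way can also be scripted. The following is only a minimal sketch, assuming the status has been exported as JSON (for example with kubectl get deployment prometheus-server -n monitor-sa -o json) and loaded into a dict. The helper names problem_conditions and root_cause, and the rule for what counts as a problem, are my own assumptions and not part of Kubernetes.

```python
# Status trimmed from the output above (timestamps omitted).
broken_status = {
    "conditions": [
        {"type": "Available", "status": "False",
         "reason": "MinimumReplicasUnavailable",
         "message": "Deployment does not have minimum availability."},
        {"type": "ReplicaFailure", "status": "True",
         "reason": "FailedCreate",
         "message": 'pods "prometheus-server-78bbb77dd7-" is forbidden: '
                    "error looking up service account monitor-sa/monitor: "
                    'serviceaccount "monitor" not found'},
        {"type": "Progressing", "status": "False",
         "reason": "ProgressDeadlineExceeded",
         "message": 'ReplicaSet "prometheus-server-78bbb77dd7" '
                    "has timed out progressing."},
    ],
    "observedGeneration": 1,
    "unavailableReplicas": 2,
}

# Assumption: Available and Progressing are healthy when "True";
# any other type (here ReplicaFailure) is healthy when it is not "True".
HEALTHY_WHEN_TRUE = {"Available", "Progressing"}


def problem_conditions(status):
    """Return the conditions that signal a problem (hypothetical helper)."""
    problems = []
    for cond in status.get("conditions", []):
        is_true = cond.get("status") == "True"
        if cond.get("type") in HEALTHY_WHEN_TRUE:
            healthy = is_true
        else:
            healthy = not is_true
        if not healthy:
            problems.append(cond)
    return problems


def root_cause(status):
    """Prefer the ReplicaFailure message, since it names the actual cause."""
    problems = problem_conditions(status)
    for cond in problems:
        if cond["type"] == "ReplicaFailure":
            return cond["message"]
    return problems[0]["message"] if problems else None


print(root_cause(broken_status))
```

For the status above, all three conditions are reported as problems, and root_cause returns the ReplicaFailure message about the missing ServiceAccount.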
Appendix: the status of a healthy Deployment:
This status tells us that the Deployment has one replica and was deployed successfully, so the first message is Deployment has minimum availability.
      serviceAccount: kube-state-metrics
      serviceAccountName: kube-state-metrics
      terminationGracePeriodSeconds: 30
status:
  availableReplicas: 1
  conditions:
  - lastTransitionTime: "2023-11-21T14:56:14Z"
    lastUpdateTime: "2023-11-21T14:56:14Z"
    message: Deployment has minimum availability.
    reason: MinimumReplicasAvailable
    status: "True"
    type: Available
  - lastTransitionTime: "2023-11-21T14:56:13Z"
    lastUpdateTime: "2023-11-21T14:56:14Z"
    message: ReplicaSet "kube-state-metrics-57794dcf65" has successfully progressed.
    reason: NewReplicaSetAvailable
    status: "True"
    type: Progressing
  observedGeneration: 1
  readyReplicas: 1
  replicas: 1
  updatedReplicas: 1
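The healthy and the broken status also differ in their replica counters, not just in their messages. As a rough sketch, the hypothetical helper fully_available below compares those counters. Treating a Deployment with no existing replicas as unhealthy is an assumption that fits this article, and would be wrong for a Deployment deliberately scaled to zero.

```python
def fully_available(status):
    """Rough health check based on the replica counters in .status.

    Hypothetical helper: missing counters are treated as 0, which is how
    they appear (by omission) in the two status blocks in this article.
    """
    existing = status.get("replicas", 0)
    available = status.get("availableReplicas", 0)
    unavailable = status.get("unavailableReplicas", 0)
    return existing > 0 and available == existing and unavailable == 0


# Counters copied from the two status blocks in this article.
healthy_status = {"availableReplicas": 1, "readyReplicas": 1,
                  "replicas": 1, "updatedReplicas": 1}
failed_status = {"observedGeneration": 1, "unavailableReplicas": 2}

print(fully_available(healthy_status))
print(fully_available(failed_status))
```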
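The concrete fix:
According to the error above, we need a ServiceAccount. If you don't want to grant overly broad permissions, you have to write your own RBAC rules. Here I took the lazy route and bound cluster-admin, which is far more than Prometheus needs and is best kept to a test cluster. (A ClusterRoleBinding is cluster-scoped, so the -n monitor-sa flag on the second command has no effect.) The commands are: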
[root@node4 yaml]# k create sa -n monitor-sa monitor
serviceaccount/monitor created
[root@node4 yaml]# k create clusterrolebinding monitor-clusterrolebinding -n monitor-sa --clusterrole=cluster-admin --serviceaccount=monitor-sa:monitor
After deploying again, it succeeds:
[root@node4 yaml]# k get po -n monitor-sa -owide
NAME                                 READY   STATUS    RESTARTS        AGE   IP               NODE    NOMINATED NODE   READINESS GATES
node-exporter-6ttbl                  1/1     Running   0               24h   192.168.123.12   node2   <none>           <none>
node-exporter-7ls5t                  1/1     Running   0               24h   192.168.123.11   node1   <none>           <none>
node-exporter-r287q                  1/1     Running   1 (2m57s ago)   24h   192.168.123.14   node4   <none>           <none>
node-exporter-z85dm                  1/1     Running   0               24h   192.168.123.13   node3   <none>           <none>
prometheus-server-78bbb77dd7-6smlt   1/1     Running   0               20s   10.244.41.19     node4   <none>           <none>
prometheus-server-78bbb77dd7-fhf5k   1/1     Running   0               20s   10.244.41.18     node4   <none>           <none>
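Summary:
A missing ServiceAccount can make pods look "hidden": the Deployment exists, but its ReplicaSet cannot create a single pod. The ServiceAccount is therefore a required but implicit dependency of this Deployment. A ConfigMap that the manifest references but that was not created in advance also keeps the workload from starting, although in that case the pod object is normally still created and stays stuck in a state such as ContainerCreating, so it remains visible in kubectl get po. A Namespace is different again: it is a required, explicit dependency, and if it is missing the Deployment itself is rejected at creation time.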
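To see which objects a Deployment quietly relies on, one can list the names its pod template refers to. The sketch below is illustrative only: implicit_dependencies is a hypothetical helper, it inspects only serviceAccountName and volumes (not env or envFrom references), and the manifest is a trimmed copy of prometheus-deploy.yaml written as a Python dict.

```python
def implicit_dependencies(deployment):
    """List objects the pod template refers to but does not define itself.

    Hypothetical helper; only serviceAccountName and volumes are inspected.
    """
    pod_spec = deployment["spec"]["template"]["spec"]
    deps = {
        # Pods fall back to the "default" ServiceAccount when none is named.
        "serviceAccount": pod_spec.get("serviceAccountName", "default"),
        "configMaps": [],
        "secrets": [],
    }
    for volume in pod_spec.get("volumes", []):
        if "configMap" in volume:
            deps["configMaps"].append(volume["configMap"]["name"])
        if "secret" in volume:
            deps["secrets"].append(volume["secret"]["secretName"])
    return deps


# Trimmed copy of prometheus-deploy.yaml from this article.
prometheus_deploy = {
    "metadata": {"name": "prometheus-server", "namespace": "monitor-sa"},
    "spec": {"template": {"spec": {
        "serviceAccountName": "monitor",
        "volumes": [
            {"name": "prometheus-config",
             "configMap": {"name": "prometheus-config"}},
            {"name": "prometheus-storage-volume",
             "hostPath": {"path": "/data", "type": "Directory"}},
        ],
    }}},
}

deps = implicit_dependencies(prometheus_deploy)
print(deps)
```

Each name in the result can then be checked with kubectl get in the monitor-sa namespace before the Deployment is applied.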
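Quota settings behave like the ServiceAccount: they are a required but implicit constraint. The Deployment is accepted, but any pod that would exceed the quota is rejected when the ReplicaSet tries to create it.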
For example, the file below creates a quota for the default namespace.
It allows at most 4 pods and 20 services in the default namespace, and caps memory at 10Gi and CPU at 5.5 cores (for both requests and limits):
[root@node4 yaml]# cat quota-nginx.yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: quota
  namespace: default
spec:
  hard:
    requests.cpu: "5.5"
    limits.cpu: "5.5"
    requests.memory: 10Gi
    limits.memory: 10Gi
    pods: "4"
    services: "20"
Next, create an nginx Deployment with 6 replicas:
[root@node4 yaml]# cat nginx.yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  annotations:
    deployment.kubernetes.io/revision: "1"
  creationTimestamp: "2023-11-22T16:13:33Z"
  generation: 1
  labels:
    app: nginx
  name: nginx
  namespace: default
  resourceVersion: "16411"
  uid: e9a5cdc5-c6f0-45fb-a001-fcdd695eb925
spec:
  progressDeadlineSeconds: 600
  replicas: 6
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: nginx
  strategy:
    rollingUpdate:
      maxSurge: 25%
      maxUnavailable: 25%
    type: RollingUpdate
  template:
    metadata:
      creationTimestamp: null
      labels:
        app: nginx
    spec:
      containers:
      - image: nginx:1.18
        imagePullPolicy: IfNotPresent
        name: nginx
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
        # requests and limits must be set, because the quota covers CPU and memory
        resources:
          limits:
            cpu: 1
            memory: 1Gi
          requests:
            cpu: 500m
            memory: 512Mi
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
After creation, only four pods exist, so the quota is in effect:
[root@node4 yaml]# k get po
NAME                     READY   STATUS    RESTARTS   AGE
nginx-54f9858f64-g65pk   1/1     Running   0          4m50s
nginx-54f9858f64-h42vf   1/1     Running   0          4m50s
nginx-54f9858f64-s776t   1/1     Running   0          4m50s
nginx-54f9858f64-wl7wz   1/1     Running   0          4m50s
So where are the other two pods?
[root@node4 yaml]# k get deployments.apps nginx -oyaml |grep message
message: Deployment does not have minimum availability.
message: 'pods "nginx-54f9858f64-p8rxf" is forbidden: exceeded quota: quota, requested:
message: ReplicaSet "nginx-54f9858f64" is progressing.
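The number four can be checked by hand from the two files. The sketch below redoes that arithmetic, assuming nothing else in the default namespace consumes quota. max_pods is a hypothetical helper, and the values are converted to millicores and MiB so the division stays in integers.

```python
# Hard limits from quota-nginx.yaml (CPU in millicores, memory in MiB).
QUOTA = {
    "pods": 4,
    "requests.cpu": 5500,
    "limits.cpu": 5500,
    "requests.memory": 10 * 1024,
    "limits.memory": 10 * 1024,
}

# What one nginx pod from nginx.yaml consumes (one container per pod).
PER_POD = {
    "pods": 1,
    "requests.cpu": 500,
    "limits.cpu": 1000,
    "requests.memory": 512,
    "limits.memory": 1024,
}


def max_pods(quota, per_pod):
    """Return (pod count, limiting resource) for an otherwise empty namespace."""
    fits = {name: quota[name] // per_pod[name]
            for name in quota if per_pod.get(name)}
    limiting = min(fits, key=fits.get)
    return fits[limiting], limiting


count, limiting = max_pods(QUOTA, PER_POD)
print(count, limiting)
```

Under these assumptions the pods entry is the binding limit, at four pods. Raising only pods would not be enough for six replicas, because limits.cpu of 5.5 cores leaves room for just five pods with a 1-core limit each.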
The fix is simple: adjust the quota. How to adjust it is not covered here.