1. Background
Our original Consul cluster consisted of three server nodes, each running on a purchased ECS instance. When a Java service registered with Consul, it picked one of the three nodes arbitrarily.
As the figure above shows, the registrations were uneven: consul-01 held 28 services, consul-02 held 94, and consul-03 held 29.
At one point the cluster had an outage: a single Consul node went down and service discovery stopped working entirely.
We therefore needed to further improve Consul's availability.
2. High-availability scheme for the Consul cluster
The Consul servers stay unchanged: three nodes, still deployed on ECS.
On every Kubernetes node we deploy one Consul agent, and the pods on a node depend only on that node's local agent.
Our Kubernetes cluster has 11 nodes, so 11 additional Consul agents are deployed.
3. Deployment walkthrough
1. Mount consul.json via a ConfigMap
apiVersion: v1
kind: ConfigMap
metadata:
  name: consul-client
  namespace: consul
data:
  consul.json: |-
    {
      "datacenter": "aly-consul",
      "addresses": {
        "http": "0.0.0.0",
        "dns": "0.0.0.0"
      },
      "bind_addr": "{{ GetInterfaceIP \"eth0\" }}",
      "client_addr": "0.0.0.0",
      "data_dir": "/consul/data",
      "rejoin_after_leave": true,
      "retry_join": ["10.16.190.29", "10.16.190.28", "10.16.190.30"],
      "verify_incoming": false,
      "verify_outgoing": false,
      "disable_remote_exec": false,
      "encrypt": "Qy3w6MjoXVOMvOMSkuj43+buObaHLS8p4JONwvH0RUg=",
      "encrypt_verify_incoming": true,
      "encrypt_verify_outgoing": true,
      "acl": {
        "enabled": true,
        "default_policy": "deny",
        "down_policy": "extend-cache",
        "tokens": { "agent": "xxxx-xx-xx-xx-xxxx" }
      }
    }
A few points deserve extra attention:
- datacenter must match the DC the Consul servers are in.
- bind_addr must be resolved at runtime via the GetInterfaceIP template; do not hardcode an IP.
- acl must carry an access token valid for the Consul cluster.
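Before rolling the ConfigMap out, the three points above can be sanity-checked mechanically. Below is a minimal sketch, not part of the original deployment: the embedded JSON mirrors the ConfigMap above (with the placeholder token), and the expected datacenter name `aly-consul` is taken from this article.

```python
import json

# The consul.json payload from the ConfigMap above (abbreviated).
raw = '''
{
  "datacenter": "aly-consul",
  "bind_addr": "{{ GetInterfaceIP \\"eth0\\" }}",
  "retry_join": ["10.16.190.29", "10.16.190.28", "10.16.190.30"],
  "acl": {"enabled": true, "tokens": {"agent": "xxxx-xx-xx-xx-xxxx"}}
}
'''

def check_agent_config(text, expected_dc="aly-consul"):
    cfg = json.loads(text)  # fails fast on malformed JSON
    problems = []
    if cfg.get("datacenter") != expected_dc:
        problems.append("datacenter does not match the server DC")
    # bind_addr must be a runtime template, not a hardcoded IP
    if "GetInterfaceIP" not in cfg.get("bind_addr", ""):
        problems.append("bind_addr should use the GetInterfaceIP template")
    if not cfg.get("acl", {}).get("tokens", {}).get("agent"):
        problems.append("acl agent token is missing")
    return problems

print(check_agent_config(raw))  # an empty list means all three checks pass
```

An empty result means the configuration passes the checks; each string in the list names a violated rule.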
2. The DaemonSet
A DaemonSet ensures exactly one Consul agent runs on each node. Beyond this case, typical DaemonSet uses include logging and monitoring: running a single filebeat or node-exporter per node saves a lot of resources compared to one per pod.
You can see there are 11 pods in the group, one per node.
apiVersion: apps/v1
kind: DaemonSet
metadata:
  name: consul-client
  namespace: consul
  labels:
    app: consul
    environment: prod
    component: client
spec:
  minReadySeconds: 60
  revisionHistoryLimit: 10
  selector:
    matchLabels:
      app: consul
      environment: prod
      component: client
  updateStrategy:
    rollingUpdate:
      maxUnavailable: 1
    type: RollingUpdate
  template:
    metadata:
      namespace: consul
      labels:
        app: consul
        environment: prod
        component: client
    spec:
      containers:
      - env:
        - name: POD_NAMESPACE
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
        - name: HOST_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.hostIP
        - name: POD_IP
          valueFrom:
            fieldRef:
              apiVersion: v1
              fieldPath: status.podIP
        name: consul-client
        image: consul:1.6.2
        imagePullPolicy: IfNotPresent
        command:
        - "consul"
        - "agent"
        - "-config-dir=/consul/config"
        lifecycle:
          postStart:
            exec:
              command:
              - /bin/sh
              - -c
              - |
                # Maximum number of attempts
                max_attempts=30
                # Seconds to wait between attempts
                wait_seconds=2
                attempt=1
                while [ $attempt -le $max_attempts ]; do
                  echo "Checking if Consul is ready (attempt: $attempt)..."
                  # Probe Consul's local HTTP API with curl
                  if curl -s http://127.0.0.1:8500/v1/agent/self > /dev/null; then
                    echo "Consul is ready."
                    # Reload the configuration
                    consul reload
                    exit 0
                  else
                    echo "Consul is not ready yet."
                  fi
                  # Wait before trying again
                  sleep $wait_seconds
                  attempt=$((attempt + 1))
                done
                echo "Consul did not become ready in time."
                exit 1
          preStop:
            exec:
              command:
              - /bin/sh
              - -c
              - consul leave
        ports:
        - name: http-api
          hostPort: 8500
          containerPort: 8500
          protocol: TCP
        - name: dns-tcp
          hostPort: 8600
          containerPort: 8600
          protocol: TCP
        - name: dns-udp
          hostPort: 8600
          containerPort: 8600
          protocol: UDP
        - name: server-rpc
          hostPort: 8300
          containerPort: 8300
          protocol: TCP
        - name: serf-lan-tcp
          hostPort: 8301
          containerPort: 8301
          protocol: TCP
        - name: serf-lan-udp
          hostPort: 8301
          containerPort: 8301
          protocol: UDP
        - name: serf-wan-tcp
          hostPort: 8302
          containerPort: 8302
          protocol: TCP
        - name: serf-wan-udp
          hostPort: 8302
          containerPort: 8302
          protocol: UDP
        volumeMounts:
        - name: consul-config
          mountPath: /consul/config/consul.json
          subPath: consul.json
        - name: consul-data-dir
          mountPath: /consul/data
        - name: localtime
          mountPath: /etc/localtime
        livenessProbe:
          tcpSocket:
            port: 8500
          initialDelaySeconds: 30
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        readinessProbe:
          httpGet:
            path: /v1/status/leader
            port: 8500
          initialDelaySeconds: 60
          periodSeconds: 10
          successThreshold: 1
          failureThreshold: 3
          timeoutSeconds: 1
        resources:
          requests:
            memory: "256Mi"
            cpu: "200m"
          limits:
            memory: "1024Mi"
            cpu: "1000m"
        terminationMessagePath: /dev/termination-log
        terminationMessagePolicy: File
      hostNetwork: true
      dnsPolicy: ClusterFirst
      restartPolicy: Always
      schedulerName: default-scheduler
      securityContext: {}
      terminationGracePeriodSeconds: 30
      volumes:
      - name: consul-config
        configMap:
          name: consul-client
          items:
          - key: consul.json
            path: consul.json
      - name: consul-data-dir
        hostPath:
          path: /data/consul/data
          type: DirectoryOrCreate
      - name: localtime
        hostPath:
          path: /etc/localtime
          type: File
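One DaemonSet pitfall worth guarding against: spec.selector.matchLabels must actually select the pod template's labels, or the API server rejects the object (and a typo in only one of the two places is easy to miss). As an illustration only, here is a tiny check over plain dicts; the helper name and the inline label sets are made up for this sketch.

```python
def selector_matches_template(match_labels, template_labels):
    """True when every selector label is present in the template with the same value."""
    return all(template_labels.get(k) == v for k, v in match_labels.items())

selector = {"app": "consul", "environment": "prod", "component": "client"}
template = {"app": "consul", "environment": "prod", "component": "client"}

print(selector_matches_template(selector, template))           # True
print(selector_matches_template(selector, {"app": "consul"}))  # False
```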
4. Pitfalls
1. Error: ==> Multiple private IPv4 addresses found. Please configure one with 'bind' and/or 'advertise'.
Fix:
Set "bind_addr": "{{ GetInterfaceIP \"eth0\" }}" in consul.json.
2. In the lifecycle hook, consul reload failed right after the pod was created, and the Consul pod kept restarting.
To describe the symptom more precisely: the Consul agent would join the cluster, then the container died moments later, which made the agent leave the cluster again.
I tried removing the health probes; that didn't help, the container still died.
So the probes were not the cause.
lifecycle:
  postStart:
    exec:
      command:
      - /bin/sh
      - -c
      - consul reload
  preStop:
    exec:
      command:
      - /bin/sh
      - -c
      - consul leave
The fix was to retry consul reload until the agent is ready:
# Maximum number of attempts
max_attempts=30
# Seconds to wait between attempts
wait_seconds=2
attempt=1
while [ $attempt -le $max_attempts ]; do
  echo "Checking if Consul is ready (attempt: $attempt)..."
  # Probe Consul's local HTTP API with curl
  if curl -s http://127.0.0.1:8500/v1/agent/self > /dev/null; then
    echo "Consul is ready."
    # Reload the configuration
    consul reload
    exit 0
  else
    echo "Consul is not ready yet."
  fi
  # Wait before trying again
  sleep $wait_seconds
  attempt=$((attempt + 1))
done
echo "Consul did not become ready in time."
exit 1
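The same wait-for-ready logic can be expressed with an injectable probe, which makes it testable without a running agent. This is a sketch, not the hook that ships in the DaemonSet; the URL mirrors the curl check above, and the fake probe at the end is purely illustrative.

```python
import time
import urllib.request

CONSUL_SELF_URL = "http://127.0.0.1:8500/v1/agent/self"

def consul_is_ready(url=CONSUL_SELF_URL):
    """Probe the local agent's HTTP API, mirroring the curl check."""
    try:
        with urllib.request.urlopen(url, timeout=2) as resp:
            return resp.status == 200
    except OSError:
        return False

def wait_until_ready(probe=consul_is_ready, max_attempts=30, wait_seconds=2):
    """Return True once the probe succeeds, False after max_attempts."""
    for attempt in range(1, max_attempts + 1):
        if probe():
            return True  # the real hook would run `consul reload` here
        time.sleep(wait_seconds)
    return False

# Example with a fake probe that succeeds on the third call:
calls = iter([False, False, True])
print(wait_until_ready(probe=lambda: next(calls), wait_seconds=0))  # True
```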
3. Ignore the error [ERR] agent: failed to sync remote state: No known Consul servers; it is expected during startup.
Don't be misled by it. Just make sure a log line like the following appears later:
2024/02/21 11:09:42 [INFO] consul: adding server aly-consul-02 (Addr: tcp/10.16.190.29:8300) (DC: aly-consul)
4. Don't forget to configure acl, otherwise the Consul agent cannot join the server cluster.
"acl": {
  "enabled": true,
  "default_policy": "deny",
  "down_policy": "extend-cache",
  "tokens": {
    "agent": "xxxx-xx-xx-xx-xxxx"
  }
}
5. Hooking up a service
Add environment variables to the service's deployment.yaml.
containers:
- env:
  - name: HOST_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.hostIP
  - name: POD_IP
    valueFrom:
      fieldRef:
        apiVersion: v1
        fieldPath: status.podIP
  - name: CONFIG_SERVICE_ENABLED
    value: 'true'
  - name: CONFIG_SERVICE_PORT
    value: '8500'
  - name: CONFIG_SERVICE_HOST
    value: $(HOST_IP)
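The point of injecting HOST_IP is that each pod resolves the Consul agent on its own node, not a fixed address. A small sketch of that resolution (the variable names match the deployment above; the helper and fallback values are assumptions for illustration):

```python
import os

def local_consul_address(env=os.environ):
    """Build the node-local Consul agent address from injected variables."""
    host = env.get("CONFIG_SERVICE_HOST", "127.0.0.1")
    port = int(env.get("CONFIG_SERVICE_PORT", "8500"))
    return f"http://{host}:{port}"

# Simulate the environment the Deployment injects:
fake_env = {"CONFIG_SERVICE_HOST": "10.16.190.5", "CONFIG_SERVICE_PORT": "8500"}
print(local_consul_address(fake_env))  # http://10.16.190.5:8500
```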
The Spring Boot project's bootstrap.yml looks like this:
spring:
  application:
    name: xxx-service
  cloud:
    consul:
      enabled: ${CONFIG_SERVICE_ENABLED}
      host: ${CONFIG_SERVICE_HOST}
      port: ${CONFIG_SERVICE_PORT}
      discovery:
        prefer-ip-address: true
        enabled: ${CONFIG_SERVICE_ENABLED}
      config:
        format: yaml
        enabled: ${CONFIG_SERVICE_ENABLED}
6. Summary
Using a Java service registering with Consul as the example, this article walked through the whole process, from deployment to integration.
Note that the Consul address used here is an IP address, not a domain name.
That means if the Consul agent on the same node as a service is down, the service will fail to register at startup.
You might wonder: could a Kubernetes Service or nginx sit in front of the Consul cluster, the way nginx reverse-proxies backend services?
As shown in the figure below.
I advise against it, because Consul, as a service registry, has a basic requirement:
a Java service must deregister from the same Consul agent it registered with.
A load balancer in front of the Consul agents would therefore have to be stateful, not stateless.
That wraps up the highly available Consul deployment.