1. Problem Description
Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/calico/bird.ctl: connect: connection refused
Readiness probe failed: 2023-05-04 22:13:23.706 [INFO][224] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.145,192.168.0.233,172.26.32.235
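The last message lists the BGP peers that BIRD has no session with. Below is a small shell sketch, assuming bash and coreutils `timeout`. It pulls the peer IPs out of such a message and tests whether TCP port 179 (BGP) on each peer accepts connections. Both function names are hypothetical helpers, not part of Calico.

```shell
#!/usr/bin/env bash
# Hypothetical helpers for triaging the readiness-probe message above.

# Read a calico-node readiness message on stdin and print each peer IP
# listed after "BGP not established with", one per line.
bgp_unestablished_peers() {
  sed -n 's/.*BGP not established with \([0-9.,]*\).*/\1/p' | tr ',' '\n'
}

# Return 0 if a TCP connection to <ip>:179 (BGP) opens within 2 seconds.
# Uses bash's /dev/tcp redirection, so nc/telnet is not required.
bgp_port_open() {
  timeout 2 bash -c "exec 3<>/dev/tcp/$1/179" 2>/dev/null
}
```

With the message stored in `$msg`, the checks can be chained: `printf '%s\n' "$msg" | bgp_unestablished_peers | while read -r ip; do bgp_port_open "$ip" && echo "$ip open" || echo "$ip unreachable"; done`. Reachable peers suggest the fault is local to the node. In this incident it turned out to be the node's own bird process.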
2. Environment
Component | Version |
---|---|
Kubernetes | v1.24.2 |
Containerd | 1.6.18 |
Linux Kernel | 5.4 |
3. Analysis
3.1 Locating the Cause
The calico-node Pod on one node of the Kubernetes cluster was unhealthy. Inspect the Pod's details:
kubectl describe pod calico-node-hd7wm -n kube-system
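The Pod name (calico-node-hd7wm here) has to be looked up first. Below is a sketch that prints the calico-node Pod scheduled on a given node. It assumes the DaemonSet carries the label `k8s-app=calico-node`, as in the standard Calico manifest. The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the name of the calico-node Pod running on
# the node given as $1. Assumes the standard label k8s-app=calico-node.
calico_pod_on_node() {
  kubectl get pods -n kube-system -l k8s-app=calico-node \
    --field-selector "spec.nodeName=$1" \
    -o jsonpath='{.items[*].metadata.name}'
}
```

For example: `kubectl describe pod "$(calico_pod_on_node <node-name>)" -n kube-system`, where `<node-name>` is the name shown by `kubectl get nodes`.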
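The output reports that calico/node's connection to the BIRDv4 socket was refused. Some users have reported that the Calico parameter IP_AUTODETECTION_METHOD must be set to the name of the node's actual network interface, so check the configuration: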
- name: CLUSTER_TYPE
  value: "k8s,bgp"
# Auto-detect the BGP IP address.
- name: IP
  value: "autodetect"
- name: IP_AUTODETECTION_METHOD
  value: "interface=eth0"
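The manifest may differ from what is actually deployed. The value on the running DaemonSet can be read, and changed, with `kubectl set env`. Below is a sketch assuming the DaemonSet and its main container are both named calico-node in kube-system, which are the Calico manifest defaults. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Print the IP_AUTODETECTION_METHOD line from the calico-node container's env
# as currently set on the DaemonSet.
show_autodetect_method() {
  kubectl set env daemonset/calico-node -n kube-system -c calico-node --list \
    | grep '^IP_AUTODETECTION_METHOD='
}

# Point IP autodetection at interface $1 (e.g. eth0). Changing the Pod
# template should make the DaemonSet roll out new calico-node Pods.
set_autodetect_interface() {
  kubectl set env daemonset/calico-node -n kube-system -c calico-node \
    "IP_AUTODETECTION_METHOD=interface=$1"
}
```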
The configuration already uses the actual interface name. The interface details are:
eth0: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.0.200  netmask 255.255.255.0  broadcast 192.168.0.255
        ether fa:16:3e:e9:41:0a  txqueuelen 1000  (Ethernet)
        RX packets 951363626  bytes 577280343840 (537.6 GiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 967287474  bytes 178201446365 (165.9 GiB)
        TX errors 0  dropped 0  overruns 0  carrier 0  collisions 0
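To confirm that the interface named in IP_AUTODETECTION_METHOD carries the node's address (192.168.0.200 here), iproute2 can be queried directly. In the one-line output of `ip -4 -o addr show dev <if>`, the fourth field is the address in CIDR form. Below is a minimal sketch; the helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Print the IPv4 address(es) of interface $1, without the prefix length.
iface_ipv4() {
  ip -4 -o addr show dev "$1" | awk '{ split($4, a, "/"); print a[1] }'
}
```

For example, `[ "$(iface_ipv4 eth0)" = 192.168.0.200 ] && echo match` should print `match` on the node above.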
Checking the bird process that calico-node runs on the node shows that it is already running (it is listening on TCP port 179). The suspicion was therefore that the process had hung. For more on the bird process, see: "Implementing Calico's IPIP Network with BGP".
[root@k8s-master1 cni]# netstat -ltnp
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:179             0.0.0.0:*               LISTEN      2246613/bird
......
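A listener on port 179 only shows that the bird process exists. The probe failure, however, concerns its control socket /var/run/calico/bird.ctl. One way to tell "running" from "running but unresponsive" is to query that socket with BIRD's `birdcl` client from inside the calico-node container. Below is a sketch that assumes the calico/node image ships `birdcl` (verify this for your Calico version) and that the container is named calico-node. `timeout` bounds the call in case bird hangs. The helper names are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical check: does bird answer on its control socket?
# $1 = calico-node Pod name. Returns 0 if bird responds within 5 s.
bird_socket_responds() {
  timeout 5 kubectl exec -n kube-system "$1" -c calico-node -- \
    birdcl -s /var/run/calico/bird.ctl show status >/dev/null 2>&1
}

# If netstat shows bird listening on :179 but this reports no response,
# the process is likely hung rather than absent.
bird_state() {
  if bird_socket_responds "$1"; then
    echo "bird responding"
  else
    echo "bird not responding on bird.ctl (possibly hung)"
  fi
}
```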
3.2 Solution
- Kill the bird process on the affected node so that calico-node automatically starts a new bird process. As shown above, the bird PID is 2246613:
kill -9 2246613
- Delete the calico-node Pod on the affected node:
kubectl delete pod calico-node-hd7wm -n kube-system
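The first option can be scripted for the affected node, run as root. The sketch takes the PID of the bird listener on :179 from `netstat -ltnp`, as done by hand above, and sends it SIGKILL, mirroring the manual `kill -9`. The helper name is hypothetical, and it assumes net-tools' `netstat` output format shown earlier.

```shell
#!/usr/bin/env bash
# Hypothetical helper, run as root on the affected node: find the PID of the
# bird process listening on TCP :179 and SIGKILL it, like the manual step.
# calico-node is then expected to start a new bird process.
restart_bird() {
  local pid
  pid=$(netstat -ltnp 2>/dev/null | awk '
    $1 == "tcp" && $4 ~ /:179$/ && $7 ~ /\/bird$/ { split($7, a, "/"); print a[1]; exit }')
  if [ -z "$pid" ]; then
    echo "no bird listener on :179 found" >&2
    return 1
  fi
  echo "killing bird pid $pid"
  kill -9 "$pid"
}
```

Running `netstat -ltnp | grep bird` again afterwards should show a different PID if bird was restarted.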
4. Conclusion
Check the status of calico-node:
kubectl get pods -A
The calico-node Pods are listed below:
NAMESPACE     NAME                READY   STATUS    RESTARTS      AGE
kube-system   calico-node-9zhv2   1/1     Running   5 (53d ago)   76d
kube-system   calico-node-dnvlc   1/1     Running   0             4m1s
kube-system   calico-node-pt9qp   1/1     Running   0             56d
kube-system   calico-node-wzq2p   1/1     Running   0             56d
......
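On larger clusters, scanning this list by eye is error-prone. Below is a sketch that reads `kubectl get pods -A` output on stdin. It prints the calico-node Pods whose READY count is incomplete or whose STATUS is not Running. It assumes the default column order shown above (NAMESPACE, NAME, READY, STATUS, ...). The helper name is hypothetical.

```shell
#!/usr/bin/env bash
# Print namespace/name of calico-node Pods that are not fully ready,
# reading `kubectl get pods -A` output on stdin.
unhealthy_calico_pods() {
  awk '$2 ~ /^calico-node-/ {
         split($3, r, "/")
         if (r[1] != r[2] || $4 != "Running") print $1 "/" $2
       }'
}
```

Usage: `kubectl get pods -A | unhealthy_calico_pods`. Empty output means every calico-node Pod reports all containers ready.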
All calico-node Pods are now healthy, and the Pod on the previously failing node is Running. Check the bird process on that node:
netstat -ltnp | grep bird
The bird process information is as follows:
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address           Foreign Address         State       PID/Program name
tcp        0      0 0.0.0.0:179             0.0.0.0:*               LISTEN      2253102/bird
......
The bird process has been recreated with the new PID 2253102. Killing the hung bird process so that a new one was started resolved the problem.