1. Environment
Ceph: Octopus
OS: Kylin-Server-V10_U1-Release-Build02-20210824-GFB-x86_64, CentOS Linux release 7.9.2009
2. Ceph and cephadm
2.1 Ceph overview
Ceph provides object storage, block device services, and a file system to cloud platforms. Every Ceph storage cluster deployment starts by setting up each Ceph node and then the network.
A Ceph storage cluster requires at least one Ceph Monitor and one Ceph Manager, plus at least as many Ceph OSDs as there are replicas of the objects stored in the cluster (for example, if three replicas of a given object are stored in the cluster, the cluster must contain at least three OSDs).
Monitors: a Ceph Monitor (ceph-mon) maintains maps of the cluster state, including the monitor map, manager map, OSD map, MDS map, and CRUSH map. These maps are the critical cluster state that Ceph daemons need to coordinate with each other. Monitors are also responsible for managing authentication between daemons and clients. At least three monitors are normally required for redundancy and high availability.
Managers: a Ceph Manager daemon (ceph-mgr) tracks runtime metrics and the current state of the Ceph cluster, including storage utilization, current performance metrics, and system load. Manager daemons also host Python-based modules that manage and expose cluster information, including the web-based Ceph Dashboard and the REST API. At least two managers are normally required for high availability.
Ceph OSDs: an object storage daemon (Ceph OSD, ceph-osd) stores data, handles data replication, recovery, and rebalancing, and provides some monitoring information to monitors and managers by checking the heartbeats of other OSD daemons. At least three OSDs are normally required for redundancy and high availability.
MDSs: a Ceph Metadata Server (MDS, ceph-mds) stores metadata on behalf of the Ceph file system (Ceph block devices and Ceph object storage do not use MDS). Metadata servers allow POSIX file system users to run basic commands such as ls and find without placing an enormous burden on the Ceph storage cluster.
See the official Ceph documentation.
2.2 Ceph releases
2.3 cephadm
cephadm is a utility for managing a Ceph cluster.
- cephadm can add Ceph containers to the cluster.
- cephadm can remove Ceph containers from the cluster.
- cephadm can update Ceph containers.
cephadm does not depend on external configuration tools such as Ansible, Rook, or Salt. However, those tools can be used to automate operations that cephadm itself does not perform.
cephadm manages the full life cycle of a Ceph cluster. The life cycle starts with the bootstrapping process, when cephadm creates a tiny Ceph cluster on a single node consisting of one monitor and one manager. cephadm then uses the orchestration interface to expand the cluster, adding hosts and provisioning Ceph daemons and services. This life cycle can be managed through the Ceph command-line interface (CLI) or the dashboard (GUI).
See the official cephadm documentation.
3. Node plan
Hostname | Address | Roles |
---|---|---|
ceph1 | 172.25.0.141 | mon, osd, mds, mgr, iscsi, cephadm |
ceph2 | 172.25.0.142 | mon, osd, mds, mgr, iscsi |
ceph3 | 172.25.0.143 | mon, osd, mds, mgr, iscsi |
4. Base environment configuration
4.1 IP address configuration
Configure the address on ceph1
nmcli connection modify ens33 ipv4.method manual ipv4.addresses 172.25.0.141/24 ipv4.gateway 172.25.0.2 connection.autoconnect yes
Configure the address on ceph2
nmcli connection modify ens33 ipv4.method manual ipv4.addresses 172.25.0.142/24 ipv4.gateway 172.25.0.2 connection.autoconnect yes
Configure the address on ceph3
nmcli connection modify ens33 ipv4.method manual ipv4.addresses 172.25.0.143/24 ipv4.gateway 172.25.0.2 connection.autoconnect yes
4.2 Hostname configuration
Set the hostname on ceph1
hostnamectl set-hostname ceph1
Set the hostname on ceph2
hostnamectl set-hostname ceph2
Set the hostname on ceph3
hostnamectl set-hostname ceph3
Configure /etc/hosts on ceph1, ceph2, and ceph3
vim /etc/hosts
127.0.0.1 localhost localhost.localdomain localhost4 localhost4.localdomain4
::1 localhost localhost.localdomain localhost6 localhost6.localdomain6
172.25.0.141 ceph1
172.25.0.142 ceph2
172.25.0.143 ceph3
Add a DNS resolver (any reachable nameserver will do)
vim /etc/resolv.conf
nameserver 223.5.5.5
4.3 Firewall
Required on ceph1, ceph2, and ceph3.
Disable the firewall and SELinux
systemctl disable --now firewalld
setenforce 0
sed -i 's/^SELINUX=.*/SELINUX=disabled/' /etc/selinux/config
4.4 Passwordless SSH
On ceph1 (the cephadm/admin node), generate a key pair and copy it to all nodes
ssh-keygen -f /root/.ssh/id_rsa -P ''
ssh-copy-id -o StrictHostKeyChecking=no 172.25.0.141
ssh-copy-id -o StrictHostKeyChecking=no 172.25.0.142
ssh-copy-id -o StrictHostKeyChecking=no 172.25.0.143
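A quick way to confirm that passwordless login works from ceph1 to every node (a minimal check; the hostnames assume the /etc/hosts entries above):
for h in ceph1 ceph2 ceph3; do ssh -o BatchMode=yes root@$h hostname; done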
4.5 Time synchronization
Install the package on ceph1, ceph2, and ceph3
yum -y install chrony
systemctl enable chronyd
ceph1 acts as the NTP server
vim /etc/chrony.conf
pool pool.ntp.org iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
allow 172.25.0.0/24
local stratum 10
keyfile /etc/chrony.keys
leapsectz right/UTC
logdir /var/log/chrony
Restart the service
systemctl restart chronyd
ceph2, ceph3, and any nodes added later act as clients
vim /etc/chrony.conf
pool 172.25.0.141 iburst
driftfile /var/lib/chrony/drift
makestep 1.0 3
rtcsync
keyfile /etc/chrony.keys
leapsectz right/UTC
logdir /var/log/chrony
Restart the service
systemctl restart chronyd
Verify from a client
chronyc sources -v
210 Number of sources = 1
.-- Source mode '^' = server, '=' = peer, '#' = local clock.
/ .- Source state '*' = current synced, '+' = combined , '-' = not combined,
| / '?' = unreachable, 'x' = time may be in error, '~' = time too variable.
|| .- xxxx [ yyyy ] +/- zzzz
|| Reachability register (octal) -. | xxxx = adjusted offset,
|| Log2(Polling interval) --. | | yyyy = measured offset,
|| \ | | zzzz = estimated error.
|| | | \
MS Name/IP address Stratum Poll Reach LastRx Last sample
===============================================================================
^* ceph1 11 6 17 10 -68us[ -78us] +/- 2281us
4.6 Install Python 3
4.6.1 Kylin V10
Python 3.7.4 is installed by default. If it is not, configure a repository and install it with YUM:
yum -y install python3
4.6.2 CentOS 7
Install Python 3
yum -y install epel-release
yum -y install python3
4.7 Install and configure Docker
4.7.1 Kylin V10
docker-engine is installed by default. If it is not, configure a repository and install it with YUM:
yum -y install docker-engine
4.7.2 CentOS 7
Configure the Docker repository
yum -y install yum-utils
yum-config-manager --add-repo https://download.docker.com/linux/centos/docker-ce.repo
yum -y install docker-ce
Enable the service at boot
systemctl enable docker
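Docker also needs to be running before images can be loaded or pulled in the following steps. If it was not started automatically, start it and verify (standard systemd and Docker commands):
systemctl start docker
docker info | grep -i 'server version'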
5. Install cephadm
5.1 Kylin V10
Download the cephadm package
wget http://mirrors.163.com/ceph/rpm-octopus/el7/noarch/cephadm-15.2.17-0.el7.noarch.rpm
Install cephadm
chmod 600 /var/log/tallylog
rpm -ivh cephadm-15.2.17-0.el7.noarch.rpm
5.2 CentOS 7
Download the cephadm script and make it executable
curl https://raw.githubusercontent.com/ceph/ceph/v15.2.1/src/cephadm/cephadm -o cephadm
chmod +x cephadm
or
wget http://mirrors.163.com/ceph/rpm-octopus/el7/noarch/cephadm
Configure the Ceph repository based on the release name
./cephadm add-repo --release octopus
Run the cephadm installer
./cephadm install
Install the ceph-common package
cephadm install ceph-common
Octopus package mirrors:
https://repo.huaweicloud.com/ceph/rpm-octopus/
http://mirrors.163.com/ceph/rpm-octopus/
http://mirrors.aliyun.com/ceph/rpm-octopus/
5.3 Patching cephadm for offline use
To use local images offline, the cephadm file must be modified: in the cmd list of the _pull_image function, change pull to images.
vim /usr/sbin/cephadm
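The exact code differs slightly between cephadm versions, but conceptually the edit looks like the following paraphrased sketch (not the verbatim upstream source; variable names may differ): instead of asking the container engine to pull the image from a registry, cephadm is made to look it up among the locally loaded images.
# before: cephadm always pulls the image from a remote registry
cmd = [container_path, 'pull', image]
# after: look the image up locally instead of pulling
cmd = [container_path, 'images', image]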
6. Install Ceph
6.1 Prepare the images
6.1.1 Import the images
The offline images are available on Baidu Netdisk:
Link: https://pan.baidu.com/s/1UEkQo0XrwuCDI5u9H8sGkQ?pwd=zsd4
Extraction code: zsd4
Load the images offline
docker load < ceph-v15.img
docker load < prometheus-v2.18.1.img
docker load < node-exporter-v0.18.1.img
docker load < ceph-grafana-6.7.4.img
docker load < alertmanager-v0.20.0.img
Or pull the images online. Octopus:
docker pull quay.io/ceph/ceph:v15
docker pull quay.io/prometheus/prometheus:v2.18.1
docker pull quay.io/prometheus/node-exporter:v0.18.1
docker pull quay.io/ceph/ceph-grafana:6.7.4
docker pull quay.io/prometheus/alertmanager:v0.20.0
Quincy:
docker pull quay.io/ceph/ceph:v17
docker pull quay.io/ceph/ceph-grafana:8.3.5
docker pull quay.io/prometheus/prometheus:v2.33.4
docker pull quay.io/prometheus/node-exporter:v1.3.1
docker pull quay.io/prometheus/alertmanager:v0.23.0
6.1.2 Build a local registry (registryserver)
docker load < registry-2.img
mkdir -p /data/registry/
docker run -d -p 4000:5000 -v /data/registry/:/var/lib/registry/ --restart=always --name registry registry:2
Add the hostname mapping
vim /etc/hosts
172.25.0.141 registryserver
Note: this hostname mapping must be placed on the first line so that cephadm shell can resolve it.
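Also note that this registry serves plain HTTP. Depending on how Docker is configured, pushes and pulls against registryserver:4000 may be rejected as insecure; in that case it has to be whitelisted on every host (an assumption about the local setup, not something the original deployment necessarily required). If daemon.json already contains other settings, merge the key instead of overwriting the file:
cat > /etc/docker/daemon.json <<'EOF'
{
  "insecure-registries": ["registryserver:4000"]
}
EOF
systemctl restart docker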
6.1.3 Tag the images and push them to the registry
docker tag quay.io/ceph/ceph:v15 registryserver:4000/ceph/ceph:v15
docker tag quay.io/prometheus/prometheus:v2.18.1 registryserver:4000/prometheus/prometheus:v2.18.1
docker tag quay.io/prometheus/node-exporter:v0.18.1 registryserver:4000/prometheus/node-exporter:v0.18.1
docker tag quay.io/ceph/ceph-grafana:6.7.4 registryserver:4000/ceph/ceph-grafana:6.7.4
docker tag quay.io/prometheus/alertmanager:v0.20.0 registryserver:4000/prometheus/alertmanager:v0.20.0
docker push registryserver:4000/ceph/ceph:v15
docker push registryserver:4000/prometheus/prometheus:v2.18.1
docker push registryserver:4000/prometheus/node-exporter:v0.18.1
docker push registryserver:4000/ceph/ceph-grafana:6.7.4
docker push registryserver:4000/prometheus/alertmanager:v0.20.0
6.2 Bootstrap deployment
mkdir -p /etc/ceph
cephadm bootstrap --mon-ip 172.25.0.141
This command:
- Creates a monitor and a manager daemon for the new cluster on the local host.
- Generates a new SSH key for the Ceph cluster and adds it to the root user's /root/.ssh/authorized_keys file.
- Writes a minimal configuration file needed to communicate with the new cluster to /etc/ceph/ceph.conf.
- Writes a copy of the client.admin administrative (privileged!) secret key to /etc/ceph/ceph.client.admin.keyring.
- Writes a copy of the public key to /etc/ceph/ceph.pub.
[root@ceph1 ~]# cephadm bootstrap --mon-ip 172.25.0.141
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: d20e3700-2d2f-11ee-9166-000c29aa07d2
Verifying IP 172.25.0.141 port 3300 ...
Verifying IP 172.25.0.141 port 6789 ...
Mon IP 172.25.0.141 is in CIDR network 172.25.0.0/24
Pulling container image quay.io/ceph/ceph:v15...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
mon is available
Assimilating anything we can from ceph.conf...
Generating new minimal ceph.conf...
Restarting the monitor...
Setting mon public_network...
Creating mgr...
Verifying port 9283 ...
Wrote keyring to /etc/ceph/ceph.client.admin.keyring
Wrote config to /etc/ceph/ceph.conf
Waiting for mgr to start...
Waiting for mgr...
mgr not available, waiting (1/10)...
mgr not available, waiting (2/10)...
mgr not available, waiting (3/10)...
mgr not available, waiting (4/10)...
mgr not available, waiting (5/10)...
mgr not available, waiting (6/10)...
mgr not available, waiting (7/10)...
mgr is available
Enabling cephadm module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 5...
Mgr epoch 5 is available
Setting orchestrator backend to cephadm...
Generating ssh key...
Wrote public SSH key to to /etc/ceph/ceph.pub
Adding key to root@localhost's authorized_keys...
Adding host ceph1...
Deploying mon service with default placement...
Deploying mgr service with default placement...
Deploying crash service with default placement...
Enabling mgr prometheus module...
Deploying prometheus service with default placement...
Deploying grafana service with default placement...
Deploying node-exporter service with default placement...
Deploying alertmanager service with default placement...
Enabling the dashboard module...
Waiting for the mgr to restart...
Waiting for Mgr epoch 13...
Mgr epoch 13 is available
Generating a dashboard self-signed certificate...
Creating initial admin user...
Fetching dashboard port number...
Ceph Dashboard is now available at:
URL: https://ceph1:8443/
User: admin
Password: cx763mtlk7
You can access the Ceph CLI with:
sudo /usr/sbin/cephadm shell --fsid d20e3700-2d2f-11ee-9166-000c29aa07d2 -c /etc/ceph/ceph.conf -k /etc/c
Please consider enabling telemetry to help improve Ceph:
ceph telemetry on
For more information see:
https://docs.ceph.com/docs/master/mgr/telemetry/
Bootstrap complete.
Log in to the dashboard with the URL, user, and password shown in the output above.
The password must be changed on first login.
The ceph command can be run through cephadm shell, or an alias can be created:
alias ceph='cephadm shell -- ceph'
After that, ceph -s can be run directly on the host.
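As an aside, the bootstrap above pulled quay.io/ceph/ceph:v15 directly. In a fully offline environment, cephadm can instead be pointed at the image in the local registry, and the monitoring-stack images can be overridden afterwards. A sketch, assuming the registry built in 6.1.2 (the config keys are the mgr/cephadm image options of the Octopus release):
cephadm --image registryserver:4000/ceph/ceph:v15 bootstrap --mon-ip 172.25.0.141
ceph config set mgr mgr/cephadm/container_image_prometheus registryserver:4000/prometheus/prometheus:v2.18.1
ceph config set mgr mgr/cephadm/container_image_node_exporter registryserver:4000/prometheus/node-exporter:v0.18.1
ceph config set mgr mgr/cephadm/container_image_grafana registryserver:4000/ceph/ceph-grafana:6.7.4
ceph config set mgr mgr/cephadm/container_image_alertmanager registryserver:4000/prometheus/alertmanager:v0.20.0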
6.3 Host management
6.3.1 List available hosts
ceph orch host ls
[ceph: root@ceph1 /]# ceph orch host ls
HOST ADDR LABELS STATUS
ceph1 ceph1
6.3.2 Add hosts
Add the hosts to the cluster
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph2
ssh-copy-id -f -i /etc/ceph/ceph.pub root@ceph3
scp -r /etc/ceph root@ceph2:/etc/
scp -r /etc/ceph root@ceph3:/etc/
ceph orch host add ceph1 172.25.0.141
ceph orch host add ceph2 172.25.0.142
ceph orch host add ceph3 172.25.0.143
ceph orch host ls
[ceph: root@ceph1 /]# ceph orch host ls
HOST ADDR LABELS STATUS
ceph1 172.25.0.141
ceph2 172.25.0.142
ceph3 172.25.0.143
6.3.3 Remove a host
To remove a host such as ceph3 from the cluster, first make sure that all services running on it have been stopped and removed:
ceph orch host rm ceph3
6.4 Deploy MONs and MGRs
6.4.1 Pin MONs to a specific subnet
ceph config set mon public_network 172.25.0.0/24
6.4.2 Change the default number of MONs
Use ceph orch ls to check the status of each service, for example mon:
[ceph: root@ceph1 /]# ceph orch ls
NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID
alertmanager 1/1 9m ago 6h count:1 quay.io/prometheus/alertmanager:v0.20.0 0881eb8f169f
crash 2/3 9m ago 6h * quay.io/ceph/ceph:v15 93146564743f
grafana 1/1 9m ago 6h count:1 quay.io/ceph/ceph-grafana:6.7.4 557c83e11646
mgr 1/2 9m ago 6h count:2 quay.io/ceph/ceph:v15 93146564743f
mon 2/5 9m ago 6h count:5 quay.io/ceph/ceph:v15 93146564743f
node-exporter 1/3 10m ago 6h * quay.io/prometheus/node-exporter:v0.18.1 mix
osd.None 3/0 9m ago - <unmanaged> quay.io/ceph/ceph:v15 93146564743f
prometheus 1/1 9m ago 6h count:1 quay.io/prometheus/prometheus:v2.18.1 de242295e225
By default cephadm places 5 MONs and 2 MGRs. To reduce the MON count to 3, run:
ceph orch apply mon 3
[ceph: root@ceph1 /]# ceph orch ls
NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID
alertmanager 1/1 35s ago 6h count:1 quay.io/prometheus/alertmanager:v0.20.0 0881eb8f169f
crash 3/3 39s ago 6h * quay.io/ceph/ceph:v15 93146564743f
grafana 1/1 35s ago 6h count:1 quay.io/ceph/ceph-grafana:6.7.4 557c83e11646
mgr 2/2 39s ago 6h count:2 quay.io/ceph/ceph:v15 93146564743f
mon 3/3 39s ago 11m ceph1;ceph2;ceph3;count:3 quay.io/ceph/ceph:v15 93146564743f
node-exporter 1/3 39s ago 6h * quay.io/prometheus/node-exporter:v0.18.1 mix
osd.None 3/0 35s ago - <unmanaged> quay.io/ceph/ceph:v15 93146564743f
prometheus 1/1 35s ago 6h count:1 quay.io/prometheus/prometheus:v2.18.1 de242295e225
6.4.3 Deploy MONs and MGRs
Deploy MONs on specific hosts
ceph orch apply mon --placement="3 ceph1 ceph2 ceph3"
Deploy MGRs on specific hosts
ceph orch apply mgr --placement="3 ceph1 ceph2 ceph3"
6.5 Deploy OSDs
6.5.1 List storage devices
Run ceph orch device ls to display the inventory of storage devices on the cluster hosts:
[ceph: root@ceph1 /]# ceph orch device ls
Hostname Path Type Serial Size Health Ident Fault Available
ceph1 /dev/sdb hdd 21.4G Unknown N/A N/A Yes
ceph1 /dev/sdc hdd 21.4G Unknown N/A N/A Yes
ceph1 /dev/sdd hdd 21.4G Unknown N/A N/A Yes
ceph2 /dev/sdb hdd 21.4G Unknown N/A N/A Yes
ceph2 /dev/sdc hdd 21.4G Unknown N/A N/A Yes
ceph2 /dev/sdd hdd 21.4G Unknown N/A N/A Yes
ceph3 /dev/sdb hdd 21.4G Unknown N/A N/A Yes
ceph3 /dev/sdc hdd 21.4G Unknown N/A N/A Yes
ceph3 /dev/sdd hdd 21.4G Unknown N/A N/A Yes
6.5.2 Create OSDs
A storage device is considered available if all of the following conditions are met:
- The device has no partitions.
- The device has no LVM state.
- The device is not mounted.
- The device does not contain a file system.
- The device does not contain a Ceph BlueStore OSD.
- The device is larger than 5 GB.
Ceph refuses to provision an OSD on a device that is not available.
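If a device fails these checks because of leftover partitions, LVM metadata, or an old OSD, it can be wiped so that it becomes available again. ceph orch device zap is destructive; the host and path below are just one of the example devices from this cluster:
ceph orch device zap ceph2 /dev/sdb --force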
To let Ceph consume any available and unused storage device:
ceph orch apply osd --all-available-devices
Or create OSDs on specific devices of specific hosts.
Add the devices on ceph1 as OSDs
ceph orch daemon add osd ceph1:/dev/sdb
ceph orch daemon add osd ceph1:/dev/sdc
ceph orch daemon add osd ceph1:/dev/sdd
Add the devices on ceph2 as OSDs
ceph orch daemon add osd ceph2:/dev/sdb
ceph orch daemon add osd ceph2:/dev/sdc
ceph orch daemon add osd ceph2:/dev/sdd
Add the devices on ceph3 as OSDs
ceph orch daemon add osd ceph3:/dev/sdb
ceph orch daemon add osd ceph3:/dev/sdc
ceph orch daemon add osd ceph3:/dev/sdd
6.5.3 Remove an OSD
ceph orch osd rm 3
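Removal first drains the OSD; the progress can be followed with the Octopus orchestrator's status command:
ceph orch osd rm status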
6.5.4 Start/stop/restart an OSD service
ceph orch daemon start/stop/restart osd.3
6.6 Deploy RGW
6.6.1 Create a realm
First create a realm
radosgw-admin realm create --rgw-realm=radosgw --default
Output:
[ceph: root@ceph1 /]# radosgw-admin realm create --rgw-realm=radosgw --default
{
"id": "c4b75303-5ef6-4d82-a60c-efa4ceea2bc2",
"name": "radosgw",
"current_period": "d4267c01-b762-473b-bcf5-f278b1ea608a",
"epoch": 1
}
6.6.2 Create a zonegroup
Create a zonegroup
radosgw-admin zonegroup create --rgw-zonegroup=default --master --default
Output:
[ceph: root@ceph1 /]# radosgw-admin zonegroup create --rgw-zonegroup=default --master --default
{
"id": "6a3dfbbf-3c09-4cc2-9644-1c17524ee4d1",
"name": "default",
"api_name": "default",
"is_master": "true",
"endpoints": [],
"hostnames": [],
"hostnames_s3website": [],
"master_zone": "",
"zones": [],
"placement_targets": [],
"default_placement": "",
"realm_id": "c4b75303-5ef6-4d82-a60c-efa4ceea2bc2",
"sync_policy": {
"groups": []
}
}
6.6.3 Create a zone
Create a zone
radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=cn --master --default
Output:
[ceph: root@ceph1 /]# radosgw-admin zone create --rgw-zonegroup=default --rgw-zone=cn --master --default
{
"id": "4531574f-a84b-464f-962b-7a29aad20080",
"name": "cn",
"domain_root": "cn.rgw.meta:root",
"control_pool": "cn.rgw.control",
"gc_pool": "cn.rgw.log:gc",
"lc_pool": "cn.rgw.log:lc",
"log_pool": "cn.rgw.log",
"intent_log_pool": "cn.rgw.log:intent",
"usage_log_pool": "cn.rgw.log:usage",
"roles_pool": "cn.rgw.meta:roles",
"reshard_pool": "cn.rgw.log:reshard",
"user_keys_pool": "cn.rgw.meta:users.keys",
"user_email_pool": "cn.rgw.meta:users.email",
"user_swift_pool": "cn.rgw.meta:users.swift",
"user_uid_pool": "cn.rgw.meta:users.uid",
"otp_pool": "cn.rgw.otp",
"system_key": {
"access_key": "",
"secret_key": ""
},
"placement_pools": [
{
"key": "default-placement",
"val": {
"index_pool": "cn.rgw.buckets.index",
"storage_classes": {
"STANDARD": {
"data_pool": "cn.rgw.buckets.data"
}
},
"data_extra_pool": "cn.rgw.buckets.non-ec",
"index_type": 0
}
}
],
"realm_id": "c4b75303-5ef6-4d82-a60c-efa4ceea2bc2"
}
6.6.4 Deploy radosgw
Deploy radosgw daemons for the specific realm and zone
ceph orch apply rgw radosgw cn --placement="3 ceph1 ceph2 ceph3"
6.6.5 Verify that the RGW containers are running on each node
ceph orch ps --daemon-type rgw
Output:
[ceph: root@ceph1 /]# ceph orch ps --daemon-type rgw
NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID
rgw.radosgw.cn.ceph1.jzxfhf ceph1 running (48s) 38s ago 48s 15.2.17 quay.io/ceph/ceph:v15 93146564743f f2b23eac68b0
rgw.radosgw.cn.ceph2.hdbbtx ceph2 running (55s) 42s ago 55s 15.2.17 quay.io/ceph/ceph:v15 93146564743f 1a96776d7b0f
rgw.radosgw.cn.ceph3.arlfch ceph3 running (52s) 41s ago 52s 15.2.17 quay.io/ceph/ceph:v15 93146564743f ab23d7f7a46c
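To actually exercise the gateway with an S3 client, an RGW user can be created. The uid and display name below are hypothetical examples; the command prints the generated access and secret keys:
radosgw-admin user create --uid=testuser --display-name="Test User"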
6.7 Deploy MDS
The Ceph file system, CephFS, is POSIX-compatible and built on top of RADOS, the Ceph object store; see the CephFS documentation for details.
6.7.1 Create the data pool
Create a pool for CephFS data
ceph osd pool create cephfs_data
6.7.2 Create the metadata pool
Create a pool for CephFS metadata
ceph osd pool create cephfs_metadata
6.7.3 Create the file system
Create a file system named cephfs:
ceph fs new cephfs cephfs_metadata cephfs_data
6.7.4 Inspect the file system
List the file systems
ceph fs ls
Set the maximum number of MDS daemons for cephfs to 3
ceph fs set cephfs max_mds 3
If only one MDS is available, set max_mds to 1; if there are not enough OSDs to meet the replica count, set the pool size to 1
ceph fs set cephfs max_mds 1
ceph osd pool set cephfs_metadata size 1
ceph osd pool set cephfs_data size 1
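Either way, the current MDS layout and pool usage can be checked at any time with:
ceph fs status cephfs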
6.7.5 Deploy three MDS daemons
Deploy the MDS service on specific hosts
ceph orch apply mds cephfs --placement="3 ceph1 ceph2 ceph3"
6.7.6 Verify that the MDS daemons are deployed
ceph orch ps --daemon-type mds
Output:
[ceph: root@ceph1 /]# ceph orch ps --daemon-type mds
NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID
mds.cephfs.ceph1.gfvgda ceph1 running (10s) 4s ago 10s 15.2.17 quay.io/ceph/ceph:v15 93146564743f 7f216bfe781f
mds.cephfs.ceph2.ytkxqc ceph2 running (15s) 6s ago 15s 15.2.17 quay.io/ceph/ceph:v15 93146564743f dec713c46919
mds.cephfs.ceph3.tszetu ceph3 running (13s) 6s ago 13s 15.2.17 quay.io/ceph/ceph:v15 93146564743f 70ee41d1a81b
6.7.7 Check the file system status
ceph mds stat
Output:
cephfs:3 {0=cephfs.ceph2.ytkxqc=up:active,1=cephfs.ceph3.tszetu=up:active,2=cephfs.ceph1.gfvgda=up:active}
Check the overall Ceph status
[ceph: root@ceph1 /]# ceph -s
cluster:
id: d20e3700-2d2f-11ee-9166-000c29aa07d2
health: HEALTH_WARN
insufficient standby MDS daemons available
services:
mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 24h)
mgr: ceph1.kxhbab(active, since 45m), standbys: ceph2.rovomy, ceph3.aseinm
mds: cephfs:3 {0=cephfs.ceph2.ytkxqc=up:active,1=cephfs.ceph3.tszetu=up:active,2=cephfs.ceph1.gfvgda=up:active}
osd: 9 osds: 9 up (since 22h), 9 in (since 22h)
rgw: 3 daemons active (radowgw.cn.ceph1.jzxfhf, radowgw.cn.ceph2.hdbbtx, radowgw.cn.ceph3.arlfch)
task status:
data:
pools: 9 pools, 233 pgs
objects: 269 objects, 14 KiB
usage: 9.2 GiB used, 171 GiB / 180 GiB avail
pgs: 233 active+clean
io:
client: 1.2 KiB/s rd, 1 op/s rd, 0 op/s wr
6.7.8 Mount the file system
Mount the file system as the admin user
mount.ceph ceph1:6789,ceph2:6789,ceph3:6789:/ /mnt/cephfs -o name=admin
Create a user cephfs for client access to CephFS
ceph auth get-or-create client.cephfs mon 'allow r' mds 'allow r, allow rw path=/' osd 'allow rw pool=cephfs_data' -o ceph.client.cephfs.keyring
Check the generated ceph.client.cephfs.keyring file, or print the key with:
ceph auth get-key client.cephfs
Mount CephFS on a local directory on each node
Run on every node:
mkdir /mnt/cephfs/
mount -t ceph ceph1:6789,ceph2:6789,ceph3:6789:/ /mnt/cephfs/ -o name=cephfs,secret=<key of the cephfs user>
To mount automatically at boot, edit /etc/fstab on every node and add the following line:
ceph1:6789,ceph2:6789,ceph3:6789:/ /mnt/cephfs ceph name=cephfs,secretfile=<path to a file containing the cephfs user key>,noatime,_netdev 0 2
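Putting the pieces together, a concrete sequence might look like this (a sketch: the secret-file path /etc/ceph/cephfs.secret is an arbitrary choice, and it assumes ceph-common plus the admin keyring are present on the node, as set up earlier):
ceph auth get-key client.cephfs > /etc/ceph/cephfs.secret
mkdir -p /mnt/cephfs
mount -t ceph ceph1:6789,ceph2:6789,ceph3:6789:/ /mnt/cephfs -o name=cephfs,secretfile=/etc/ceph/cephfs.secret
echo 'ceph1:6789,ceph2:6789,ceph3:6789:/ /mnt/cephfs ceph name=cephfs,secretfile=/etc/ceph/cephfs.secret,noatime,_netdev 0 2' >> /etc/fstab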
6.8 Deploy iSCSI gateways
6.8.1 Create a pool
ceph osd pool create iscsi_pool
ceph osd pool application enable iscsi_pool rbd
6.8.2 Deploy the iSCSI gateway
ceph orch apply iscsi iscsi_pool admin admin --placement="1 ceph1"
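The same deployment can also be expressed as a service specification, which additionally lets settings such as trusted_ip_list be declared. A sketch of the Octopus iSCSI spec format; the values mirror the command above, and the trusted IP list is an assumption based on this cluster's addresses:
cat > iscsi.yaml <<'EOF'
service_type: iscsi
service_id: iscsi
placement:
  hosts:
    - ceph1
spec:
  pool: iscsi_pool
  api_user: admin
  api_password: admin
  trusted_ip_list: "172.25.0.141,172.25.0.142,172.25.0.143"
EOF
ceph orch apply -i iscsi.yaml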
6.8.3 List hosts and daemons
ceph orch ps --daemon_type=iscsi
Output:
[ceph: root@ceph1 /]# ceph orch ps --daemon_type=iscsi
NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID
iscsi.iscsi.ceph1.dszfku ceph1 running (11s) 2s ago 12s 3.5 quay.io/ceph/ceph:v15 93146564743f b4395ebd49b6
6.8.4 Remove the iSCSI gateway
List the services
[ceph: root@ceph1 /]# ceph orch ls
NAME RUNNING REFRESHED AGE PLACEMENT IMAGE NAME IMAGE ID
alertmanager 1/1 4m ago 5h count:1 quay.io/prometheus/alertmanager:v0.20.0 0881eb8f169f
crash 3/3 12m ago 5h * quay.io/ceph/ceph:v15 93146564743f
grafana 1/1 4m ago 5h count:1 quay.io/ceph/ceph-grafana:6.7.4 557c83e11646
iscsi.iscsi 1/1 4m ago 4m ceph1;count:1 quay.io/ceph/ceph:v15 93146564743f
mds.cephfs 3/3 12m ago 4h ceph1;ceph2;ceph3;count:3 quay.io/ceph/ceph:v15 93146564743f
mgr 3/3 12m ago 5h ceph1;ceph2;ceph3;count:3 quay.io/ceph/ceph:v15 93146564743f
mon 3/3 12m ago 23h ceph1;ceph2;ceph3;count:3 quay.io/ceph/ceph:v15 93146564743f
node-exporter 3/3 12m ago 5h * quay.io/prometheus/node-exporter:v0.18.1 e5a616e4b9cf
osd.None 9/0 12m ago - <unmanaged> quay.io/ceph/ceph:v15 93146564743f
prometheus 1/1 4m ago 5h count:1 quay.io/prometheus/prometheus:v2.18.1 de242295e225
rgw.radowgw.cn 3/3 12m ago 5h ceph1;ceph2;ceph3;count:3 quay.io/ceph/ceph:v15 93146564743f
Remove the iSCSI gateway service
ceph orch rm iscsi.iscsi
7. Troubleshooting
7.1 The file system fails to start
Run ceph mds stat and ceph health detail to check the file system status; the output is:
HEALTH_ERR 1 filesystem is offline; 1 filesystem is online with fewer MDS than max_mds
[ERR] MDS_ALL_DOWN: 1 filesystem is offline
fs cephfs is offline because no MDS is active for it.
[WRN] MDS_UP_LESS_THAN_MAX: 1 filesystem is online with fewer MDS than max_mds
fs cephfs has 0 MDS online, but wants 1
Use ceph orch ls to check whether the MDS service has been deployed and started successfully. If it has not, deploy it with:
ceph orch apply mds cephfs --placement="1 ceph1"
7.2 Mounting the file system fails with mount error 2 = No such file or directory
ceph mds stat shows the file system still being created:
cephfs:1 {0=cephfs.ceph1.npyywh=up:creating}
ceph -s shows:
cluster:
id: dd6e5410-221f-11ee-b47c-000c29fd771a
health: HEALTH_WARN
1 MDSs report slow metadata IOs
Reduced data availability: 64 pgs inactive
Degraded data redundancy: 65 pgs undersized
services:
mon: 1 daemons, quorum ceph1 (age 15h)
mgr: ceph1.vtottx(active, since 15h)
mds: cephfs:1 {0=cephfs.ceph1.npyywh=up:creating}
osd: 3 osds: 3 up (since 15h), 3 in (since 15h); 1 remapped pgs
data:
pools: 3 pools, 65 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 57 GiB / 60 GiB avail
pgs: 98.462% pgs not active
64 undersized+peered
1 active+undersized+remapped
The following problems are reported:
1 MDSs report slow metadata IOs
Reduced data availability: 64 pgs inactive
Degraded data redundancy: 65 pgs undersized
Because this is a single-node cluster, there are not enough OSDs to satisfy the three-replica requirement. Set max_mds to 1 and set the size of cephfs_metadata and cephfs_data to 1 (do the same for other pools):
ceph fs set cephfs max_mds 1
ceph osd pool set cephfs_metadata size 1
ceph osd pool set cephfs_data size 1
ceph mds stat now reports:
cephfs:1 {0=cephfs.ceph1.npyywh=up:active}
The state is active and the file system can be mounted normally.
7.3 /var/log/tallylog is either world writable or not a normal file
Installing the cephadm-15.2.17-0.el7.noarch.rpm package fails with:
pam_tally2: /var/log/tallylog is either world writable or not a normal file
pam_tally2: Authentication error
Fix it with:
chmod 600 /var/log/tallylog
pam_tally2 --user root --reset
7.4 RuntimeError: uid/gid not found
Running cephadm shell fails with:
Inferring fsid c8fb20bc-247c-11ee-a39c-000c29aa07d2
Inferring config /var/lib/ceph/c8fb20bc-247c-11ee-a39c-000c29aa07d2/mon.node1/config
Using recent ceph image quay.io/ceph/ceph@<none>
Non-zero exit code 125 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint stat -e CONTAINER_IMAGE=quay.io/ceph/ceph@<none> -e NODE_NAME=node1 quay.io/ceph/ceph@<none> -c %u %g /var/lib/ceph
stat: stderr /usr/bin/docker: invalid reference format.
stat: stderr See '/usr/bin/docker run --help'.
Traceback (most recent call last):
File "/usr/sbin/cephadm", line 6250, in <module>
r = args.func()
File "/usr/sbin/cephadm", line 1381, in _infer_fsid
return func()
File "/usr/sbin/cephadm", line 1412, in _infer_config
return func()
File "/usr/sbin/cephadm", line 1440, in _infer_image
return func()
File "/usr/sbin/cephadm", line 3573, in command_shell
make_log_dir(args.fsid)
File "/usr/sbin/cephadm", line 1538, in make_log_dir
uid, gid = extract_uid_gid()
File "/usr/sbin/cephadm", line 2155, in extract_uid_gid
raise RuntimeError('uid/gid not found')
RuntimeError: uid/gid not found
Analysis shows that the cephadm shell command starts a Ceph container. Because this is an offline installation and the images were imported with docker load, the RepoDigest information is missing, so the container cannot be started.
The fix is to create a local repository with docker registry, push the image into it, and then pull it back to restore the RepoDigest information (on all hosts). The hostname mapping for the registry also has to come first in /etc/hosts so it can be resolved.
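Concretely, the repair is the same tag/push/pull cycle already used in 6.1.3. For the Ceph image it looks roughly like this: tag and push once from the host running the registry, then pull on every host, after which the DIGEST column is populated:
docker tag quay.io/ceph/ceph:v15 registryserver:4000/ceph/ceph:v15
docker push registryserver:4000/ceph/ceph:v15
docker pull registryserver:4000/ceph/ceph:v15
docker images --digests registryserver:4000/ceph/ceph:v15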
7.5 iptables: No chain/target/match by that name.
Running docker restart registry fails with:
Error response from daemon: Cannot restart container registry: driver failed programming external connectivity on endpoint registry (dd9a9c15a451daa6abd2b85e840d7856c5c5f98c1fb1ae35897d3fbb28e2997c): (iptables failed: iptables --wait -t nat -A DOCKER -p tcp -d 0/0 --dport 4000 -j DNAT --to-destination 172.17.0.2:5000 ! -i docker0: iptables: No chain/target/match by that name.
(exit status 1))
Kylin uses firewalld instead of iptables directly. firewalld was running when the container was first started, so the iptables rules were in place; after firewalld was stopped, the DOCKER chain could no longer be found. The fix is to restart Docker so it recreates its rules:
systemctl restart docker
7.6 /usr/bin/ceph: timeout after 60 seconds
Running cephadm bootstrap fails as follows:
cephadm bootstrap --mon-ip 172.25.0.141
Verifying podman|docker is present...
Verifying lvm2 is present...
Verifying time synchronization is in place...
Unit chronyd.service is enabled and running
Repeating the final host check...
podman|docker (/usr/bin/docker) is present
systemctl is present
lvcreate is present
Unit chronyd.service is enabled and running
Host looks OK
Cluster fsid: 65ae4594-2d2c-11ee-b17b-000c29aa07d2
Verifying IP 172.25.0.141 port 3300 ...
Verifying IP 172.25.0.141 port 6789 ...
Mon IP 172.25.0.141 is in CIDR network 172.25.0.0/24
Pulling container image quay.io/ceph/ceph:v15...
Extracting ceph user uid/gid from container image...
Creating initial keys...
Creating initial monmap...
Creating mon...
Waiting for mon to start...
Waiting for mon...
/usr/bin/ceph: timeout after 60 seconds
Non-zero exit code -9 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=quay.io/ceph/ceph:v15 -e NODE_NAME=ceph1 -v /var/lib/ceph/65ae4594-2d2c-11ee-b17b-000c29aa07d2/mon.ceph1:/var/lib/ceph/mon/ceph-ceph1:z -v /tmp/ceph-tmpuylyqrau:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp85t1j6sg:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v15 status
mon not available, waiting (1/10)...
/usr/bin/ceph: timeout after 60 seconds
Non-zero exit code -9 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=quay.io/ceph/ceph:v15 -e NODE_NAME=ceph1 -v /var/lib/ceph/65ae4594-2d2c-11ee-b17b-000c29aa07d2/mon.ceph1:/var/lib/ceph/mon/ceph-ceph1:z -v /tmp/ceph-tmpuylyqrau:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp85t1j6sg:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v15 status
mon not available, waiting (2/10)...
/usr/bin/ceph: timeout after 60 seconds
Non-zero exit code -9 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=quay.io/ceph/ceph:v15 -e NODE_NAME=ceph1 -v /var/lib/ceph/65ae4594-2d2c-11ee-b17b-000c29aa07d2/mon.ceph1:/var/lib/ceph/mon/ceph-ceph1:z -v /tmp/ceph-tmpuylyqrau:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp85t1j6sg:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v15 status
mon not available, waiting (3/10)...
/usr/bin/ceph: timeout after 60 seconds
Non-zero exit code -9 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=quay.io/ceph/ceph:v15 -e NODE_NAME=ceph1 -v /var/lib/ceph/65ae4594-2d2c-11ee-b17b-000c29aa07d2/mon.ceph1:/var/lib/ceph/mon/ceph-ceph1:z -v /tmp/ceph-tmpuylyqrau:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp85t1j6sg:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v15 status
mon not available, waiting (4/10)...
/usr/bin/ceph: timeout after 60 seconds
Non-zero exit code -9 from /usr/bin/docker run --rm --ipc=host --net=host --entrypoint /usr/bin/ceph -e CONTAINER_IMAGE=quay.io/ceph/ceph:v15 -e NODE_NAME=ceph1 -v /var/lib/ceph/65ae4594-2d2c-11ee-b17b-000c29aa07d2/mon.ceph1:/var/lib/ceph/mon/ceph-ceph1:z -v /tmp/ceph-tmpuylyqrau:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp85t1j6sg:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v15 status
mon not available, waiting (5/10)...
Then run the container manually:
docker run -it --ipc=host --net=host -e CONTAINER_IMAGE=quay.io/ceph/ceph:v15 -e NODE_NAME=ceph1 -v /var/lib/ceph/65ae4594-2d2c-11ee-b17b-000c29aa07d2/mon.ceph1:/var/lib/ceph/mon/ceph-ceph1:z -v /tmp/ceph-tmpuylyqrau:/etc/ceph/ceph.client.admin.keyring:z -v /tmp/ceph-tmp85t1j6sg:/etc/ceph/ceph.conf:z quay.io/ceph/ceph:v15 bash
Inside the container shell, ceph -s reports the following error:
[root@ceph1 /]# ceph -s
2023-07-28T10:01:34.083+0000 7f0dc4f41700 0 monclient(hunting): authenticate timed out after 300
2023-07-28T10:06:34.087+0000 7f0dc4f41700 0 monclient(hunting): authenticate timed out after 300
2023-07-28T10:11:34.087+0000 7f0dc4f41700 0 monclient(hunting): authenticate timed out after 300
Research shows this is related to DNS resolution. Add a DNS server to /etc/resolv.conf:
vim /etc/resolv.conf
nameserver 223.5.5.5
This resolves the problem.
7.7 ERROR: Cannot infer an fsid, one must be specified
Running ceph -s fails with:
ERROR: Cannot infer an fsid, one must be specified: ['65ae4594-2d2c-11ee-b17b-000c29aa07d2', '70eb8f7c-2d2f-11ee- 8265-000c29aa07d2', 'd20e3700-2d2f-11ee-9166-000c29aa07d2']
This is caused by multiple failed deployments; remove the clusters with the failed fsids:
cephadm rm-cluster --fsid 65ae4594-2d2c-11ee-b17b-000c29aa07d2 --force
cephadm rm-cluster --fsid 70eb8f7c-2d2f-11ee-8265-000c29aa07d2 --force
7.8 2 failed cephadm daemon(s): daemon node-exporter.ceph2 on ceph2 is in error state
ceph -s reports 2 failed cephadm daemon(s):
[ceph: root@ceph1 /]# ceph -s
cluster:
id: d20e3700-2d2f-11ee-9166-000c29aa07d2
health: HEALTH_WARN
2 failed cephadm daemon(s)
Degraded data redundancy: 1 pg undersized
services:
mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 9m)
mgr: ceph1.kxhbab(active, since 6h), standbys: ceph3.aseinm
osd: 3 osds: 3 up (since 42m), 3 in (since 42m); 1 remapped pgs
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 57 GiB / 60 GiB avail
pgs: 1 active+undersized+remapped
ceph health detail shows:
HEALTH_WARN 2 failed cephadm daemon(s); Degraded data redundancy: 1 pg undersized
[WRN] CEPHADM_FAILED_DAEMON: 2 failed cephadm daemon(s)
daemon node-exporter.ceph2 on ceph2 is in error state
daemon node-exporter.ceph3 on ceph3 is in error state
[WRN] PG_DEGRADED: Degraded data redundancy: 1 pg undersized
pg 1.0 is stuck undersized for 44m, current state active+undersized+remapped, last acting [1,0]
The node-exporter daemons on ceph2 and ceph3 are at fault. ceph orch ps confirms this:
[ceph: root@ceph1 /]# ceph orch ps
NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID
alertmanager.ceph1 ceph1 running (17m) 6m ago 6h 0.20.0 quay.io/prometheus/alertmanager:v0.20.0 0881eb8f169f f640d60309a8
crash.ceph1 ceph1 running (6h) 6m ago 6h 15.2.17 quay.io/ceph/ceph:v15 93146564743f 167923df16d6
crash.ceph2 ceph2 running (45m) 6m ago 45m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 36930ffad980
crash.ceph3 ceph3 running (7h) 6m ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f 3cf33326be8f
grafana.ceph1 ceph1 running (66m) 6m ago 6h 6.7.4 quay.io/ceph/ceph-grafana:6.7.4 557c83e11646 a9f1cd6dd382
mgr.ceph1.kxhbab ceph1 running (6h) 6m ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f c738e894f955
mgr.ceph3.aseinm ceph3 running (7h) 6m ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f 2657cef61946
mon.ceph1 ceph1 running (6h) 6m ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f ac8d2bf766d9
mon.ceph2 ceph2 running (45m) 6m ago 45m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 30fc64339e92
mon.ceph3 ceph3 running (7h) 6m ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f b98335e1b0e1
node-exporter.ceph1 ceph1 running (66m) 6m ago 6h 0.18.1 quay.io/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 7bb7cc89e4bf
node-exporter.ceph2 ceph2 error 6m ago 7h <unknown> quay.io/prometheus/node-exporter:v0.18.1 <unknown> <unknown>
node-exporter.ceph3 ceph3 error 6m ago 7h <unknown> quay.io/prometheus/node-exporter:v0.18.1 <unknown> <unknown>
osd.0 ceph1 running (45m) 6m ago 45m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 56418e7d64a2
osd.1 ceph1 running (45m) 6m ago 45m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 04be93b74209
osd.2 ceph1 running (45m) 6m ago 45m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 15ca4a2a1acc
prometheus.ceph1 ceph1 running (17m) 6m ago 6h 2.18.1 quay.io/prometheus/prometheus:v2.18.1 de242295e225 8dfaf3f5b9c2
The node-exporter daemons on ceph2 and ceph3 are in error state; restart them:
ceph orch daemon restart node-exporter.ceph2
ceph orch daemon restart node-exporter.ceph3
After the restart the daemons are healthy again:
[ceph: root@ceph1 /]# ceph -s
cluster:
id: d20e3700-2d2f-11ee-9166-000c29aa07d2
health: HEALTH_WARN
Degraded data redundancy: 1 pg undersized
services:
mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 22m)
mgr: ceph1.kxhbab(active, since 6h), standbys: ceph3.aseinm
osd: 3 osds: 3 up (since 56m), 3 in (since 56m); 1 remapped pgs
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 3.0 GiB used, 57 GiB / 60 GiB avail
pgs: 1 active+undersized+remapped
7.9 1 failed cephadm daemon(s): daemon osd.3 on ceph2 is in unknown state
ceph -s reports 1 failed cephadm daemon(s):
[ceph: root@ceph1 /]# ceph -s
cluster:
id: d20e3700-2d2f-11ee-9166-000c29aa07d2
health: HEALTH_WARN
1 failed cephadm daemon(s)
services:
mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 49m)
mgr: ceph1.kxhbab(active, since 6h), standbys: ceph3.aseinm
osd: 9 osds: 8 up (since 116s), 8 in (since 116s)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 8.0 GiB used, 152 GiB / 160 GiB avail
pgs: 1 active+clean
ceph health detail shows:
[ceph: root@ceph1 /]# ceph health detail
HEALTH_WARN 1 failed cephadm daemon(s)
[WRN] CEPHADM_FAILED_DAEMON: 1 failed cephadm daemon(s)
daemon osd.3 on ceph2 is in unknown state
The problem is osd.3 on ceph2. ceph orch ps confirms this:
[ceph: root@ceph1 /]# ceph orch ps
NAME HOST STATUS REFRESHED AGE VERSION IMAGE NAME IMAGE ID CONTAINER ID
alertmanager.ceph1 ceph1 running (53m) 2m ago 7h 0.20.0 quay.io/prometheus/alertmanager:v0.20.0 0881eb8f169f f640d60309a8
crash.ceph1 ceph1 running (6h) 2m ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f 167923df16d6
crash.ceph2 ceph2 running (82m) 77s ago 82m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 36930ffad980
crash.ceph3 ceph3 running (7h) 31s ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f 3cf33326be8f
grafana.ceph1 ceph1 running (103m) 2m ago 7h 6.7.4 quay.io/ceph/ceph-grafana:6.7.4 557c83e11646 a9f1cd6dd382
mgr.ceph1.kxhbab ceph1 running (6h) 2m ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f c738e894f955
mgr.ceph3.aseinm ceph3 running (7h) 31s ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f 2657cef61946
mon.ceph1 ceph1 running (6h) 2m ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f ac8d2bf766d9
mon.ceph2 ceph2 running (82m) 77s ago 82m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 30fc64339e92
mon.ceph3 ceph3 running (7h) 31s ago 7h 15.2.17 quay.io/ceph/ceph:v15 93146564743f b98335e1b0e1
node-exporter.ceph1 ceph1 running (103m) 2m ago 7h 0.18.1 quay.io/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 7bb7cc89e4bf
node-exporter.ceph2 ceph2 running (34m) 77s ago 8h 0.18.1 quay.io/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 9cde6dd53b22
node-exporter.ceph3 ceph3 running (33m) 31s ago 8h 0.18.1 quay.io/prometheus/node-exporter:v0.18.1 e5a616e4b9cf 106d248979bc
osd.0 ceph1 running (82m) 2m ago 82m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 56418e7d64a2
osd.1 ceph1 running (81m) 2m ago 81m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 04be93b74209
osd.2 ceph1 running (81m) 2m ago 81m 15.2.17 quay.io/ceph/ceph:v15 93146564743f 15ca4a2a1acc
osd.3 ceph2 unknown 77s ago 2m <unknown> quay.io/ceph/ceph:v15 <unknown> <unknown>
osd.4 ceph2 running (98s) 77s ago 100s 15.2.17 quay.io/ceph/ceph:v15 93146564743f 02800b38df89
osd.5 ceph2 running (82s) 77s ago 84s 15.2.17 quay.io/ceph/ceph:v15 93146564743f 05ce4a9c9588
osd.6 ceph3 running (60s) 31s ago 61s 15.2.17 quay.io/ceph/ceph:v15 93146564743f 0d22f79c41ab
osd.7 ceph3 running (45s) 31s ago 47s 15.2.17 quay.io/ceph/ceph:v15 93146564743f a80a852550e3
osd.8 ceph3 running (32s) 31s ago 33s 15.2.17 quay.io/ceph/ceph:v15 93146564743f 51ebec72bb3f
prometheus.ceph1 ceph1 running (53m) 2m ago 7h 2.18.1 quay.io/prometheus/prometheus:v2.18.1 de242295e225 8dfaf3f5b9c2
osd.3 on ceph2 is in unknown state.
Check the OSD status with ceph osd tree:
[ceph: root@ceph1 /]# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 0.15588 root default
-3 0.05846 host ceph1
0 hdd 0.01949 osd.0 up 1.00000 1.00000
1 hdd 0.01949 osd.1 up 1.00000 1.00000
2 hdd 0.01949 osd.2 up 1.00000 1.00000
-5 0.03897 host ceph2
4 hdd 0.01949 osd.4 up 1.00000 1.00000
5 hdd 0.01949 osd.5 up 1.00000 1.00000
-7 0.05846 host ceph3
6 hdd 0.01949 osd.6 up 1.00000 1.00000
7 hdd 0.01949 osd.7 up 1.00000 1.00000
8 hdd 0.01949 osd.8 up 1.00000 1.00000
3 0 osd.3 down 0 1.00000
osd.3 is down; restart the service:
ceph orch daemon restart osd.3
After waiting a while, ceph orch ps still reports osd.3 in error state.
Remove it and redeploy: log in to ceph2 and reinitialize the disk so it can be used for an OSD again:
dmsetup rm ceph--89a94f3b--e5ef--4ec9--b828--d86ea84d6540-osd--block--e4a8e9c8--20e6--4847--8176--510533616844
vgremove ceph-89a94f3b-e5ef-4ec9-b828-d86ea84d6540
pvremove /dev/sdb
mkfs.xfs -f /dev/sdb
On ceph1, remove the OSD and add it back:
ceph osd rm 3
ceph orch daemon rm osd.3 --force
ceph orch daemon add osd ceph2:/dev/sdb
After redeployment everything is healthy; the disk probably hit a transient problem the first time:
[ceph: root@ceph1 /]# ceph -s
cluster:
id: d20e3700-2d2f-11ee-9166-000c29aa07d2
health: HEALTH_OK
services:
mon: 3 daemons, quorum ceph1,ceph2,ceph3 (age 16h)
mgr: ceph1.kxhbab(active, since 22h), standbys: ceph3.aseinm
osd: 9 osds: 9 up (since 15h), 9 in (since 15h)
data:
pools: 1 pools, 1 pgs
objects: 0 objects, 0 B
usage: 9.1 GiB used, 171 GiB / 180 GiB avail
pgs: 1 active+clean
8. References
https://docs.ceph.com/en/latest/cephadm/
https://docs.ceph.com/en/latest/cephfs/
https://blog.csdn.net/DoloresOOO/article/details/106855093
https://blog.csdn.net/XU_sun_/article/details/119909860
https://blog.csdn.net/networken/article/details/106870859
https://zhuanlan.zhihu.com/p/598832268
https://www.jianshu.com/p/b2aab379d7ec
https://blog.csdn.net/JineD/article/details/113886368
http://dbaselife.com/doc/752/
http://www.chenlianfu.com/?p=3388
http://mirrors.163.com/ceph/rpm-octopus/el7/noarch/
https://blog.csdn.net/qq_27979109/article/details/120345676#3mondocker_runcephconf_718
https://cloud-atlas.readthedocs.io/zh_CN/latest/ceph/deploy/install_mobile_cloud_ceph/debug_ceph_authenticate_time_out.html
https://access.redhat.com/documentation/zh-cn/red_hat_ceph_storage/5/html/operations_guide/introduction-to-the-ceph-orchestrator