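Crane is a FinOps-based platform for cloud resource analysis and cost optimization. Its goal is to reduce cost as far as possible while preserving the runtime quality of customer applications.
1. Preface
Cloud-native technology helps enterprises cut costs and raise efficiency while making the business more flexible and scalable. The push toward cloud-native cost optimization is driven mainly by the following factors:
- Cost pressure: as business scale and data volume keep growing, the cost and management complexity of traditional infrastructure (such as physical servers and virtual machines) keep rising, putting ever more cost pressure on enterprises.
- Business requirements: the business requirements enterprises face are increasingly diverse and complex. Applications must be deployed and managed quickly and flexibly to meet market demand, and traditional infrastructure cannot keep up.
- Technology trends: with the development of cloud computing, big data, artificial intelligence, and other new technologies, enterprises need smarter and more efficient IT infrastructure to support business innovation and growth.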
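2. Introduction to the Crane open source project
Crane is an open source cost optimization project led by Tencent Cloud and the first in China built on cloud-native technology. It follows the FinOps standard and has been certified by the FinOps Foundation as the world's first certified open source solution for cost reduction and efficiency. It gives enterprises running Kubernetes clusters a simple, reliable, and powerful automated deployment tool. Crane was designed to help enterprises manage and scale their Kubernetes clusters and so manage cloud-native applications more efficiently. It is easy to use, highly customizable, and extensible. It provides a set of easy-to-use command-line tools that let developers and administrators deploy applications to Kubernetes clusters with little effort. Crane also supports multiple cloud platforms and can be customized to specific business needs. Crane is already deployed in production at Tencent, NetEase, AISpeech, Kujiale, Mingyuan Cloud, ThinkingData, and other companies, and its main contributors come from well-known companies including Tencent, Xiaohongshu, Google, eBay, Microsoft, and Tesla.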
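2.1. Crane overall architecture
Craned is the core component of Crane. It manages the lifecycle of the CRDs and the API. Craned is deployed as a Deployment and consists of two containers:
- Craned: runs the Operators that manage the CRDs, exposes the WebApi to the Dashboard, and provides the TimeSeries API through Predictors
- Dashboard: a front-end project built on the TDesign Starter scaffold that provides easy-to-use product features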
① Fadvisor
Fadvisor provides a set of Exporters that compute billing and cost data for the cluster's cloud resources and store it in your monitoring system, such as Prometheus. Through the Cloud Provider interface, Fadvisor supports multi-cloud billing APIs.
② Metric Adapter
Metric Adapter implements a Custom Metric Apiserver. It reads CRD information and serves HPA metric data through the Custom/External Metric API.
③ Crane Agent
Crane Agent is deployed on the cluster nodes as a DaemonSet.
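2.2. Crane main features
Cost visualization and optimization evaluation
- Provides a set of Exporters that compute billing and cost data for the cluster's cloud resources and store it in your monitoring system, such as Prometheus.
- Multi-dimensional cost insight and optimization evaluation. Multi-cloud billing is supported through the Cloud Provider interface.
Recommendation framework
Provides an extensible recommendation framework that supports analysis of many kinds of cloud resources, with several built-in recommenders: resource recommendation, replica recommendation, HPA recommendation, and idle resource recommendation.
Prediction-based horizontal autoscaler
EffectiveHorizontalPodAutoscaler supports prediction-driven autoscaling. It uses the community HPA for the underlying scaling control and adds richer scaling triggers (prediction, observation, periodic), making scaling more efficient while protecting service quality.
Load-aware scheduler
The dynamic scheduler builds a simple but effective model from actual node utilization and filters out heavily loaded nodes to balance the cluster.
Topology-aware scheduler
Crane Scheduler works together with Crane Agent to support finer-grained resource topology-aware scheduling and several CPU pinning policies. This addresses the "noisy neighbor" problem in complex scenarios so that resources are used more sensibly and efficiently.
QoS-based colocation
The QoS capabilities keep Pods running on Kubernetes stable. They include interference detection and active avoidance under multi-dimensional metrics, with support for precise actions and custom metrics; elastic resource overselling enhanced by prediction algorithms, which reuses and limits idle resources in the cluster; and enhanced bypass cpuset management, which improves resource utilization while pinning CPUs.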
3. Preparing the Crane lab environment
This lab uses VMware Workstation for virtualization and Rocky Linux, an open source enterprise-grade distribution. A single-node cluster is enough to complete the Crane walkthrough.
Lab environment configuration
OS version | Memory | Disk | Network mode | IP address |
---|---|---|---|---|
Rocky Linux release 8.7 | ≥8GB (recommended) | 30GB | NAT | 192.168.200.60 |
Software versions used in this lab
Component | Version |
---|---|
docker | v23.0.6 |
kubectl | v1.27.1 |
helm | v3.11.3 |
kind | v0.18.0 |
3.1. System initialization
1. Set the hostname
hostnamectl set-hostname Crane
2. Disable the firewall
systemctl stop firewalld && systemctl disable firewalld
systemctl status firewalld
3. Disable SELinux enforcement
# Switch to permissive mode temporarily (until the next reboot)
setenforce 0
getenforce
# Switch to permissive mode permanently
sed -i "s/SELINUX=enforcing/SELINUX=permissive/g" /etc/selinux/config
cat /etc/selinux/config
4. Disable swap
# Check the swapoff version
swapoff --version
# Disable temporarily
swapoff -a
# Disable permanently
sed -ri 's/.*swap.*/#&/' /etc/fstab # takes effect after reboot
# Check with swapon
swapon --show # empty output means swap is off
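The `sed` command in this step prefixes every line that mentions `swap`, including lines that are already commented out, so running it twice stacks `#` characters. A minimal sketch of an idempotent variant follows, assuming GNU sed. The function name `comment_out_swap` is made up for illustration, and it takes the file path as an argument so it can be tried on a copy first.

```shell
# Hypothetical helper (not part of the original steps): comment out swap
# entries in an fstab-style file, skipping lines that are already comments.
comment_out_swap() {
  local fstab="$1"
  # GNU sed, in-place edit: only touch non-comment lines that mention "swap".
  sed -E -i '/^[[:space:]]*#/!s/.*swap.*/#&/' "$fstab"
}

# Try it on a copy before touching the real file:
#   cp /etc/fstab /tmp/fstab.test && comment_out_swap /tmp/fstab.test
```

Running the helper a second time leaves the file unchanged, which makes it safer to keep in a provisioning script.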
5. Configure the network interface (the interface name, ens32 or ens160 here, depends on the VM)
cat /etc/sysconfig/network-scripts/ifcfg-ens32
systemctl restart NetworkManager
nmcli connection up ens160
6. Configure the Aliyun mirror
sed -e 's|^mirrorlist=|#mirrorlist=|g' \
-e 's|^#baseurl=http://dl.rockylinux.org/$contentdir|baseurl=https://mirrors.aliyun.com/rockylinux|g' \
-i.bak \
/etc/yum.repos.d/Rocky-*.repo
dnf makecache
7. Build the local cache
yum makecache
8. Update packages
yum update -y
9. Reboot the system
reboot
3.2. Installing Docker
1. Install gcc and related tools with yum
yum install -y gcc gcc-c++
2. Install the required dependency
yum install -y yum-utils
3. Add the Aliyun Docker repository
yum-config-manager \
--add-repo \
https://mirrors.aliyun.com/docker-ce/linux/centos/docker-ce.repo
4. Install Docker CE (Community Edition)
yum install -y docker-ce docker-ce-cli containerd.io
5. Start Docker
First check the installed Docker version. The Docker service has not been started yet, so only the client section is shown and the daemon cannot be reached.
[root@Crane ~]# docker version
Client: Docker Engine - Community
Version: 23.0.6
API version: 1.42
Go version: go1.19.9
Git commit: ef23cbc
Built: Fri May 5 21:19:08 2023
OS/Arch: linux/amd64
Context: default
Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Start the Docker service, enable it at boot, and check its status.
systemctl start docker && systemctl enable docker && systemctl status docker
Check the Docker version again
[root@Crane ~]# docker version
Client: Docker Engine - Community
Version: 23.0.6
API version: 1.42
Go version: go1.19.9
Git commit: ef23cbc
Built: Fri May 5 21:19:08 2023
OS/Arch: linux/amd64
Context: default
Server: Docker Engine - Community
Engine:
Version: 23.0.6
API version: 1.42 (minimum version 1.12)
Go version: go1.19.9
Git commit: 9dbdbd4
Built: Fri May 5 21:18:15 2023
OS/Arch: linux/amd64
Experimental: false
containerd:
Version: 1.6.21
GitCommit: 3dce8eb055cbb6872793272b4f20ed16117344f8
runc:
Version: 1.1.7
GitCommit: v1.1.7-0-g860f061
docker-init:
Version: 0.19.0
GitCommit: de40ad0
3.3. Installing kubectl
Reference: Install and Set Up kubectl on Linux | Kubernetes
Install kubectl on Linux with curl
[root@Crane ~]# curl -LO "https://dl.k8s.io/release/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 138 100 138 0 0 276 0 --:--:-- --:--:-- --:--:-- 276
100 46.9M 100 46.9M 0 0 5556k 0 0:00:08 0:00:08 --:--:-- 6553k
[root@crane ~]# ll
total 48096
-rw-------. 1 root root 1322 Mar 29 2022 anaconda-ks.cfg
-rw-r--r-- 1 root root 49246208 May 7 11:21 kubectl
# Verify the executable:
1. Download the kubectl checksum file
[root@Crane ~]# curl -LO "https://dl.k8s.io/$(curl -L -s https://dl.k8s.io/release/stable.txt)/bin/linux/amd64/kubectl.sha256"
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 138 100 138 0 0 281 0 --:--:-- --:--:-- --:--:-- 280
100 64 100 64 0 0 65 0 --:--:-- --:--:-- --:--:-- 65
2. Verify the kubectl executable against the checksum file
[root@Crane ~]# echo "$(cat kubectl.sha256)  kubectl" | sha256sum --check
kubectl: OK
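The check works because `sha256sum --check` reads a line containing a hash and a file name, recomputes the hash of that file, and compares the two. The same comparison can be wrapped in a function that gates the install step. This is only a sketch: `verify_sha256` is a hypothetical name, not a kubectl or coreutils command.

```shell
# Hypothetical helper: succeed only if FILE's SHA-256 matches the digest
# stored in SUMFILE (a file holding just the hex digest, like kubectl.sha256).
verify_sha256() {
  local file="$1" sumfile="$2"
  local expected actual
  expected=$(tr -d '[:space:]' < "$sumfile")      # strip any trailing newline
  actual=$(sha256sum "$file" | awk '{print $1}')  # first field is the digest
  [ "$expected" = "$actual" ]
}

# Example: install only when the checksum matches.
#   verify_sha256 kubectl kubectl.sha256 && \
#     install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
```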
Install kubectl
# Copy the binary into /usr/local/bin, owned by root, with mode 0755
[root@crane ~]# sudo install -o root -g root -m 0755 kubectl /usr/local/bin/kubectl
# Check the installed kubectl version
[root@Crane ~]# kubectl version --client
WARNING: This version information is deprecated and will be replaced with the output from kubectl version --short. Use --output=yaml|json to get the full version.
Client Version: version.Info{Major:"1", Minor:"27", GitVersion:"v1.27.1", GitCommit:"4c9411232e10168d7b050c49a1b59f6df9d7ea4b", GitTreeState:"clean", BuildDate:"2023-04-14T13:21:19Z", GoVersion:"go1.20.3", Compiler:"gc", Platform:"linux/amd64"}
Kustomize Version: v5.0.1
# Show detailed kubectl version information
[root@Crane ~]# kubectl version --client --output=yaml
clientVersion:
buildDate: "2023-04-14T13:21:19Z"
compiler: gc
gitCommit: 4c9411232e10168d7b050c49a1b59f6df9d7ea4b
gitTreeState: clean
gitVersion: v1.27.1
goVersion: go1.20.3
major: "1"
minor: "27"
platform: linux/amd64
kustomizeVersion: v5.0.1
3.4. Installing Helm
Reference: Helm | Installing Helm
# Fetch the Helm install script; a proxy may be needed.
[root@Crane ~]# curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
# List the directory
[root@Crane ~]# ll
total 48112
-rw-------. 1 root root 1322 Mar 29 2022 anaconda-ks.cfg
-rw-r--r--. 1 root root 11345 May 10 11:53 get_helm.sh
-rw-r--r--. 1 root root 49246208 May 10 11:48 kubectl
-rw-r--r--. 1 root root 64 May 10 11:49 kubectl.sha256
# Make the script executable
[root@crane ~]# chmod 700 get_helm.sh
# Run the script to install Helm
[root@Crane ~]# ./get_helm.sh
Downloading https://get.helm.sh/helm-v3.11.3-linux-amd64.tar.gz
Verifying checksum... Done.
Preparing to install helm into /usr/local/bin
helm installed into /usr/local/bin/helm
# Check the Helm version
[root@Crane ~]# helm version
version.BuildInfo{Version:"v3.11.3", GitCommit:"323249351482b3bbfc9f5004f65d400aa70f9ae7", GitTreeState:"clean", GoVersion:"go1.20.3"}
3.5. Installing kind
Reference: kind – Quick Start
# Download the kind binary
[root@Crane ~]# curl -Lo ./kind https://kind.sigs.k8s.io/dl/v0.18.0/kind-linux-amd64
% Total % Received % Xferd Average Speed Time Time Time Current
Dload Upload Total Spent Left Speed
100 98 100 98 0 0 81 0 0:00:01 0:00:01 --:--:-- 81
0 0 0 0 0 0 0 0 --:--:-- 0:00:02 --:--:-- 0
100 6808k 100 6808k 0 0 1064k 0 0:00:06 0:00:06 --:--:-- 2552k
# Make the binary executable
[root@Crane ~]# chmod +x ./kind
# Move it to a directory on the PATH
[root@Crane ~]# sudo mv ./kind /usr/local/bin/kind
# Check the kind version
[root@Crane ~]# kind version
kind v0.18.0 go1.20.2 linux/amd64
4. Single-node Crane deployment
4.1. One-step Crane installation
Plan A: run this command for a one-step installation
# This command deploys everything in one step but needs access to the public internet.
curl -sf https://raw.githubusercontent.com/gocrane/crane/main/hack/local-env-setup.sh | sh -
Installation walkthrough:
[root@Crane ~]# curl -sf https://raw.githubusercontent.com/gocrane/crane/main/hack/local-env-setup.sh | sh -
Step1: Create local cluster: /root/.kube/config_crane
Deleting cluster "crane" ...
Creating cluster "crane" ...
✓ Ensuring node image (kindest/node:v1.21.1)
✓ Preparing nodes
✓ Writing configuration
✓ Starting control-plane
✓ Installing CNI
✓ Installing StorageClass
Set kubectl context to "kind-crane"
You can now use your cluster with:
kubectl cluster-info --context kind-crane --kubeconfig /root/.kube/config_crane
Thanks for using kind!
Step1: Create local cluster finished.
Step2: Installing Prometheus
"prometheus-community" has been added to your repositories
NAME: prometheus
LAST DEPLOYED: Wed May 10 12:12:54 2023
NAMESPACE: crane-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NOTES:
The Prometheus server can be accessed via port 8080 on the following DNS name from within your cluster:
prometheus-server.crane-system.svc.cluster.local
Get the Prometheus server URL by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace crane-system -l "app=prometheus,component=server" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace crane-system port-forward $POD_NAME 9090
#################################################################################
###### WARNING: Persistence is disabled!!! You will lose your data when #####
###### the Server pod is terminated. #####
#################################################################################
#################################################################################
###### WARNING: Pod Security Policy has been disabled by default since #####
###### it deprecated after k8s 1.25+. use #####
###### (index .Values "prometheus-node-exporter" "rbac" #####
###### . "pspEnabled") with (index .Values #####
###### "prometheus-node-exporter" "rbac" "pspAnnotations") #####
###### in case you still need it. #####
#################################################################################
For more information on running Prometheus, visit:
https://prometheus.io/
Step2: Installing Prometheus finished.
Step3: Installing Grafana
NAME: grafana
LAST DEPLOYED: Wed May 10 12:13:00 2023
NAMESPACE: crane-system
STATUS: deployed
REVISION: 1
NOTES:
1. Get your 'admin' user password by running:
kubectl get secret --namespace crane-system grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo
2. The Grafana server can be accessed via port 8082 on the following DNS name from within your cluster:
grafana.crane-system.svc.cluster.local
Get the Grafana URL to visit by running these commands in the same shell:
export POD_NAME=$(kubectl get pods --namespace crane-system -l "app.kubernetes.io/name=grafana,app.kubernetes.io/instance=grafana" -o jsonpath="{.items[0].metadata.name}")
kubectl --namespace crane-system port-forward $POD_NAME 3000
3. Login with the password from step 1 and the username: admin
#################################################################################
###### WARNING: Persistence is disabled!!! You will lose your data when #####
###### the Grafana pod is terminated. #####
#################################################################################
Step3: Installing Grafana finished.
Step4: Installing Crane
"crane" has been added to your repositories
Hang tight while we grab the latest from your chart repositories...
...Successfully got an update from the "crane" chart repository
...Successfully got an update from the "grafana" chart repository
...Successfully got an update from the "prometheus-community" chart repository
Update Complete. ⎈Happy Helming!⎈
NAME: crane
LAST DEPLOYED: Wed May 10 12:13:06 2023
NAMESPACE: crane-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
NAME: fadvisor
LAST DEPLOYED: Wed May 10 12:13:10 2023
NAMESPACE: crane-system
STATUS: deployed
REVISION: 1
TEST SUITE: None
Step4: Installing Crane finished.
NAME READY UP-TO-DATE AVAILABLE AGE
craned 0/1 1 0 3s
fadvisor 0/1 1 0 0s
grafana 0/1 1 0 10s
metric-adapter 0/1 1 0 3s
prometheus-kube-state-metrics 0/1 1 0 16s
prometheus-server 0/1 1 0 16s
Please wait for all pods ready
After all pods ready, Get the Crane Dashboard URL to visit by running these commands in the same shell:
export KUBECONFIG=/root/.kube/config_crane
kubectl -n crane-system port-forward service/craned 9090:9090
Plan B: if network access fails, install from a local package instead. The commands are as follows:
# Upload the Crane installation package to the system
[root@Crane training]# pwd
/root/training
[root@Crane training]# ll
total 4
drwxr-xr-x. 7 root root 4096 May 10 13:50 installation
# Enter the installation directory under training and list its contents
[root@Crane installation]# ll
total 228
-rw-r--r--. 1 root root 4206 May 10 13:50 components.yaml
drwxr-xr-x. 5 root root 120 May 10 13:50 crane
-rw-r--r--. 1 root root 1232 May 10 13:50 effective-hpa.yaml
drwxr-xr-x. 3 root root 77 May 10 13:50 fadvisor
drwxr-xr-x. 5 root root 124 May 10 13:50 grafana
-rw-r--r--. 1 root root 199848 May 10 13:50 grafana_override_values.yaml
drwxr-xr-x. 3 root root 96 May 10 13:50 kube-state-metrics
-rw-r--r--. 1 root root 3777 May 10 13:50 local-env-setup.sh
-rw-r--r--. 1 root root 522 May 10 13:50 nginx-deployment.yaml
-rw-r--r--. 1 root root 615 May 10 13:50 php-apache.yaml
drwxr-xr-x. 3 root root 140 May 10 13:50 prometheus
-rw-r--r--. 1 root root 4915 May 10 13:50 prometheus_override_values.yaml
# Run the following command from the parent directory of installation, exactly as written; otherwise the installation fails.
bash installation/local-env-setup.sh
Wait a few moments and check that all Pods are up and running, as shown below, before continuing.
[root@Crane ~]# kubectl get pods -n crane-system
NAME READY STATUS RESTARTS AGE
craned-75d5fcff49-2ppnn 2/2 Running 0 121m
fadvisor-6c6867dcb9-tscxm 1/1 Running 0 121m
grafana-8fb6974cc-kzgzf 1/1 Running 0 121m
metric-adapter-789b5b8bc5-hnt9g 1/1 Running 0 121m
prometheus-kube-state-metrics-69c44479cb-jlzmh 1/1 Running 0 121m
prometheus-prometheus-node-exporter-4xmrg 1/1 Running 0 121m
prometheus-server-6cb8bc86c4-wxdsz 2/2 Running 0 121m
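Instead of reading the table by eye, the READY column can be checked mechanically: a Pod is ready when the two numbers in `n/m` are equal. The sketch below assumes the default `kubectl get pods` column layout shown above. `all_pods_ready` is a hypothetical helper name.

```shell
# Hypothetical helper: read `kubectl get pods` output on stdin and succeed
# only if every Pod shows READY n/n and STATUS Running.
all_pods_ready() {
  awk 'NR > 1 {
         split($2, r, "/")                         # READY column, e.g. "2/2"
         if (r[1] != r[2] || $3 != "Running") bad++
         seen++
       }
       END { exit (seen > 0 && bad == 0) ? 0 : 1 }'
}

# Example: poll until the crane-system Pods are all up.
#   until kubectl get pods -n crane-system | all_pods_ready; do sleep 5; done
```

kubectl also has a built-in alternative, `kubectl wait --for=condition=Ready pods --all -n crane-system --timeout=600s`, which avoids parsing text output.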
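4.2. Accessing the Crane Dashboard
Open a new terminal and run the following commands:
# Every new terminal needs this environment variable set first; otherwise kubectl reports that the connection to port 8080 was refused, as shown in the figure below.
export KUBECONFIG=/root/.kube/config_crane
# Run this command to expose the Crane Dashboard, as shown in the figure below.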
kubectl -n crane-system port-forward service/craned 9090:9090
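The port 8080 message appears because kubectl falls back to `localhost:8080` when it finds no kubeconfig. To avoid repeating the `export` in every terminal, the line can be added once to a shell startup file. A minimal sketch follows; `persist_kubeconfig` is a hypothetical helper, and `~/.bashrc` in the example is an assumption about the login shell.

```shell
# Hypothetical helper: append the export line to a shell startup file once,
# so new terminals pick up the kind cluster's kubeconfig automatically.
persist_kubeconfig() {
  local rcfile="$1" kubeconfig="$2"
  local line="export KUBECONFIG=${kubeconfig}"
  # -x matches the whole line, -F treats it as a fixed string, -q stays quiet.
  grep -qxF "$line" "$rcfile" 2>/dev/null || printf '%s\n' "$line" >> "$rcfile"
}

# Example:
#   persist_kubeconfig ~/.bashrc /root/.kube/config_crane
```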
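Important: this lab runs in a virtual machine, and the Dashboard cannot be reached directly at 127.0.0.1:9090 or 192.168.200.60:9090. A reverse proxy on the VM is needed so that requests to port 80 are forwarded to port 9090; the Dashboard can then be opened by entering the host IP address 192.168.200.60 in a browser. The steps are as follows.
# Install nginx
yum install -y nginx
# Edit the nginx.conf configuration file as follows: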
server_name 192.168.200.60;
location / {
proxy_pass http://127.0.0.1:9090;
proxy_http_version 1.1;
proxy_set_header Upgrade $http_upgrade;
proxy_set_header Connection keep-alive;
proxy_set_header Host $host;
proxy_cache_bypass $http_upgrade;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
proxy_set_header X-Forwarded-Proto $scheme;
}
# Type :wq to save and exit.
# Check that the nginx configuration is valid; the result is shown below.
[root@Crane ~]# nginx -t
nginx: the configuration file /etc/nginx/nginx.conf syntax is ok
nginx: configuration file /etc/nginx/nginx.conf test is successful
# Start nginx and enable it at boot
systemctl start nginx && systemctl enable nginx && systemctl status nginx
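The configuration fragment above omits the enclosing `server` block. Below is a sketch of a complete drop-in file, assuming the stock nginx package, whose `nginx.conf` includes `/etc/nginx/conf.d/*.conf` from the `http` block. The file name `crane.conf` and the helper `write_crane_proxy_conf` are illustrative choices, not something the original article uses.

```shell
# Hypothetical helper: write a self-contained reverse-proxy server block.
# $1 = output path, $2 = server_name, $3 = upstream (default 127.0.0.1:9090)
write_crane_proxy_conf() {
  local conf="$1" server_name="$2" upstream="${3:-127.0.0.1:9090}"
  # The quoted heredoc delimiter keeps nginx variables such as $host literal.
  cat > "$conf" <<'EOF'
server {
    listen 80;
    server_name __SERVER_NAME__;

    location / {
        proxy_pass http://__UPSTREAM__;
        proxy_http_version 1.1;
        proxy_set_header Upgrade $http_upgrade;
        proxy_set_header Connection keep-alive;
        proxy_set_header Host $host;
        proxy_cache_bypass $http_upgrade;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-Forwarded-Proto $scheme;
    }
}
EOF
  # Fill in the two placeholders after writing the template.
  sed -i "s|__SERVER_NAME__|${server_name}|; s|__UPSTREAM__|${upstream}|" "$conf"
}

# Example:
#   write_crane_proxy_conf /etc/nginx/conf.d/crane.conf 192.168.200.60
#   nginx -t && systemctl reload nginx
```

Keeping the proxy in its own file leaves the packaged `nginx.conf` untouched, which makes the change easy to remove later.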
If the Dashboard is still unreachable, check that all Pods have started and that the system environment is not blocking access (for example, whether the firewall is off or the relevant port is open).
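One likely reason `192.168.200.60:9090` does not respond is that `kubectl port-forward` listens only on localhost unless told otherwise. As an alternative to the reverse proxy, its `--address` flag makes it listen on other interfaces. The sketch below only prints the command so it can be reviewed first; `port_forward_cmd` is a hypothetical helper.

```shell
# Hypothetical helper: print (not run) a port-forward command that binds to
# the given address instead of the default localhost.
port_forward_cmd() {
  local address="$1" local_port="$2"
  printf 'kubectl -n crane-system port-forward --address %s service/craned %s:9090\n' \
    "$address" "$local_port"
}

port_forward_cmd 0.0.0.0 9090
# To run it for real:  eval "$(port_forward_cmd 0.0.0.0 9090)"
```

Binding to `0.0.0.0` exposes the Dashboard to every network the VM is attached to, so this suits a lab VM better than a shared host.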
4.3. Adding a cluster
Add the cluster.
The cluster has been added.
5. Cluster feature demonstration
5.1. Using intelligent autoscaling with EffectiveHPA
Crane provides an autoscaling product called EffectiveHorizontalPodAutoscaler (EHPA). It builds on the community HPA for scaling control and supports additional scaling triggers, including prediction, observation, and periodic triggers, which makes scaling more efficient while protecting service quality.
5.1.1 Installing Metrics Server
# Install Metrics Server. Run this command from the parent directory of installation.
[root@Crane training]# kubectl apply -f installation/components.yaml
serviceaccount/metrics-server created
clusterrole.rbac.authorization.k8s.io/system:aggregated-metrics-reader created
clusterrole.rbac.authorization.k8s.io/system:metrics-server created
rolebinding.rbac.authorization.k8s.io/metrics-server-auth-reader created
clusterrolebinding.rbac.authorization.k8s.io/metrics-server:system:auth-delegator created
clusterrolebinding.rbac.authorization.k8s.io/system:metrics-server created
service/metrics-server created
deployment.apps/metrics-server created
apiservice.apiregistration.k8s.io/v1beta1.metrics.k8s.io created
Check that the Pod has started.
[root@Crane ~]# kubectl get pod -n kube-system | grep metrics-server
metrics-server-79c88ff4f-mz96g 1/1 Running 0 16m
5.1.2 Creating the test applications
The following commands start a Deployment that runs a container from the hpa-example image and expose it as a Service.
[root@Crane training]# kubectl apply -f installation/php-apache.yaml
deployment.apps/php-apache created
service/php-apache created
[root@Crane training]# kubectl apply -f installation/nginx-deployment.yaml
deployment.apps/nginx-deployment created
[root@Crane training]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-758fd5cc9f-27bpz 1/1 Running 0 108s
nginx-deployment-758fd5cc9f-mxxx9 1/1 Running 0 108s
nginx-deployment-758fd5cc9f-np46c 1/1 Running 0 108s
nginx-deployment-758fd5cc9f-p8s9q 1/1 Running 0 108s
nginx-deployment-758fd5cc9f-tc2mj 1/1 Running 0 108s
php-apache-7d59cc57d4-8tnph 1/1 Running 0 2m5s
5.1.3 Creating the EffectiveHPA
[root@Crane training]# kubectl apply -f installation/effective-hpa.yaml
effectivehorizontalpodautoscaler.autoscaling.crane.io/php-apache created
# Check the EffectiveHPA status
[root@Crane training]# kubectl get ehpa
NAME STRATEGY MINPODS MAXPODS SPECIFICPODS REPLICAS AGE
php-apache Auto 1 10 0 10s
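The contents of `installation/effective-hpa.yaml` are not reproduced in this article. The sketch below shows what such a manifest might look like, modeled on the EffectiveHPA example in the Crane documentation and on the values visible in the output above (strategy `Auto`, 1 to 10 replicas). The `prediction` settings and the helper name `write_ehpa_manifest` are assumptions; the file in the training package may differ.

```shell
# Sketch only: writes an EffectiveHPA manifest to the path given as $1.
# Values under "prediction" are assumed, not taken from the training package.
write_ehpa_manifest() {
  cat > "$1" <<'EOF'
apiVersion: autoscaling.crane.io/v1alpha1
kind: EffectiveHorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:              # workload to scale
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  scaleStrategy: Auto
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50
  prediction:                  # assumed prediction settings
    predictionWindowSeconds: 3600
    predictionAlgorithm:
      algorithmType: dsp
      dsp:
        sampleInterval: "60s"
        historyLength: "3d"
EOF
}

# Example:
#   write_ehpa_manifest /tmp/effective-hpa.yaml && kubectl apply -f /tmp/effective-hpa.yaml
```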
5.1.4 Load testing
# Open a new terminal window and set the environment variable
export KUBECONFIG=${HOME}/.kube/config_crane
kubectl run -i --tty load-generator --rm --image=busybox:1.28 --restart=Never -- /bin/sh -c "while sleep 0.01; do wget -q -O- http://php-apache; done"
Press CTRL+C to stop the requests. As the request rate rises, CPU utilization climbs and EffectiveHPA automatically scales out the replicas, as shown below.
After the load is added, the relevant figures are as follows.
[root@crane training]# kubectl get hpa ehpa-php-apache --watch
NAME REFERENCE TARGETS MINPODS MAXPODS REPLICAS AGE
ehpa-php-apache Deployment/php-apache 29%/50% 1 10 1 107s
ehpa-php-apache Deployment/php-apache 250%/50% 1 10 1 2m1s
ehpa-php-apache Deployment/php-apache 240%/50% 1 10 4 2m16s
ehpa-php-apache Deployment/php-apache 109%/50% 1 10 5 2m31s
ehpa-php-apache Deployment/php-apache 70%/50% 1 10 5 2m46s
[root@Crane training]# kubectl get ehpa
NAME STRATEGY MINPODS MAXPODS SPECIFICPODS REPLICAS AGE
php-apache Auto 1 10 6 7m1s
[root@Crane training]# kubectl get pods
NAME READY STATUS RESTARTS AGE
nginx-deployment-758fd5cc9f-27bpz 1/1 Running 0 5m22s
nginx-deployment-758fd5cc9f-mxxx9 1/1 Running 0 5m22s
nginx-deployment-758fd5cc9f-np46c 1/1 Running 0 5m22s
nginx-deployment-758fd5cc9f-p8s9q 1/1 Running 0 5m22s
nginx-deployment-758fd5cc9f-tc2mj 1/1 Running 0 5m22s
php-apache-7d59cc57d4-4zf6j 1/1 Running 0 57s
php-apache-7d59cc57d4-6b5h8 1/1 Running 0 72s
php-apache-7d59cc57d4-8tnph 1/1 Running 0 5m39s
php-apache-7d59cc57d4-kjkff 1/1 Running 0 27s
php-apache-7d59cc57d4-lwnpp 1/1 Running 0 72s
php-apache-7d59cc57d4-zhdfk 1/1 Running 0 57s
As the comparison in the screenshots below shows, the test application scaled automatically during the load test.
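The TARGETS column in the watch output can be related to the replica count through the standard HPA rule from the Kubernetes documentation: desired replicas = ceil(current replicas × current utilization / target utilization). Below is a minimal sketch using integer arithmetic. `desired_replicas` is a hypothetical helper, and it ignores the HPA tolerance band and any predicted metrics that EHPA adds.

```shell
# Standard HPA rule: desired = ceil(current * currentUtilization / targetUtilization).
# Integer arithmetic only; utilizations are whole percentages.
desired_replicas() {
  local current="$1" util="$2" target="$3"
  echo $(( (current * util + target - 1) / target ))
}

desired_replicas 1 250 50   # the 250%/50% sample with 1 replica; prints 5
```

In the watch output the replica count went 1, 4, 5 rather than jumping straight to 5. The controller limits how fast it scales up and works from lagging metrics, so the formula is best read as the target the controller moves toward, not a prediction of each step.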
5.2. Cost display
Cluster overview
Check the data in Grafana, the data visualization platform. In this lab the address is http://192.168.200.60/grafana.
Default username/password: admin/admin
export KUBECONFIG=${HOME}/.kube/config_crane
kubectl -n crane-system port-forward service/grafana 8082:8082
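5.3. Application resource optimization
The cost data appears in the dashboard because the recommendation rules were installed when the cluster was added.
The recommendation framework automatically analyzes how the cluster's resources are running and offers optimization suggestions.
Crane's recommendation module periodically detects resource configuration problems in the cluster and suggests fixes.
Smart recommendation provides several Recommenders, each targeting a different kind of resource.
The two installed recommendation rules are listed on the Cost Analysis > Recommendation Rules page.
These recommendation rules are in fact RecommendationRule CRD objects that were installed when the K8s cluster was connected to the Dashboard: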
[root@Crane ~]# kubectl get RecommendationRule
NAME RUNINTERVAL AGE
idlenodes-rule 24h 5h8m
workloads-rule 24h 5h8m
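RecommendationRule is a cluster-scoped object. The workloads-rule rule makes resource and replica-count recommendations for Deployments and StatefulSets in all namespaces.
Note that the resource types and the recommenders must match; for example, the Resource recommender supports only Deployments and StatefulSets by default.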
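A rule of this kind can also be written by hand. The sketch below follows the spec fields that appear in the `idlenodes-rule` output in this section (`runInterval`, `resourceSelectors`, `namespaceSelector`, `recommenders`) and the example in the Crane documentation. The rule name `my-workloads-rule` and the helper `write_recommendation_rule` are hypothetical, and the recommender name `Replicas` is taken from the Crane documentation, not from this article's output.

```shell
# Sketch only: writes a RecommendationRule manifest to the path given as $1.
write_recommendation_rule() {
  cat > "$1" <<'EOF'
apiVersion: analysis.crane.io/v1alpha1
kind: RecommendationRule
metadata:
  name: my-workloads-rule      # hypothetical name
spec:
  runInterval: 24h             # how often the rule is evaluated
  resourceSelectors:           # resource kinds the rule inspects
  - kind: Deployment
    apiVersion: apps/v1
  - kind: StatefulSet
    apiVersion: apps/v1
  namespaceSelector:
    any: true                  # all namespaces
  recommenders:                # must support the kinds selected above
  - name: Replicas
  - name: Resource
EOF
}

# Example:
#   write_recommendation_rule /tmp/rule.yaml && kubectl apply -f /tmp/rule.yaml
```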
View the resource object of the idle node recommendation rule
[root@Crane ~]# kubectl get recommendationrule idlenodes-rule -oyaml
apiVersion: analysis.crane.io/v1alpha1
kind: RecommendationRule
metadata:
  creationTimestamp: "2023-05-10T04:27:24Z"
  generation: 2
  labels:
    analysis.crane.io/recommendation-rule-preinstall: "true"
  name: idlenodes-rule
  resourceVersion: "3494"
  uid: 034152a2-e4ae-4d3b-8223-624f2315e067
spec:
  namespaceSelector:
    any: true
  recommenders:
  - name: IdleNode
  resourceSelectors:
  - apiVersion: v1
    kind: Node
  runInterval: 24h
status:
  lastUpdateTime: "2023-05-10T04:27:24Z"
  recommendations:
  - lastStartTime: "2023-05-10T04:27:24Z"
    message: 'Failed to run recommendation flow in recommender IdleNode: Node crane-control-plane
      is not a idle node '
    recommenderRef:
      name: IdleNode
    targetRef:
      apiVersion: v1
      kind: Node
      name: crane-control-plane
  runNumber: 1
View the Recommendation objects generated for the cluster
[root@Crane ~]# kubectl get recommendations -A
NAMESPACE NAME TYPE TARGETKIND TARGETNAMESPACE TARGETNAME STRATEGY PERIODSECONDS ADOPTIONTYPE AGE
crane-system workloads-rule-resource-254v6 Resource Deployment crane-system metric-adapter Once StatusAndAnnotation 5h12m
crane-system workloads-rule-resource-7c4jg Resource Deployment crane-system prometheus-kube-state-metrics Once StatusAndAnnotation 5h12m
crane-system workloads-rule-resource-hwr7p Resource Deployment crane-system prometheus-server Once StatusAndAnnotation 5h12m
crane-system workloads-rule-resource-m5ws6 Resource Deployment crane-system grafana Once StatusAndAnnotation 5h12m
View any one of the recommendation objects
kubectl get recommend workloads-rule-resource-254v6 -n crane-system -oyaml
apiVersion: analysis.crane.io/v1alpha1
kind: Recommendation
metadata:
  annotations:
    analysis.crane.io/run-number: "1"
  creationTimestamp: "2023-05-10T04:27:24Z"
  generateName: workloads-rule-resource-
  generation: 2
  labels:
    analysis.crane.io/recommendation-rule-name: workloads-rule
    analysis.crane.io/recommendation-rule-recommender: Resource
    analysis.crane.io/recommendation-rule-uid: ae95350e-5bfb-4fa7-955c-a69907d17b70
    analysis.crane.io/recommendation-target-kind: Deployment
    analysis.crane.io/recommendation-target-name: metric-adapter
    analysis.crane.io/recommendation-target-version: v1
    app: metric-adapter
    app.kubernetes.io/instance: crane
    app.kubernetes.io/managed-by: Helm
    app.kubernetes.io/name: crane
    app.kubernetes.io/version: v0.10.0
    helm.sh/chart: crane-0.10.0
  name: workloads-rule-resource-254v6
  namespace: crane-system
  ownerReferences:
  - apiVersion: analysis.crane.io/v1alpha1
    blockOwnerDeletion: false
    controller: false
    kind: RecommendationRule
    name: workloads-rule
    uid: ae95350e-5bfb-4fa7-955c-a69907d17b70
  resourceVersion: "3515"
  uid: b00b3e5d-400d-455d-a338-7526c5d7d6c1
spec:
  adoptionType: StatusAndAnnotation
  completionStrategy:
    completionStrategyType: Once
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: metric-adapter
    namespace: crane-system
  type: Resource
status:
  action: Patch
  conditions:
  - lastTransitionTime: "2023-05-10T04:27:24Z"
    message: Recommendation is ready
    reason: RecommendationReady
    status: "True"
    type: Ready
  currentInfo: '{"spec":{"template":{"spec":{"containers":[{"name":"metric-adapter","resources":{"requests":{"cpu":"0","memory":"0"}}}]}}}}'
  lastUpdateTime: "2023-05-10T04:27:24Z"
  recommendedInfo: '{"spec":{"template":{"spec":{"containers":[{"name":"metric-adapter","resources":{"requests":{"cpu":"114m","memory":"120586239"}}}]}}}}'
  recommendedValue: |
    resourceRequest:
      containers:
      - containerName: metric-adapter
        target:
          cpu: 114m
          memory: "120586239"
  targetRef: {}
[root@Crane ~]#
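The recommended memory value is reported in bytes, which is hard to read at a glance. The sketch below converts it to MiB and also shows how the `recommendedInfo` patch could be applied with `kubectl patch`, following the adoption approach described in the Crane documentation. Both helper names are hypothetical, and `apply_recommendation` is only defined here, not run.

```shell
# Hypothetical helper: round a byte count up to whole MiB.
bytes_to_mib() {
  echo $(( ($1 + 1048575) / 1048576 ))
}

bytes_to_mib 120586239   # memory target recommended for metric-adapter; prints 115

# Hypothetical helper (defined, not run here): apply the recommended requests
# by patching the target Deployment with status.recommendedInfo.
apply_recommendation() {
  local rec="$1" ns="$2" deploy="$3" patch
  patch=$(kubectl get recommend "$rec" -n "$ns" -o jsonpath='{.status.recommendedInfo}')
  kubectl patch deployment "$deploy" -n "$ns" --patch "$patch"
}

# Example:
#   apply_recommendation workloads-rule-resource-254v6 crane-system metric-adapter
```

Patching the Deployment triggers a rolling restart of its Pods, so it is worth reviewing the recommended values before applying them.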
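The same information is available in the web console.
For idle node recommendations, the steps for taking a node offline differ from platform to platform, so users can decommission or scale in nodes according to their own needs.
The longer an application's history in the monitoring system (for example Prometheus), the more accurate the recommendations; more than two weeks of data is recommended in production.
Predictions for newly created applications are often inaccurate.
6. Cleaning up the lab environment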
[root@Crane ~]# kind delete cluster --name=crane
Deleting cluster "crane" ...
Deleted nodes: ["crane-control-plane"]
7. Common problems and notes
Error 1: the installation command fails and aborts because it was run from the wrong location. It should be run from the parent directory of installation.
[root@crane installation]# bash local-env-setup.sh
Step1: Create local cluster: /root/.kube/config_crane
Deleting cluster "crane" ...
Deleted nodes: ["crane-control-plane"]
Creating cluster "crane" ...
✓ Ensuring node image (kindest/node:v1.21.1)
✓ Preparing nodes
✓ Writing configuration
✓ Starting control-plane
✓ Installing CNI
✓ Installing StorageClass
Set kubectl context to "kind-crane"
You can now use your cluster with:
kubectl cluster-info --context kind-crane --kubeconfig /root/.kube/config_crane
Have a question, bug, or feature request? Let us know! https://kind.sigs.k8s.io/#community
Step1: Create local cluster finished.
Step2: Installing Prometheus
Error: INSTALLATION FAILED: repo installation not found
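The message `repo installation not found` is consistent with the script passing Helm a relative chart path under `installation/`: when that path does not exist in the current directory, Helm most likely reads the first path segment as a repository name. A small pre-flight check can catch this before the cluster is recreated. `in_install_parent` is a hypothetical helper.

```shell
# Hypothetical pre-flight check: succeed only when the current directory is
# the parent of installation/, which is where the script expects to be run.
in_install_parent() {
  [ -f installation/local-env-setup.sh ]
}

# Example:
#   if in_install_parent; then
#     bash installation/local-env-setup.sh
#   else
#     echo "cd to the directory that contains installation/ first" >&2
#   fi
```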
Error 2: if kubectl reports that the connection to port 8080 was refused, run export KUBECONFIG=${HOME}/.kube/config_crane to set the environment variable, then retry.
Error 3: describing a Pod fails with NotFound unless the namespace is specified.
[root@crane ~]# kubectl describe pods/fadvisor-6c6867dcb9-5zfxg
Error from server (NotFound): pods "fadvisor-6c6867dcb9-5zfxg" not found
[root@crane ~]# kubectl describe pods/fadvisor-6c6867dcb9-5zfxg --namespace crane-system
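Error 4: a Pod stays stuck in the ImagePullBackOff state. Pulling the image requires access to the public internet; restarting the Docker service and trying again may also help. The container's details can be inspected as well.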
[root@crane ~]# kubectl get pods -n crane-system
NAME READY STATUS RESTARTS AGE
craned-75d5fcff49-d4xmj 2/2 Running 9 21h
fadvisor-6c6867dcb9-5zfxg 0/1 ImagePullBackOff 1 21h
grafana-84b5cdc55b-svb2f 1/1 Running 2 21h
kube-state-metrics-b5fdc98b5-jbsv8 1/1 Running 3 21h
metric-adapter-789b5b8bc5-9hznv 1/1 Running 3 21h
prometheus-server-67cb89fc9b-6g9l9 2/2 Running 4 21h
[root@crane ~]# kubectl get pods -n crane-system
NAME READY STATUS RESTARTS AGE
craned-75d5fcff49-d4xmj 2/2 Running 9 21h
fadvisor-6c6867dcb9-5zfxg 0/1 ErrImagePull 1 21h
grafana-84b5cdc55b-svb2f 0/1 Unknown 2 21h
kube-state-metrics-b5fdc98b5-jbsv8 0/1 Running 4 21h
metric-adapter-789b5b8bc5-9hznv 1/1 Running 3 21h
prometheus-server-67cb89fc9b-6g9l9 2/2 Running 4 21h
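To find the stuck Pods without scanning the table by hand, the STATUS column can be filtered for the two image pull states seen above. This is a sketch that assumes the default `kubectl get pods` column layout; `image_pull_failures` is a hypothetical helper.

```shell
# Hypothetical helper: from `kubectl get pods` output on stdin, print the
# names of Pods whose STATUS is an image pull error.
image_pull_failures() {
  awk 'NR > 1 && ($3 == "ImagePullBackOff" || $3 == "ErrImagePull") { print $1 }'
}

# Example: show the events (including the failing image) for each stuck Pod.
#   kubectl get pods -n crane-system | image_pull_failures |
#     xargs -r -n1 kubectl describe pod -n crane-system
```

If the image can be pulled on the host with `docker pull`, `kind load docker-image IMAGE --name crane` copies it into the kind node so the Pod no longer needs to pull it from inside the cluster.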
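8. Summary
Computer engineering has entered the cloud-native era, and containers and container orchestration (K8s) have become some of its most popular technologies. The Crane open source project analyzes and optimizes cloud-native cost from multiple dimensions and angles, giving customers the benefits of lower cost, higher efficiency, and sustainable growth.
After attending the two FinOps Crane livestream sessions, I feel I learned a great deal. The presenters walked us through the theory behind the project and explained its technical architecture and how the pieces relate. With that grounding in place, they guided us through building a scenario-based solution by hand. The hands-on work deepened my understanding of the whole Crane project and gave developers' technical skills a thorough workout.
With the move from the mobile internet era into the cloud era, adopting and using the cloud has come to seem indispensable; enterprises, governments, and end users are all actively embracing cloud services. They want to minimize operations overhead and complete their digital transformation, and at the same time they want their investment in and use of cloud resources to be as well-judged as possible. Monitoring and managing cloud resources therefore becomes especially important, and this is the need the Crane open source project was created to meet.
Suggestions for the FinOps Crane open source project (personal opinion):
- An Alertmanager monitoring and alerting module could be added and used together with Prometheus and Grafana. When a cloud resource or metric crosses a configured threshold, a notification is sent to the customer's terminal immediately, so that anomalies can be responded to and handled promptly.
9. References
https://gocrane.io/
Tencent Cloud FinOps Crane developer training camp
https://github.com/gocrane/crane
About the Tencent Cloud FinOps Crane training camp:
The FinOps Crane training camp is aimed at developers. It seeks to build hands-on skills in container deployment and K8s, to attract contributors to the Crane open source project, and to encourage developers to submit issues and bug reports. It also runs a series of technical activities such as livestreams, team lab exercises, and prize essay contests. Through these activities developers can get to know the FinOps Crane project in depth and make real gains in their cloud-native skills.
To reward developers, the camp has set up point-earning tasks and gifts that can be redeemed with points.
Event details: https://marketing.csdn.net/p/038ae30af2357473fc5431b63e4e1a78
Open source project: https://github.com/gocrane/crane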