k8s☞16 kubeadm Cluster Upgrade

zyh 2020-10-19 10:50:58

Preface

https://kubernetes.io/docs/tasks/administer-cluster/kubeadm/kubeadm-upgrade/

⚠️ Be sure to read that page first: you cannot skip minor versions when upgrading, so you must step through them one release at a time starting from your current version.

The process has two parts: upgrading the control plane nodes and upgrading the worker nodes.
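Before planning the path, it helps to confirm what you are currently running. A minimal check, using only standard kubeadm/kubectl commands:

kubeadm version -o short
kubectl get node -o wide   # the VERSION column shows each node's kubelet version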

Upgrading the Control Plane Nodes

The core control plane components are kube-apiserver, kube-controller-manager, kube-scheduler, and etcd (plus the kube-proxy and CoreDNS add-ons managed by kubeadm).

List the kubeadm versions available for the upgrade

This walkthrough uses upgrading from 1.20 to 1.21 as the example.

Run on all control plane nodes

# Pick the target minor version (note: the sample listing below was captured for the 1.20 series)
master_version=1.21
yum list --showduplicates kubeadm --disableexcludes=kubernetes | grep ${master_version}
===
kubeadm.x86_64                       1.20.8-0                        @kubernetes
kubeadm.x86_64                       1.20.0-0                        kubernetes
kubeadm.x86_64                       1.20.1-0                        kubernetes
kubeadm.x86_64                       1.20.2-0                        kubernetes
kubeadm.x86_64                       1.20.4-0                        kubernetes
kubeadm.x86_64                       1.20.5-0                        kubernetes
kubeadm.x86_64                       1.20.6-0                        kubernetes
kubeadm.x86_64                       1.20.7-0                        kubernetes
kubeadm.x86_64                       1.20.8-0                        kubernetes

--disableexcludes=kubernetes disables any exclude= entries defined for the kubernetes repo, which is what normally blocks yum from touching the kube packages.
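The flag matters because the upstream-documented kubernetes repo definition pins the kube packages with an exclude directive. If your repo file contains a line like the one below, yum refuses to install or update kubeadm/kubelet/kubectl unless the excludes are disabled (the aliyun repo file used later in this article does not set it):

# excerpt from the upstream-documented repo definition
exclude=kubelet kubeadm kubectl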

All control plane nodes: pre-pull the images needed for the upgrade

Run on all control plane nodes

# Pick the latest stable patch release of the target minor version
full_version=1.21.7

List the images required by that version:

kubeadm config images list --kubernetes-version=${full_version}
===
k8s.gcr.io/kube-apiserver:v1.21.7
k8s.gcr.io/kube-controller-manager:v1.21.7
k8s.gcr.io/kube-scheduler:v1.21.7
k8s.gcr.io/kube-proxy:v1.21.7
k8s.gcr.io/pause:3.5
k8s.gcr.io/etcd:3.4.13-0
k8s.gcr.io/coredns/coredns:v1.8.4

⚠️ At the time of writing, the command above may not list the correct images. For v1.22.4, for example, the release notes say etcd was bumped to 3.5.0, but the older kubeadm cannot find it; it prints the message below and falls back to the previous version it knows about.

could not find officially supported version of etcd for Kubernetes v1.22.4, falling back to the nearest etcd version (3.4.13-0)

In that case, go by the versions stated in the release notes.

ℹ️ You can also upgrade kubeadm on one node first and run kubeadm upgrade plan to obtain the correct image versions.
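For example (the version numbers here are only illustrative, matching the v1.22.4 case above), install the newer kubeadm on one control plane node and ask it directly:

# illustrative: a newer kubeadm knows the correct etcd/CoreDNS versions for its own release
yum install -y kubeadm-1.22.4 --disableexcludes=kubernetes
kubeadm config images list --kubernetes-version=1.22.4
kubeadm upgrade plan   # the component table also lists the etcd and CoreDNS targets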

Build the image pull script

Run on all control plane nodes

Based on the output above, adjust the pause, etcd, and coredns version numbers in the script below.

cat>images-pull.sh<<EOF
#!/bin/bash
# kubeadm config images list
images=(
kube-apiserver:v${full_version}
kube-controller-manager:v${full_version}
kube-scheduler:v${full_version}
kube-proxy:v${full_version}
pause:3.5
etcd:3.4.13-0
coredns:1.8.4  # note: for 1.21 kubeadm expects k8s.gcr.io/coredns/coredns:v1.8.4 -- see the retag note below
)
for imageName in \${images[@]};
do
    docker pull registry.aliyuncs.com/google_containers/\${imageName}
    docker tag registry.aliyuncs.com/google_containers/\${imageName} k8s.gcr.io/\${imageName}
    docker rmi registry.aliyuncs.com/google_containers/\${imageName}
done
EOF
bash images-pull.sh

If some images cannot be pulled from the mirror, find them on Docker Hub yourself, pull them, and then add the tags k8s expects, e.g. for coredns:

docker pull coredns/coredns:1.6.7
docker tag coredns/coredns:1.6.7 k8s.gcr.io/coredns:1.6.7
docker rmi coredns/coredns:1.6.7
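For this particular upgrade, the kubeadm config images list output above shows coredns under the k8s.gcr.io/coredns/coredns repository with a v prefix, so the retag would look roughly like this (a sketch, assuming the coredns/coredns:1.8.4 tag is available on Docker Hub):

# retag coredns into the repository path kubeadm 1.21 expects
docker pull coredns/coredns:1.8.4
docker tag coredns/coredns:1.8.4 k8s.gcr.io/coredns/coredns:v1.8.4
docker rmi coredns/coredns:1.8.4

Afterwards, a quick check that everything kubeadm needs is present locally:

docker images | grep k8s.gcr.io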

First control plane node: install the target kubeadm version

Run on the first control plane node to be upgraded

The upgrade plan is pinned to the kubeadm version: if the kubeadm you have installed is version x, the plan that kubeadm produces will target the latest stable release of x.

So the first step is to upgrade kubeadm itself to the target version.

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
yum install -y kubeadm-${full_version} --disableexcludes=kubernetes
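Before moving on, it is worth confirming that yum actually installed the target version:

kubeadm version -o short   # should now print v1.21.7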

First control plane node: list the upgrade plan

Run on the first control plane node to be upgraded

kubeadm upgrade plan

The output has three parts.

The first part shows the current cluster version (v1.20.8), the target version matching the installed kubeadm (v1.21.7), the latest patch release in the current v1.20 series (v1.20.13), and the newest available release (v1.22.4).

[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade] Fetching available versions to upgrade to
[upgrade/versions] Cluster version: v1.20.8
[upgrade/versions] kubeadm version: v1.21.7
I1202 16:13:13.030774    4057 version.go:254] remote version is much newer: v1.22.4; falling back to: stable-1.21
[upgrade/versions] Target version: v1.21.7
[upgrade/versions] Latest version in the v1.20 series: v1.20.13

The second part shows the upgrade targets and is itself split in two: upgrading within the v1.20 series, and upgrading to the latest stable version.

Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     3 x v1.20.8   v1.20.13

Upgrade to the latest version in the v1.20 series:

COMPONENT                 CURRENT    TARGET
kube-apiserver            v1.20.8    v1.20.13
kube-controller-manager   v1.20.8    v1.20.13
kube-scheduler            v1.20.8    v1.20.13
kube-proxy                v1.20.8    v1.20.13
CoreDNS                   1.7.0      v1.8.0
etcd                      3.4.13-0   3.4.13-0

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.20.13
Components that must be upgraded manually after you have upgraded the control plane with 'kubeadm upgrade apply':
COMPONENT   CURRENT       TARGET
kubelet     3 x v1.20.8   v1.21.7

Upgrade to the latest stable version:

COMPONENT                 CURRENT    TARGET
kube-apiserver            v1.20.8    v1.21.7
kube-controller-manager   v1.20.8    v1.21.7
kube-scheduler            v1.20.8    v1.21.7
kube-proxy                v1.20.8    v1.21.7
CoreDNS                   1.7.0      v1.8.0
etcd                      3.4.13-0   3.4.13-0

You can now apply the upgrade by executing the following command:

        kubeadm upgrade apply v1.21.7

The third part covers component configs that may need manual upgrading.

The table below shows the current state of component configs as understood by this version of kubeadm.
Configs that have a "yes" mark in the "MANUAL UPGRADE REQUIRED" column require manual config upgrade or
resetting to kubeadm defaults before a successful upgrade can be performed. The version to manually
upgrade to is denoted in the "PREFERRED VERSION" column.

API GROUP                 CURRENT VERSION   PREFERRED VERSION   MANUAL UPGRADE REQUIRED
kubeproxy.config.k8s.io   v1alpha1          v1alpha1            no
kubelet.config.k8s.io     v1beta1           v1beta1             no

⭐️ If the MANUAL UPGRADE REQUIRED column says yes, you have to upgrade that config manually, which requires providing the configuration file yourself:

kubeadm upgrade apply --config <config-file>

ℹ️ kubeadm upgrade renews all certificates by default (this can be turned off with --certificate-renewal=false).
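If you want to verify this, compare the certificate expiry dates before and after the upgrade (available in kubeadm 1.20+):

kubeadm certs check-expiration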

First control plane node: upgrade the control plane components

Run on the first control plane node to be upgraded

kubeadm upgrade apply v1.21.7
=== You will be asked to confirm once during the run
[upgrade/config] Making sure the configuration is correct:
[upgrade/config] Reading configuration from the cluster...
[upgrade/config] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[preflight] Running pre-flight checks.
[upgrade] Running cluster health checks
[upgrade/version] You have chosen to change the cluster version to "v1.21.7"
[upgrade/versions] Cluster version: v1.20.8
[upgrade/versions] kubeadm version: v1.21.7
[upgrade/confirm] Are you sure you want to proceed with the upgrade? [y/N]:y

=== Images are pulled
[upgrade/prepull] Pulling images required for setting up a Kubernetes cluster
[upgrade/prepull] This might take a minute or two, depending on the speed of your internet connection
[upgrade/prepull] You can also perform this action in beforehand using 'kubeadm config images pull'

=== The static pods are upgraded, and the backup locations are logged
[upgrade/apply] Upgrading your Static Pod-hosted control plane to version "v1.21.7"...
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-controller-manager-k8s01 hash: 22f0b6f7c55d4fef49a89a4f535241a0
Static pod: kube-scheduler-k8s01 hash: 98178ef8494b07ffc6d724adb4d8a0c3
[upgrade/etcd] Upgrading to TLS for etcd
Static pod: etcd-k8s01 hash: d9b8ab2ef694da7813c41fbf36833ba1
[upgrade/staticpods] Preparing for "etcd" upgrade
[upgrade/staticpods] Current and new manifests of etcd are equal, skipping upgrade
[upgrade/etcd] Waiting for etcd to become available
[upgrade/staticpods] Writing new Static Pod manifests to "/etc/kubernetes/tmp/kubeadm-upgraded-manifests437528341"
[upgrade/staticpods] Preparing for "kube-apiserver" upgrade
[upgrade/staticpods] Renewing apiserver certificate
[upgrade/staticpods] Renewing apiserver-kubelet-client certificate
[upgrade/staticpods] Renewing front-proxy-client certificate
[upgrade/staticpods] Renewing apiserver-etcd-client certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-apiserver.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-12-02-16-21-32/kube-apiserver.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: 871c4bc8d662b466c6481ef11d3bc7ec
Static pod: kube-apiserver-k8s01 hash: c99ad4c36653e9251a653ed601ba1117
[apiclient] Found 3 Pods for label selector component=kube-apiserver
[upgrade/staticpods] Component "kube-apiserver" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-controller-manager" upgrade
[upgrade/staticpods] Renewing controller-manager.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-controller-manager.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-12-02-16-21-32/kube-controller-manager.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-controller-manager-k8s01 hash: 22f0b6f7c55d4fef49a89a4f535241a0
Static pod: kube-controller-manager-k8s01 hash: 22f0b6f7c55d4fef49a89a4f535241a0
Static pod: kube-controller-manager-k8s01 hash: 22f0b6f7c55d4fef49a89a4f535241a0
Static pod: kube-controller-manager-k8s01 hash: 22f0b6f7c55d4fef49a89a4f535241a0
Static pod: kube-controller-manager-k8s01 hash: 22f0b6f7c55d4fef49a89a4f535241a0
Static pod: kube-controller-manager-k8s01 hash: 22f0b6f7c55d4fef49a89a4f535241a0
Static pod: kube-controller-manager-k8s01 hash: 22f0b6f7c55d4fef49a89a4f535241a0
Static pod: kube-controller-manager-k8s01 hash: ecd7e36a5c07f4ccf0f769e8f0fe6dc5
[apiclient] Found 3 Pods for label selector component=kube-controller-manager
[upgrade/staticpods] Component "kube-controller-manager" upgraded successfully!
[upgrade/staticpods] Preparing for "kube-scheduler" upgrade
[upgrade/staticpods] Renewing scheduler.conf certificate
[upgrade/staticpods] Moved new manifest to "/etc/kubernetes/manifests/kube-scheduler.yaml" and backed up old manifest to "/etc/kubernetes/tmp/kubeadm-backup-manifests-2021-12-02-16-21-32/kube-scheduler.yaml"
[upgrade/staticpods] Waiting for the kubelet to restart the component
[upgrade/staticpods] This might take a minute or longer depending on the component/version gap (timeout 5m0s)
Static pod: kube-scheduler-k8s01 hash: 98178ef8494b07ffc6d724adb4d8a0c3
Static pod: kube-scheduler-k8s01 hash: 98178ef8494b07ffc6d724adb4d8a0c3
Static pod: kube-scheduler-k8s01 hash: 98178ef8494b07ffc6d724adb4d8a0c3
Static pod: kube-scheduler-k8s01 hash: 98178ef8494b07ffc6d724adb4d8a0c3
Static pod: kube-scheduler-k8s01 hash: 98178ef8494b07ffc6d724adb4d8a0c3
Static pod: kube-scheduler-k8s01 hash: 98178ef8494b07ffc6d724adb4d8a0c3
Static pod: kube-scheduler-k8s01 hash: 98178ef8494b07ffc6d724adb4d8a0c3
Static pod: kube-scheduler-k8s01 hash: 7d2a77b067995d323e127a47f45e8f14
[apiclient] Found 3 Pods for label selector component=kube-scheduler
[upgrade/staticpods] Component "kube-scheduler" upgraded successfully!
[upgrade/postupgrade] Applying label node-role.kubernetes.io/control-plane='' to Nodes with label node-role.kubernetes.io/master='' (deprecated)
[upgrade/postupgrade] Applying label node.kubernetes.io/exclude-from-external-load-balancers='' to control plane Nodes
[upload-config] Storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.21" in namespace kube-system with the configuration for the kubelets in the cluster
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to get nodes
[bootstrap-token] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstrap-token] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstrap-token] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[addons] Applied essential addon: CoreDNS
[endpoint] WARNING: port specified in controlPlaneEndpoint overrides bindPort in the controlplane address
[addons] Applied essential addon: kube-proxy

[upgrade/successful] SUCCESS! Your cluster was upgraded to "v1.21.7". Enjoy!

=== A reminder to continue with the kubelet upgrades if you have not done so yet
[upgrade/kubelet] Now that your control plane is upgraded, please proceed with upgrading your kubelets if you haven't already done so.

The message "Moved new manifest to /etc/kubernetes/manifests/kube-apiserver.yaml and backed up old manifest to /etc/kubernetes/tmp/kubeadm-backup-manifests-2021-12-02-16-21-32/kube-apiserver.yaml" tells you where the old manifests were backed up.
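To see every backup kubeadm has kept on this node:

ls -d /etc/kubernetes/tmp/kubeadm-backup-*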

First control plane node: upgrade the network plugin

Run from a client with cluster admin access

I use flannel here, so I simply re-apply its manifest once. Because flannel runs as a DaemonSet, there is no need to repeat this on the other control plane nodes.

https://github.com/flannel-io/flannel#deploying-flannel-manually

wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

In the manifest, change "Network": "10.244.0.0/16" under data.net-conf.json to your own pod CIDR, i.e. networking.podSubnet in the kubeadm-config ConfigMap (or the pod-network-cidr parameter passed at kubeadm init); a minimal sed sketch follows the ConfigMap dump below.

kubectl describe cm kubeadm-config -n kube-system
===
ClusterConfiguration:
----
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controlPlaneEndpoint: k8sapi:8443
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.20.8
networking:
  dnsDomain: cluster.local
  podSubnet: 10.97.0.0/16
  serviceSubnet: 10.96.0.0/16
scheduler: {}
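The podSubnet here is 10.97.0.0/16, so one minimal way to patch the downloaded manifest is a sed like the one below (a sketch; substitute your own CIDR):

# replace flannel's default Network with this cluster's podSubnet
sed -i 's#10.244.0.0/16#10.97.0.0/16#g' kube-flannel.yml
grep -A 2 '"Network"' kube-flannel.yml   # confirm the change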
Then re-apply the manifest:

kubectl apply -f kube-flannel.yml

Wait for the pods to finish rolling out:

➜   kubectl get pod -n kube-system -o wide | grep kube-flannel
kube-flannel-ds-52krd             1/1     Running   0          6m25s   10.200.16.102   k8s02   <none>           <none>
kube-flannel-ds-9hnrs             1/1     Running   0          4m31s   10.200.16.103   k8s03   <none>           <none>
kube-flannel-ds-b5b6m             1/1     Running   0          2m47s   10.200.16.101   k8s01   <none>           <none>

Remaining control plane nodes: upgrade kubeadm and the control plane components

Run on the remaining control plane nodes

cat <<EOF > /etc/yum.repos.d/kubernetes.repo
[kubernetes]
name=Kubernetes
baseurl=https://mirrors.aliyun.com/kubernetes/yum/repos/kubernetes-el7-x86_64/
enabled=1
gpgcheck=1
repo_gpgcheck=1
gpgkey=https://mirrors.aliyun.com/kubernetes/yum/doc/yum-key.gpg https://mirrors.aliyun.com/kubernetes/yum/doc/rpm-package-key.gpg
EOF
# Install kubeadm
yum install -y kubeadm-${full_version} --disableexcludes=kubernetes
# Upgrade this node (control plane components and local kubelet configuration)
kubeadm upgrade node

All control plane nodes: upgrade kubelet and kubectl one by one

Upgrade the control plane nodes one at a time, never all at once.

Cordon and drain the node

Run from a client with cluster admin access

Before draining, if the cluster runs heavyweight distributed workloads such as EFK or etcd, make sure all of their pods are in the Running state first.

kubectl get pod -n kube-system
nodeHostname=
kubectl drain ${nodeHostname} --ignore-daemonsets

--ignore-daemonsets skips DaemonSet-managed pods: the DaemonSet controller would immediately recreate any evicted pod on the node, so without this flag the drain fails.

ℹ️ Sometimes the drain errors out with a message like:

error: cannot delete Pods with local storage (use --delete-emptydir-data to override): kube-system/metrics-server-6bd8d94d7f-c72hn

Check whether the local data of the pods it names actually matters; if it does not, just append --delete-emptydir-data.
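In that case the full drain command becomes:

kubectl drain ${nodeHostname} --ignore-daemonsets --delete-emptydir-data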

kubectl get node
===
NAME    STATUS                     ROLES    AGE   VERSION
k8s01   Ready,SchedulingDisabled   master   40h   v1.18.6

Upgrade kubelet and kubectl

Run on the control plane node being upgraded

# Run on the control plane node being upgraded
yum install -y kubelet-${full_version} kubectl-${full_version} --disableexcludes=kubernetes
systemctl daemon-reload
systemctl restart kubelet

Uncordon the node

Run from a client with cluster admin access

kubectl uncordon ${nodeHostname}
kubectl get node
NAME    STATUS   ROLES                  AGE    VERSION
k8s01   Ready    control-plane,master   408d   v1.20.8
k8s02   Ready    control-plane,master   408d   v1.20.8
k8s03   Ready    control-plane,master   282d   v1.20.8

All control plane nodes: re-enable the kube-scheduler and kube-controller-manager ports

If kubectl get cs shows these two components as unreachable, check whether their manifests set --port=0; if they do, run the commands below to remove that flag.

Run on all control plane nodes

sed -i '/- --port=0/d' /etc/kubernetes/manifests/kube-scheduler.yaml
sed -i '/- --port=0/d' /etc/kubernetes/manifests/kube-controller-manager.yaml

🌟 Not necessarily required; decide based on your actual environment.

Verify the upgrade

➜   kubectl get node
NAME    STATUS   ROLES                  AGE    VERSION
k8s01   Ready    control-plane,master   408d   v1.21.7
k8s02   Ready    control-plane,master   408d   v1.21.7
k8s03   Ready    control-plane,master   282d   v1.21.7
➜   kubectl get cs
Warning: v1 ComponentStatus is deprecated in v1.19+
NAME                 STATUS    MESSAGE             ERROR
controller-manager   Healthy   ok
scheduler            Healthy   ok
etcd-0               Healthy   {"health":"true"}
➜   kubectl get pod -n kube-system
NAME                              READY   STATUS    RESTARTS   AGE
coredns-85d9df8444-kd2w6          1/1     Running   0          28m
coredns-85d9df8444-m6r2f          1/1     Running   0          24m
etcd-k8s01                        1/1     Running   0          2m13s
etcd-k8s02                        1/1     Running   1          2m14s
etcd-k8s03                        1/1     Running   0          2m12s
kube-apiserver-k8s01              1/1     Running   0          2m13s
kube-apiserver-k8s02              1/1     Running   1          2m13s
kube-apiserver-k8s03              1/1     Running   0          2m15s
kube-controller-manager-k8s01     1/1     Running   0          105s
kube-controller-manager-k8s02     1/1     Running   0          106s
kube-controller-manager-k8s03     1/1     Running   0          105s
kube-flannel-ds-52krd             1/1     Running   0          44m
kube-flannel-ds-9hnrs             1/1     Running   0          42m
kube-flannel-ds-b5b6m             1/1     Running   0          40m
kube-proxy-lmkw9                  1/1     Running   0          53m
kube-proxy-nh98w                  1/1     Running   0          54m
kube-proxy-wdvfd                  1/1     Running   0          53m
kube-scheduler-k8s01              1/1     Running   0          107s
kube-scheduler-k8s02              1/1     Running   0          106s
kube-scheduler-k8s03              1/1     Running   0          101s
metrics-server-6bd8d94d7f-wp59m   1/1     Running   0          28m

Upgrading the Worker Nodes

ℹ️ Upgrade the worker nodes one at a time rather than in bulk, to avoid putting too much pressure on the cluster.

In my test environment the control plane nodes double as worker nodes, so there are no separate worker nodes left to upgrade. The steps would be:

full_version=1.21.7
nodeName=
# Upgrade kubeadm
yum install -y kubeadm-${full_version} --disableexcludes=kubernetes
# Upgrade the local kubelet configuration
kubeadm upgrade node
# Cordon and drain the node
kubectl drain ${nodeName} --ignore-daemonsets
# Upgrade kubelet and kubectl
yum install -y kubelet-${full_version} kubectl-${full_version} --disableexcludes=kubernetes
# Restart kubelet
systemctl daemon-reload
systemctl restart kubelet
# Uncordon the node
kubectl uncordon ${nodeName}

Upgrade Failures

The upgrade failed and did not roll back

This can happen when the process is interrupted unexpectedly (for example the machine shut down mid-run); you can simply run kubeadm upgrade again.
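kubeadm upgrade is idempotent, so re-running the same apply is safe and will converge the node to the declared version:

kubeadm upgrade apply v1.21.7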

The upgrade failed and the automatic rollback failed too

full_version=
kubeadm upgrade apply --force ${full_version}

Everything failed: manual recovery

Before upgrading, kubeadm backs up the etcd data and the static pod manifests.

The backups map to the live paths as follows:

/etc/kubernetes/tmp/kubeadm-backup-etcd-<date>-<time>/etcd  ->  /var/lib/etcd/

/etc/kubernetes/tmp/kubeadm-backup-manifests-<date>-<time>  ->  /etc/kubernetes/manifests/

Copy the contents of each backup path on the left over the live path on the right, so that the etcd data is restored and the kubelet recreates the static pods from the old manifests.
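A rough manual-restore sketch (stop the kubelet first so it does not keep recreating the half-upgraded static pods, and substitute the real <date>-<time> suffixes):

systemctl stop kubelet
# you may also need to stop any still-running etcd/apiserver containers before restoring
cp -a /etc/kubernetes/tmp/kubeadm-backup-etcd-<date>-<time>/etcd/. /var/lib/etcd/
cp -a /etc/kubernetes/tmp/kubeadm-backup-manifests-<date>-<time>/. /etc/kubernetes/manifests/
systemctl start kubelet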