nodeSelector
The simplest scheduling mechanism is nodeSelector: just add a nodeSelector to the Pod spec that matches a node's labels, and the Pod will be scheduled onto such a node. For example:
View the node's labels
➜ ~ kubectl get node k8s01 --show-labels
NAME STATUS ROLES AGE VERSION LABELS
k8s01 Ready control-plane,master 564d v1.22.4 beta.kubernetes.io/arch=amd64,beta.kubernetes.io/os=linux,kubernetes.io/arch=amd64,kubernetes.io/hostname=k8s01,kubernetes.io/os=linux,node-role.kubernetes.io/control-plane=,node-role.kubernetes.io/master=,node.kubernetes.io/exclude-from-external-load-balancers=
The Pod selects the label via spec.nodeSelector
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    env: test
spec:
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
  nodeSelector:
    disktype: ssd
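For this selector to match, the target node must carry the disktype=ssd label. A minimal sketch of adding it (assuming k8s01 is the intended node):
# Label the node so that nodeSelector disktype: ssd can match it
kubectl label nodes k8s01 disktype=ssd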
🩱 Drawback: if no node satisfies the selector, the Pod gets stuck in Pending and is never scheduled.
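A quick way to diagnose a stuck Pod (a sketch, assuming the Pod above is named nginx):
# STATUS stays Pending because no node carries disktype=ssd
kubectl get pod nginx
# The Events section lists a FailedScheduling event explaining why no node matched
kubectl describe pod nginx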
Affinity and anti-affinity policies
➜ ~ kubectl explain pod.spec.affinity
KIND: Pod
VERSION: v1
RESOURCE: affinity <Object>
DESCRIPTION:
If specified, the pod's scheduling constraints
Affinity is a group of affinity scheduling rules.
FIELDS:
nodeAffinity <Object>
Describes node affinity scheduling rules for the pod.
podAffinity <Object>
Describes pod affinity scheduling rules (e.g. co-locate this pod in the
same node, zone, etc. as some other pod(s)).
podAntiAffinity <Object>
Describes pod anti-affinity scheduling rules (e.g. avoid putting this pod
in the same node, zone, etc. as some other pod(s)).
Nodes only have affinity, nodeAffinity: if a node satisfies the rule, the Pod is scheduled onto that node.
Pods have affinity (podAffinity), anti-affinity (podAntiAffinity), and a topology key (topologyKey):
- For podAffinity and podAntiAffinity, Kubernetes scopes the rules with a concept called `topologyKey`. The topologyKey splits the nodes into topology domains.
- e.g. topologyKey: "kubernetes.io/hostname" splits the topology domains by hostname. In that case:
  - podAffinity: within each topology domain, if a Pod matching the rule already exists, this Pod may be scheduled into that domain.
  - podAntiAffinity: within each topology domain, if a Pod matching the rule already exists, this Pod may not be scheduled into that domain.
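With the hostname key, every node is its own domain. As a sketch of a larger domain (assuming nodes carry the standard topology.kubernetes.io/zone label), the same anti-affinity rule would spread Pods across zones rather than across hosts:
spec:
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchExpressions:
          - key: app
            operator: In
            values:
            - store
        topologyKey: "topology.kubernetes.io/zone"   # one domain per zone, not per node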
Types of scheduling policy
nodeAffinity, podAffinity, and podAntiAffinity are supported.
➜ ~ kubectl explain pod.spec.affinity.podAffinity
KIND: Pod
VERSION: v1
RESOURCE: podAffinity <Object>
DESCRIPTION:
Describes pod affinity scheduling rules (e.g. co-locate this pod in the
same node, zone, etc. as some other pod(s)).
Pod affinity is a group of inter pod affinity scheduling rules.
FIELDS:
preferredDuringSchedulingIgnoredDuringExecution <[]Object>
The scheduler will prefer to schedule pods to nodes that satisfy the
affinity expressions specified by this field, but it may choose a node that
violates one or more of the expressions. The node that is most preferred is
the one with the greatest sum of weights, i.e. for each node that meets all
of the scheduling requirements (resource request, requiredDuringScheduling
affinity expressions, etc.), compute a sum by iterating through the
elements of this field and adding "weight" to the sum if the node has pods
which matches the corresponding podAffinityTerm; the node(s) with the
highest sum are the most preferred.
requiredDuringSchedulingIgnoredDuringExecution <[]Object>
If the affinity requirements specified by this field are not met at
scheduling time, the pod will not be scheduled onto the node. If the
affinity requirements specified by this field cease to be met at some point
during pod execution (e.g. due to a pod label update), the system may or
may not try to eventually evict the pod from its node. When there are
multiple elements, the lists of nodes corresponding to each podAffinityTerm
are intersected, i.e. all terms must be satisfied.
- Hard constraint: requiredDuringSchedulingIgnoredDuringExecution — the rule must be satisfied at scheduling time + changes during execution are ignored. The Pod is scheduled only if a node satisfies the rule; if none does, the scheduler keeps retrying. IgnoredDuringExecution means that if the rule stops holding after the Pod is running, the Pod keeps running.
- Soft constraint: preferredDuringSchedulingIgnoredDuringExecution — the rule is satisfied at scheduling time if possible + changes during execution are ignored. The scheduler prefers nodes that satisfy the rule; if none does, the rule is ignored and the Pod is placed as usual. IgnoredDuringExecution means that if the rule stops holding after the Pod is running, the Pod keeps running.
- Soft constraints carry weights: several preferred terms can be set at once, and the scheduler ranks nodes by the sum of the weights of the terms they satisfy.
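A minimal sketch of a weighted soft constraint (assuming some nodes carry a disktype=ssd label):
spec:
  affinity:
    nodeAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 80                  # nodes matching this term score 80 points higher
        preference:
          matchExpressions:
          - key: disktype
            operator: In
            values:
            - ssd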
💖 The policies can be combined.
Scheduling policies that may be supported in the future
- requiredDuringSchedulingRequiredDuringExecution — the rule must be satisfied at scheduling time + changes during execution are not ignored. The Pod is scheduled only if a node satisfies the rule; if none does, the scheduler keeps retrying. RequiredDuringExecution means that if the rule stops holding while the Pod is running, the Pod is evicted and rescheduled.
- preferredDuringSchedulingRequiredDuringExecution — the rule is satisfied at scheduling time if possible + changes during execution are not ignored. The scheduler prefers nodes that satisfy the rule; if none does, the rule is ignored and the Pod is placed as usual. RequiredDuringExecution means that if the rule stops holding while the Pod is running, the Pod is evicted and rescheduled.
Example 1
Goal: every node (topology domain) hosts exactly one web-server Pod and one redis Pod.
redis manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: redis-cache
spec:
  selector:
    matchLabels:
      app: store
  replicas: 3
  template:
    metadata:
      labels:
        app: store
    spec:
      affinity:
        podAntiAffinity: # anti-affinity: domains are split by hostname; if a domain already has a Pod with app=store, this Pod may not be scheduled into it
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: redis-server
        image: redis:3.2-alpine
web-server manifest
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web-server
spec:
  selector:
    matchLabels:
      app: web-store
  replicas: 3
  template:
    metadata:
      labels:
        app: web-store
    spec:
      affinity:
        podAntiAffinity: # anti-affinity: domains are split by hostname; if a domain already has a Pod with app=web-store, this Pod may not be scheduled into it
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - web-store
            topologyKey: "kubernetes.io/hostname"
        podAffinity: # affinity: domains are split by hostname; this Pod may be scheduled into a domain that already has a Pod with app=store
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchExpressions:
              - key: app
                operator: In
                values:
                - store
            topologyKey: "kubernetes.io/hostname"
      containers:
      - name: web-app
        image: nginx:1.12-alpine
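After applying both Deployments, the placement can be checked as follows (a sketch, assuming a cluster with three schedulable nodes):
# Each node should end up with exactly one redis Pod and one web-server Pod
kubectl get pods -o wide -l 'app in (store, web-store)'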
💛 If there are more Pods than nodes and they should still be spread as evenly as possible, the Pod anti-affinity should use preferredDuringSchedulingIgnoredDuringExecution instead: once every node already hosts one Pod, further Pods can still be scheduled.
spec:
  affinity:
    podAntiAffinity:
      preferredDuringSchedulingIgnoredDuringExecution:
      - weight: 1
        podAffinityTerm:
          labelSelector:
            matchExpressions:
            - key: app
              operator: In
              values:
              - web-store
          topologyKey: "kubernetes.io/hostname"
Example 2
Dedicated nodes for the production (prod) server group
# Add a taint so that non-prod workloads cannot be scheduled onto this node
kubectl taint nodes k8s001 dedicated=prod:NoSchedule
# Add a label
kubectl label nodes k8s001 dedicated=prod
apiVersion: v1
kind: Pod
metadata:
  name: nginx
  labels:
    dedicated: prod
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: dedicated
            operator: In
            values:
            - prod
  tolerations:
  - key: "dedicated"
    operator: "Equal"
    value: "prod"
    effect: "NoSchedule"
  containers:
  - name: nginx
    image: nginx
    imagePullPolicy: IfNotPresent
affinity.nodeAffinity ensures the Pod can only be scheduled onto a node carrying the dedicated=prod label, and k8s001 carries that label.
- 🩱 Note: if the node's dedicated label later changes away from prod, the Pod will NOT be rescheduled onto another node that satisfies the rule (IgnoredDuringExecution).
tolerations ensures the nginx Pod is allowed onto a node carrying the dedicated=prod:NoSchedule taint, and k8s001 carries that taint.
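To confirm the node is set up as expected, both the taint and the label can be inspected (a sketch):
# Show the taints configured on k8s001
kubectl get node k8s001 -o jsonpath='{.spec.taints}'
# Show the node labels (dedicated=prod should be listed)
kubectl get node k8s001 --show-labels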