Let me preface this by saying that this is running in a production cluster, so any "destructive" fix that would cause downtime is not an option (unless absolutely necessary).
My environment
I have a Kubernetes cluster (11 nodes, 3 of which are masters) running v1.13.1 on AWS. The cluster was created with kOps like so:
kops create cluster \
--yes \
--authorization RBAC \
--cloud aws \
--networking calico \
...
I don't think it's relevant, but everything in the cluster was installed via helm3.
Here are my exact versions:
$ helm version
version.BuildInfo{Version:"v3.4.1", GitCommit:"c4e74854886b2efe3321e185578e6db9be0a6e29", GitTreeState:"dirty", GoVersion:"go1.15.5"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.1", GitCommit:"c4d752765b3bbac2237bf87cf0b1c2e307844666", GitTreeState:"clean", BuildDate:"2020-12-19T08:38:20Z", GoVersion:"go1.15.5", Compiler:"gc", Platform:"darwin/amd64"}
Server Version: version.Info{Major:"1", Minor:"13", GitVersion:"v1.13.1", GitCommit:"eec55b9ba98609a46fee712359c7b5b365bdd920", GitTreeState:"clean", BuildDate:"2018-12-13T10:31:33Z", GoVersion:"go1.11.2", Compiler:"gc", Platform:"linux/amd64"}
$ kops version
Version 1.18.2
$ kubectl get nodes
NAME                           STATUS   ROLES    AGE   VERSION
ip-10-2-147-44.ec2.internal    Ready    node     47h   v1.13.1
ip-10-2-149-115.ec2.internal   Ready    node     47h   v1.13.1
ip-10-2-150-124.ec2.internal   Ready    master   2d    v1.13.1
ip-10-2-151-33.ec2.internal    Ready    node     47h   v1.13.1
ip-10-2-167-145.ec2.internal   Ready    master   43h   v1.18.14
ip-10-2-167-162.ec2.internal   Ready    node     2d    v1.13.1
ip-10-2-172-248.ec2.internal   Ready    node     47h   v1.13.1
ip-10-2-173-134.ec2.internal   Ready    node     47h   v1.13.1
ip-10-2-177-100.ec2.internal   Ready    master   2d    v1.13.1
ip-10-2-181-235.ec2.internal   Ready    node     47h   v1.13.1
ip-10-2-182-14.ec2.internal    Ready    node     47h   v1.13.1
What I'm trying to do
I'm trying to upgrade the cluster from v1.13.1 -> v1.18.14.
I edited the config via
$ kops edit cluster
and changed
kubernetesVersion: 1.18.14
then I ran
kops update cluster --yes
kops rolling-update cluster --yes
which then kicks off the rolling-update process:
NAME                STATUS        NEEDUPDATE   READY   MIN   TARGET   MAX   NODES
master-us-east-1a   NeedsUpdate   1            0       1     1        1     1
master-us-east-1b   NeedsUpdate   1            0       1     1        1     1
master-us-east-1c   NeedsUpdate   1            0       1     1        1     1
nodes               NeedsUpdate   8            0       8     8        8     8
The problem:
The process hangs while updating the first node, with this error:
I0108 10:48:40.137256 59317 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": master "ip-10-2-167-145.ec2.internal" is not ready, system-node-critical pod "calico-node-m255f" is not ready (calico-node).
I0108 10:49:12.474458 59317 instancegroups.go:440] Cluster did not pass validation, will retry in "30s": system-node-critical pod "calico-node-m255f" is not ready (calico-node).
calico-node-m255f is the only calico-node pod in the cluster (I'm fairly sure there should be one per k8s node?).
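One way to confirm that the pod count really is off (rather than pods failing to show up in the listing) is to compare the DaemonSet's own bookkeeping against the node count — a minimal sketch, assuming kubectl access to the cluster:

```shell
# calico-node is a DaemonSet, so its desired pod count should match the
# number of schedulable Linux nodes. Print both and compare by eye.
kubectl -n kube-system get ds calico-node \
  -o jsonpath='desired={.status.desiredNumberScheduled} ready={.status.numberReady}{"\n"}'
kubectl get nodes --no-headers | wc -l
```

If `desired` is 1 rather than 11, the DaemonSet itself has been rescheduled down (e.g. by a node selector or affinity change), which is a different problem than pods failing on each node.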
Info on that pod:
$ kubectl get pods -n kube-system -o wide | grep calico-node
calico-node-m255f 0/1 Running 0 35m 10.2.167.145 ip-10-2-167-145.ec2.internal <none> <none>
$ kubectl describe pod calico-node-m255f -n kube-system
Name: calico-node-m255f
Namespace: kube-system
Priority: 2000001000
Priority Class Name: system-node-critical
Node: ip-10-2-167-145.ec2.internal/10.2.167.145
Start Time: Fri, 08 Jan 2021 10:18:05 -0800
Labels: controller-revision-hash=59875785d9
k8s-app=calico-node
pod-template-generation=5
role.kubernetes.io/networking=1
Annotations: <none>
Status: Running
IP: 10.2.167.145
IPs: <none>
Controlled By: DaemonSet/calico-node
Init Containers:
upgrade-ipam:
Container ID: docker://9a6d035ee4a9d881574f45075e033597a33118e1ed2c964204cc2a5b175fbc60
Image: calico/cni:v3.15.3
Image ID: docker-pullable://calico/cni@sha256:519e5c74c3c801ee337ca49b95b47153e01fd02b7d2797c601aeda48dc6367ff
Port: <none>
Host Port: <none>
Command:
/opt/cni/bin/calico-ipam
-upgrade
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 08 Jan 2021 10:18:06 -0800
Finished: Fri, 08 Jan 2021 10:18:06 -0800
Ready: True
Restart Count: 0
Environment:
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
Mounts:
/host/opt/cni/bin from cni-bin-dir (rw)
/var/lib/cni/networks from host-local-net-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-mnnrd (ro)
install-cni:
Container ID: docker://5788e3519a2b1c1b77824dbfa090ad387e27d5bb16b751c3cf7637a7154ac576
Image: calico/cni:v3.15.3
Image ID: docker-pullable://calico/cni@sha256:519e5c74c3c801ee337ca49b95b47153e01fd02b7d2797c601aeda48dc6367ff
Port: <none>
Host Port: <none>
Command:
/install-cni.sh
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 08 Jan 2021 10:18:07 -0800
Finished: Fri, 08 Jan 2021 10:18:08 -0800
Ready: True
Restart Count: 0
Environment:
CNI_CONF_NAME: 10-calico.conflist
CNI_NETWORK_CONFIG: <set to the key 'cni_network_config' of config map 'calico-config'> Optional: false
KUBERNETES_NODE_NAME: (v1:spec.nodeName)
CNI_MTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
SLEEP: false
Mounts:
/host/etc/cni/net.d from cni-net-dir (rw)
/host/opt/cni/bin from cni-bin-dir (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-mnnrd (ro)
flexvol-driver:
Container ID: docker://bc8ad32a2dd0eb5bbb21843d4d248171bc117d2eede9e1efa9512026d9205888
Image: calico/pod2daemon-flexvol:v3.15.3
Image ID: docker-pullable://calico/pod2daemon-flexvol@sha256:cec7a31b08ab5f9b1ed14053b91fd08be83f58ddba0577e9dabd8b150a51233f
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Started: Fri, 08 Jan 2021 10:18:08 -0800
Finished: Fri, 08 Jan 2021 10:18:08 -0800
Ready: True
Restart Count: 0
Environment: <none>
Mounts:
/host/driver from flexvol-driver-host (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-mnnrd (ro)
Containers:
calico-node:
Container ID: docker://8911e4bdc0e60aa5f6c553c0e0d0e5f7aa981d62884141120d8f7cc5bc079884
Image: calico/node:v3.15.3
Image ID: docker-pullable://calico/node@sha256:1d674438fd05bd63162d9c7b732d51ed201ee7f6331458074e3639f4437e34b1
Port: <none>
Host Port: <none>
State: Running
Started: Fri, 08 Jan 2021 10:18:09 -0800
Ready: False
Restart Count: 0
Requests:
cpu: 100m
Liveness: exec [/bin/calico-node -felix-live -bird-live] delay=10s timeout=1s period=10s #success=1 #failure=6
Readiness: exec [/bin/calico-node -felix-ready -bird-ready] delay=0s timeout=1s period=10s #success=1 #failure=3
Environment:
DATASTORE_TYPE: kubernetes
WAIT_FOR_DATASTORE: true
NODENAME: (v1:spec.nodeName)
CALICO_NETWORKING_BACKEND: <set to the key 'calico_backend' of config map 'calico-config'> Optional: false
CLUSTER_TYPE: kops,bgp
IP: autodetect
CALICO_IPV4POOL_IPIP: Always
CALICO_IPV4POOL_VXLAN: Never
FELIX_IPINIPMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_VXLANMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
FELIX_WIREGUARDMTU: <set to the key 'veth_mtu' of config map 'calico-config'> Optional: false
CALICO_IPV4POOL_CIDR: 100.96.0.0/11
CALICO_DISABLE_FILE_LOGGING: true
FELIX_DEFAULTENDPOINTTOHOSTACTION: ACCEPT
FELIX_IPV6SUPPORT: false
FELIX_LOGSEVERITYSCREEN: info
FELIX_HEALTHENABLED: true
FELIX_IPTABLESBACKEND: Auto
FELIX_PROMETHEUSMETRICSENABLED: false
FELIX_PROMETHEUSMETRICSPORT: 9091
FELIX_PROMETHEUSGOMETRICSENABLED: true
FELIX_PROMETHEUSPROCESSMETRICSENABLED: true
FELIX_WIREGUARDENABLED: false
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/calico from var-lib-calico (rw)
/var/run/calico from var-run-calico (rw)
/var/run/nodeagent from policysync (rw)
/var/run/secrets/kubernetes.io/serviceaccount from calico-node-token-mnnrd (ro)
Conditions:
Type Status
Initialized True
Ready False
ContainersReady False
PodScheduled True
Volumes:
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
var-run-calico:
Type: HostPath (bare host directory volume)
Path: /var/run/calico
HostPathType:
var-lib-calico:
Type: HostPath (bare host directory volume)
Path: /var/lib/calico
HostPathType:
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
cni-bin-dir:
Type: HostPath (bare host directory volume)
Path: /opt/cni/bin
HostPathType:
cni-net-dir:
Type: HostPath (bare host directory volume)
Path: /etc/cni/net.d
HostPathType:
host-local-net-dir:
Type: HostPath (bare host directory volume)
Path: /var/lib/cni/networks
HostPathType:
policysync:
Type: HostPath (bare host directory volume)
Path: /var/run/nodeagent
HostPathType: DirectoryOrCreate
flexvol-driver-host:
Type: HostPath (bare host directory volume)
Path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
HostPathType: DirectoryOrCreate
calico-node-token-mnnrd:
Type: Secret (a volume populated by a Secret)
SecretName: calico-node-token-mnnrd
Optional: false
QoS Class: Burstable
Node-Selectors: kubernetes.io/os=linux
Tolerations: :NoSchedule op=Exists
:NoExecute op=Exists
CriticalAddonsOnly op=Exists
node.kubernetes.io/disk-pressure:NoSchedule op=Exists
node.kubernetes.io/memory-pressure:NoSchedule op=Exists
node.kubernetes.io/network-unavailable:NoSchedule op=Exists
node.kubernetes.io/not-ready:NoExecute op=Exists
node.kubernetes.io/unreachable:NoExecute op=Exists
node.kubernetes.io/unschedulable:NoSchedule op=Exists
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 35m default-scheduler Successfully assigned kube-system/calico-node-m255f to ip-10-2-167-145.ec2.internal
Normal Pulled 35m kubelet Container image "calico/cni:v3.15.3" already present on machine
Normal Created 35m kubelet Created container upgrade-ipam
Normal Started 35m kubelet Started container upgrade-ipam
Normal Started 35m kubelet Started container install-cni
Normal Pulled 35m kubelet Container image "calico/cni:v3.15.3" already present on machine
Normal Created 35m kubelet Created container install-cni
Normal Pulled 35m kubelet Container image "calico/pod2daemon-flexvol:v3.15.3" already present on machine
Normal Created 35m kubelet Created container flexvol-driver
Normal Started 35m kubelet Started container flexvol-driver
Normal Started 35m kubelet Started container calico-node
Normal Pulled 35m kubelet Container image "calico/node:v3.15.3" already present on machine
Normal Created 35m kubelet Created container calico-node
Warning Unhealthy 35m kubelet Readiness probe failed: 2021-01-08 18:18:12.731 [INFO][130] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 35m kubelet Readiness probe failed: 2021-01-08 18:18:22.727 [INFO][169] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 35m kubelet Readiness probe failed: 2021-01-08 18:18:32.733 [INFO][207] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 35m kubelet Readiness probe failed: 2021-01-08 18:18:42.730 [INFO][237] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 35m kubelet Readiness probe failed: 2021-01-08 18:18:52.736 [INFO][268] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 34m kubelet Readiness probe failed: 2021-01-08 18:19:02.731 [INFO][294] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 34m kubelet Readiness probe failed: 2021-01-08 18:19:12.734 [INFO][318] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 34m kubelet Readiness probe failed: 2021-01-08 18:19:22.739 [INFO][360] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 34m kubelet Readiness probe failed: 2021-01-08 18:19:32.748 [INFO][391] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
Warning Unhealthy 45s (x202 over 34m) kubelet (combined from similar events): Readiness probe failed: 2021-01-08 18:53:12.726 [INFO][6053] confd/health.go 180: Number of node(s) with BGP peering established = 0
calico/node is not ready: BIRD is not ready: BGP not established with 10.2.147.44,10.2.149.115,10.2.150.124,10.2.151.33,10.2.167.162,10.2.172.248,10.2.173.134,10.2.177.100,10.2.181.235,10.2.182.14
I can ssh into the node and inspect calico from there:
$ sudo ./calicoctl-linux-amd64 node status
Calico process is running.
IPv4 BGP status
+--------------+-------------------+-------+----------+--------------------------------+
| PEER ADDRESS | PEER TYPE | STATE | SINCE | INFO |
+--------------+-------------------+-------+----------+--------------------------------+
| 10.2.147.44 | node-to-node mesh | start | 00:21:18 | Active Socket: Connection |
| | | | | refused |
| 10.2.149.115 | node-to-node mesh | start | 00:21:18 | Active Socket: Connection |
| | | | | refused |
| 10.2.150.124 | node-to-node mesh | start | 00:21:18 | Active Socket: Connection |
| | | | | refused |
| 10.2.151.33 | node-to-node mesh | start | 00:21:18 | Active Socket: Connection |
| | | | | refused |
| 10.2.167.162 | node-to-node mesh | start | 00:21:18 | Passive |
| 10.2.172.248 | node-to-node mesh | start | 00:21:18 | Passive |
| 10.2.173.134 | node-to-node mesh | start | 00:21:18 | Passive |
| 10.2.177.100 | node-to-node mesh | start | 00:21:18 | Passive |
| 10.2.181.235 | node-to-node mesh | start | 00:21:18 | Passive |
| 10.2.182.14 | node-to-node mesh | start | 00:21:18 | Passive |
+--------------+-------------------+-------+----------+--------------------------------+
IPv6 BGP status
No IPv6 peers found.
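To separate "peer isn't listening" from "packets are being dropped", the BGP port can be probed directly from the affected node — a sketch (peer IPs taken from the status table above; assumes `nc` is available on the host):

```shell
# BGP uses TCP 179. An immediate "Connection refused" means the peer's TCP
# stack answered (nothing listening there), while a firewall/security-group
# drop would typically show up as a timeout instead.
for peer in 10.2.147.44 10.2.149.115 10.2.150.124 10.2.151.33; do
  nc -vz -w 3 "$peer" 179 && echo "$peer: port open" || echo "$peer: refused/timeout"
done
```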
Here is the calico-node DaemonSet config (I assume it was generated by kOps and left untouched):
kind: DaemonSet
apiVersion: apps/v1
metadata:
name: calico-node
namespace: kube-system
selfLink: /apis/apps/v1/namespaces/kube-system/daemonsets/calico-node
uid: 33dfb80a-c840-11e9-af87-02fc30bb40d6
resourceVersion: '142850829'
generation: 5
creationTimestamp: '2019-08-26T20:29:28Z'
labels:
k8s-app: calico-node
role.kubernetes.io/networking: '1'
annotations:
deprecated.daemonset.template.generation: '5'
kubectl.kubernetes.io/last-applied-configuration: '[cut out to save space]'
spec:
selector:
matchLabels:
k8s-app: calico-node
template:
metadata:
creationTimestamp: null
labels:
k8s-app: calico-node
role.kubernetes.io/networking: '1'
spec:
volumes:
- name: lib-modules
hostPath:
path: /lib/modules
type: ''
- name: var-run-calico
hostPath:
path: /var/run/calico
type: ''
- name: var-lib-calico
hostPath:
path: /var/lib/calico
type: ''
- name: xtables-lock
hostPath:
path: /run/xtables.lock
type: FileOrCreate
- name: cni-bin-dir
hostPath:
path: /opt/cni/bin
type: ''
- name: cni-net-dir
hostPath:
path: /etc/cni/net.d
type: ''
- name: host-local-net-dir
hostPath:
path: /var/lib/cni/networks
type: ''
- name: policysync
hostPath:
path: /var/run/nodeagent
type: DirectoryOrCreate
- name: flexvol-driver-host
hostPath:
path: /usr/libexec/kubernetes/kubelet-plugins/volume/exec/nodeagent~uds
type: DirectoryOrCreate
initContainers:
- name: upgrade-ipam
image: 'calico/cni:v3.15.3'
command:
- /opt/cni/bin/calico-ipam
- '-upgrade'
env:
- name: KUBERNETES_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CALICO_NETWORKING_BACKEND
valueFrom:
configMapKeyRef:
name: calico-config
key: calico_backend
resources: {}
volumeMounts:
- name: host-local-net-dir
mountPath: /var/lib/cni/networks
- name: cni-bin-dir
mountPath: /host/opt/cni/bin
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
procMount: Default
- name: install-cni
image: 'calico/cni:v3.15.3'
command:
- /install-cni.sh
env:
- name: CNI_CONF_NAME
value: 10-calico.conflist
- name: CNI_NETWORK_CONFIG
valueFrom:
configMapKeyRef:
name: calico-config
key: cni_network_config
- name: KUBERNETES_NODE_NAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CNI_MTU
valueFrom:
configMapKeyRef:
name: calico-config
key: veth_mtu
- name: SLEEP
value: 'false'
resources: {}
volumeMounts:
- name: cni-bin-dir
mountPath: /host/opt/cni/bin
- name: cni-net-dir
mountPath: /host/etc/cni/net.d
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
procMount: Default
- name: flexvol-driver
image: 'calico/pod2daemon-flexvol:v3.15.3'
resources: {}
volumeMounts:
- name: flexvol-driver-host
mountPath: /host/driver
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
procMount: Default
containers:
- name: calico-node
image: 'calico/node:v3.15.3'
env:
- name: DATASTORE_TYPE
value: kubernetes
- name: WAIT_FOR_DATASTORE
value: 'true'
- name: NODENAME
valueFrom:
fieldRef:
apiVersion: v1
fieldPath: spec.nodeName
- name: CALICO_NETWORKING_BACKEND
valueFrom:
configMapKeyRef:
name: calico-config
key: calico_backend
- name: CLUSTER_TYPE
value: 'kops,bgp'
- name: IP
value: autodetect
- name: CALICO_IPV4POOL_IPIP
value: Always
- name: CALICO_IPV4POOL_VXLAN
value: Never
- name: FELIX_IPINIPMTU
valueFrom:
configMapKeyRef:
name: calico-config
key: veth_mtu
- name: FELIX_VXLANMTU
valueFrom:
configMapKeyRef:
name: calico-config
key: veth_mtu
- name: FELIX_WIREGUARDMTU
valueFrom:
configMapKeyRef:
name: calico-config
key: veth_mtu
- name: CALICO_IPV4POOL_CIDR
value: 100.96.0.0/11
- name: CALICO_DISABLE_FILE_LOGGING
value: 'true'
- name: FELIX_DEFAULTENDPOINTTOHOSTACTION
value: ACCEPT
- name: FELIX_IPV6SUPPORT
value: 'false'
- name: FELIX_LOGSEVERITYSCREEN
value: info
- name: FELIX_HEALTHENABLED
value: 'true'
- name: FELIX_IPTABLESBACKEND
value: Auto
- name: FELIX_PROMETHEUSMETRICSENABLED
value: 'false'
- name: FELIX_PROMETHEUSMETRICSPORT
value: '9091'
- name: FELIX_PROMETHEUSGOMETRICSENABLED
value: 'true'
- name: FELIX_PROMETHEUSPROCESSMETRICSENABLED
value: 'true'
- name: FELIX_WIREGUARDENABLED
value: 'false'
resources:
requests:
cpu: 100m
volumeMounts:
- name: lib-modules
readOnly: true
mountPath: /lib/modules
- name: xtables-lock
mountPath: /run/xtables.lock
- name: var-run-calico
mountPath: /var/run/calico
- name: var-lib-calico
mountPath: /var/lib/calico
- name: policysync
mountPath: /var/run/nodeagent
livenessProbe:
exec:
command:
- /bin/calico-node
- '-felix-live'
- '-bird-live'
initialDelaySeconds: 10
timeoutSeconds: 1
periodSeconds: 10
successThreshold: 1
failureThreshold: 6
readinessProbe:
exec:
command:
- /bin/calico-node
- '-felix-ready'
- '-bird-ready'
timeoutSeconds: 1
periodSeconds: 10
successThreshold: 1
failureThreshold: 3
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
imagePullPolicy: IfNotPresent
securityContext:
privileged: true
procMount: Default
restartPolicy: Always
terminationGracePeriodSeconds: 0
dnsPolicy: ClusterFirst
nodeSelector:
kubernetes.io/os: linux
serviceAccountName: calico-node
serviceAccount: calico-node
hostNetwork: true
securityContext: {}
schedulerName: default-scheduler
tolerations:
- operator: Exists
effect: NoSchedule
- key: CriticalAddonsOnly
operator: Exists
- operator: Exists
effect: NoExecute
priorityClassName: system-node-critical
updateStrategy:
type: RollingUpdate
rollingUpdate:
maxUnavailable: 1
revisionHistoryLimit: 10
status:
currentNumberScheduled: 1
numberMisscheduled: 0
desiredNumberScheduled: 1
numberReady: 0
observedGeneration: 5
updatedNumberScheduled: 1
numberUnavailable: 1
There's nothing really useful in the pod logs either; no errors or anything obvious. It mostly looks like this:
2021-01-08 19:08:21.603 [INFO][48] felix/int_dataplane.go 1245: Applying dataplane updates
2021-01-08 19:08:21.603 [INFO][48] felix/ipsets.go 223: Asked to resync with the dataplane on next update. family="inet"
2021-01-08 19:08:21.603 [INFO][48] felix/ipsets.go 306: Resyncing ipsets with dataplane. family="inet"
2021-01-08 19:08:21.603 [INFO][48] felix/wireguard.go 578: Wireguard is not enabled
2021-01-08 19:08:21.605 [INFO][48] felix/ipsets.go 356: Finished resync family="inet" numInconsistenciesFound=0 resyncDuration=1.573324ms
2021-01-08 19:08:21.605 [INFO][48] felix/int_dataplane.go 1259: Finished applying updates to dataplane. msecToApply=2.03915
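Beyond the felix logs, BIRD itself can be asked for its view of each session from inside the running container — a sketch, assuming `birdcl` is bundled in the calico/node image (it is in the v3.x images) and that BIRD is using its default control-socket path:

```shell
# Query BIRD's protocol table directly; each BGP session to a peer shows up
# as a Mesh_* protocol with its current state (Idle/Active/Established...).
kubectl -n kube-system exec calico-node-m255f -c calico-node -- \
  birdcl -s /var/run/calico/bird.ctl show protocols
```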
What I've tried
Unfortunately, I'm not a networking guy, so I haven't dug too deep into the Calico specifics.
I've tried restarting the related pods, rebooting the underlying EC2 instance, and deleting the DaemonSet and re-adding it using the config above.
I can also assure you there are no network restrictions (firewalls, security groups, etc.) on the internal network that could be blocking these connections.
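For completeness, the security groups can be cross-checked for a TCP 179 rule with the AWS CLI — a hedged sketch; the actual group names and IDs will be whatever kOps created for this cluster:

```shell
# List security groups that have an ingress rule starting at port 179/tcp
# (the BGP port calico's node-to-node mesh uses).
aws ec2 describe-security-groups \
  --filters Name=ip-permission.from-port,Values=179 \
            Name=ip-permission.protocol,Values=tcp \
  --query 'SecurityGroups[].{Id:GroupId,Name:GroupName}' \
  --output table
```

Note that kOps-managed node security groups usually allow all traffic between cluster members via a self-referencing rule, so an empty result here doesn't necessarily mean the port is blocked.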
It's also worth noting that this cluster ran flawlessly before the kops rolling-update attempt.
I'm pretty much at a loss here and don't know what else I could try.