When trying to deploy a pod on our AKS cluster, it hung in the Pending state. Looking at the pod's events, I noticed the following warning:
FailedScheduling – 1 node(s) had volume node affinity conflict
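For reference, this event can be surfaced with kubectl; the pod name and namespace below are placeholders:

$ kubectl describe pod <pod-name> -n <namespace>
$ kubectl get events -n <namespace> --field-selector reason=FailedScheduling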
The pod I tried to deploy had a persistent volume claim, and I was certain that the corresponding persistent volume had been successfully provisioned and was available.
What was going wrong?
It turned out that my AKS cluster was deployed in 3 availability zones but I had only 2 nodes running:
$ kubectl describe nodes | grep -e "Name:" -e "failure-domain.beta.kubernetes.io/zone"
Name: aks-agentpool-38609413-vmss000003
failure-domain.beta.kubernetes.io/zone=westeurope-1
Name: aks-agentpool-38609413-vmss000004
failure-domain.beta.kubernetes.io/zone=westeurope-2
Here we can see that our nodes are running in zones westeurope-1 and westeurope-2.
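Note that failure-domain.beta.kubernetes.io/zone is the older, now-deprecated zone label; on newer clusters the same information is exposed as topology.kubernetes.io/zone, so the equivalent check would be:

$ kubectl describe nodes | grep -e "Name:" -e "topology.kubernetes.io/zone"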
If we now take a look at our persistent volume, we can see that it is deployed in zone westeurope-3:
$ kubectl describe pv pvc-ecc
Labels: failure-domain.beta.kubernetes.io/zone=westeurope-3
Annotations: pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/provisioned-by: kubernetes.io/azure-disk
volumehelper.VolumeDynamicallyCreatedByKey: azure-disk-dynamic-provisioner
Finalizers: [kubernetes.io/pv-protection]
StorageClass: default
Status: Bound
Claim: appservice-ns/appservice-ext-k8se-build-service
Reclaim Policy: Delete
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 100Gi
Node Affinity:
Required Terms:
Term 0: failure-domain.beta.kubernetes.io/region in [westeurope]
failure-domain.beta.kubernetes.io/zone in [westeurope-3]
Message:
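If you only want the scheduling constraint rather than the full description, the node affinity can also be read directly from the PV object (the PV name below stands in for the full pvc-... name):

$ kubectl get pv <pv-name> -o jsonpath='{.spec.nodeAffinity.required.nodeSelectorTerms}'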
That explains why scheduling failed: an Azure managed disk is a zonal resource, so a pod mounting it can only run on a node in the same zone (westeurope-3), and there was no node in that zone.
As a quick solution I scaled the cluster out to a third node, so that at least one node lands in westeurope-3 and can attach the persistent volume, as shown below.
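Scaling the node pool out is a one-liner with the Azure CLI; the resource group and cluster name below are placeholders:

$ az aks scale --resource-group <resource-group> --name <cluster-name> --node-count 3

A more structural way to avoid this class of conflict is to provision disks through a StorageClass with volumeBindingMode: WaitForFirstConsumer, so that the disk is only created after the pod has been scheduled and therefore always lands in a zone that has a node. A minimal sketch, assuming the same in-tree azure-disk provisioner seen in the PV above (the class name and disk SKU are placeholders):

apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: managed-disk-wffc             # placeholder name
provisioner: kubernetes.io/azure-disk # in-tree provisioner, as used by the PV above
parameters:
  storageaccounttype: StandardSSD_LRS # assumption: pick the disk SKU you need
  kind: Managed
volumeBindingMode: WaitForFirstConsumer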