Kubernetes Pod Graceful Shutdown — How?

FoxuTech
Aug 18, 2022

Recently we looked at the pod creation lifecycle. If you read that article, you can probably guess how pod deletion works: it is the reverse of pod creation, and it starts with removing the endpoints. Let's look at how pod deletion works and how to shut down a Kubernetes pod gracefully.

Terminating a pod

In any application lifecycle, pods get terminated for various reasons. In Kubernetes this happens when a user runs kubectl delete, during an update, and so on. A pod may also be terminated because of a resource issue. In these cases, Kubernetes lets the containers running in the pod shut down gracefully, given some configuration. Before we get to the configuration, let's understand how the delete/termination operation proceeds.

Once the user runs the kubectl delete command, it is passed to the API server, and the pod's endpoint is removed from the Endpoints object. As we saw with pod creation, keeping the endpoints up to date is essential for serving traffic through a Service.

In this operation readiness probes are ignored and the endpoint is removed directly in the control plane. This triggers events to kube-proxy, the ingress controller, DNS, and so on.

All of those components then update their references and stop sending traffic to the pod's IP address. Note that this is usually a quick operation, but a component may be busy with other work, so some delay is expected and the references won't be updated immediately.

At the same time, the status of the pod in etcd changes to Terminating.

The kubelet is notified of the change and, as with pod creation, delegates the work to other components:

  • Unmounting any volumes from the container, through the Container Storage Interface (CSI).
  • Detaching the container from the network and releasing the IP address, through the Container Network Interface (CNI).
  • Destroying the container, through the Container Runtime Interface (CRI).

The following image illustrates the changes being performed.

The image shows the key difference between pod creation and deletion. During pod creation, Kubernetes waits for the kubelet to report the IP details and only then updates the endpoints. When a pod is terminated, Kubernetes removes the endpoint and notifies the kubelet at the same time.

How could this be an issue? Here is the catch: as we said, components sometimes take time to update their endpoints. If the pod is deleted before the endpoint change has propagated, we will face downtime. But why?

As mentioned, the ingress or other higher-level services have not been updated yet, so they still forward traffic to a pod that has already been removed. We might think it is Kubernetes' responsibility to propagate the change across the cluster and avoid such an issue.

But it definitely is not.

Kubernetes uses the Endpoints object, and more advanced abstractions such as EndpointSlices, to distribute the endpoints, but it does not verify that the change is up to date on every component.

So how can we avoid this scenario, which causes downtime and keeps us from maintaining 100% application uptime? The only option is for the pod to wait, before it is deleted, until the endpoint update has propagated. That is a guess based on the situation, but is it possible? Let's check.
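The race described above can be made concrete with a toy timeline. This is only a sketch: the function name and the second counts are hypothetical, and it does not call any Kubernetes API. Both clocks start at the moment the delete is issued.

```python
def stale_routing_window(propagation_seconds: float, pod_gone_after_seconds: float) -> float:
    """Seconds during which traffic is still routed to a pod that no longer exists.

    propagation_seconds:    time until kube-proxy, ingress, DNS, etc. have dropped
                            the endpoint (hypothetical number for illustration).
    pod_gone_after_seconds: time until the pod has actually stopped serving.
    """
    return max(propagation_seconds - pod_gone_after_seconds, 0.0)


# The pod exits after 1 s but propagation takes 5 s: 4 s of failed requests.
print(stale_routing_window(5, 1))

# The pod keeps serving for 10 s, longer than propagation: no failed requests.
print(stale_routing_window(5, 10))
```

The window closes only when the pod outlives the propagation delay, which is why the rest of this article focuses on making the pod wait.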

terminationGracePeriodSeconds

To answer that, we need a deeper understanding of what happens inside the containers when a delete is issued.

When a pod is deleted, its containers receive the SIGTERM signal. By default, Kubernetes sends SIGTERM and waits 30 seconds before force-killing the process. So the application can be written to wait for some time before it acts, for example:

  • Wait for some time before exiting.
  • Keep processing traffic for a while, such as 10 to 20 seconds.
  • Then close all backend connections, such as databases and WebSockets.
  • Finally, exit the process.
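The steps in the list above could be sketched in Python roughly as follows. This is a minimal illustration and not a production server: close_backends, shutdown_sequence, main, and the 10-second drain window are all assumptions made for the example, and the sleep calls stand in for real request handling.

```python
import signal
import sys
import threading
import time

# Set by the signal handler; the main loop checks it between units of work.
shutdown_requested = threading.Event()


def handle_sigterm(signum, frame):
    """Only record the request; the real work happens in the main thread."""
    shutdown_requested.set()


def close_backends(log):
    # Hypothetical placeholder: close database pools, WebSockets, etc. here.
    log.append("backends closed")


def shutdown_sequence(drain_seconds=10.0, poll_interval=0.1):
    """Run the steps from the list above and return a log of what happened."""
    log = ["draining in-flight requests"]
    # Steps 1 and 2: keep serving for a while so in-flight requests finish
    # and traffic from not-yet-updated proxies is still answered.
    deadline = time.monotonic() + drain_seconds
    while time.monotonic() < deadline:
        time.sleep(poll_interval)  # stand-in for handling requests
    # Step 3: release backend connections.
    close_backends(log)
    # Step 4: the caller exits the process after this returns.
    log.append("exiting")
    return log


def main():
    signal.signal(signal.SIGTERM, handle_sigterm)
    while not shutdown_requested.is_set():
        time.sleep(0.1)  # stand-in for normal request handling
    shutdown_sequence()
    sys.exit(0)
```

Calling main() as the container's entry point would install the handler and block until SIGTERM arrives. The drain window has to fit inside the pod's grace period (30 seconds by default), otherwise the process is force-killed before it reaches the last step.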

If your application needs more time (more than 30 seconds) to stop, you can add or change terminationGracePeriodSeconds in your pod definition.

You can also include a script that waits for some time and then exits. For this, Kubernetes exposes a preStop hook in the pod, which runs before SIGTERM is sent. You can define it as shown below:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - name: nginx
          containerPort: 80
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"]

With this option, the kubelet waits for the 10-second preStop sleep to finish and then sends SIGTERM. Note that this still may not be sufficient, as your application may be processing older requests. How do you avoid cutting those off? By adding terminationGracePeriodSeconds. The grace period covers both the preStop hook and the time the application takes to handle SIGTERM, so raising it gives the container longer before it is force-killed. The final manifest looks like this:

apiVersion: v1
kind: Pod
metadata:
  name: test-pod
spec:
  containers:
    - name: nginx
      image: nginx
      ports:
        - name: nginx
          containerPort: 80
      lifecycle:
        preStop:
          exec:
            command: ["sleep", "10"]
  terminationGracePeriodSeconds: 45

This setting should give the application time to process all requests and close its connections, which avoids a forceful shutdown.
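The timing in the manifest above can be checked with simple arithmetic. The sketch below assumes, as the Kubernetes documentation describes, that the time spent in the preStop hook counts against terminationGracePeriodSeconds. The function name is hypothetical and exists only for this example.

```python
def sigterm_budget(termination_grace_period_seconds: int = 30,
                   pre_stop_seconds: int = 0) -> int:
    """Seconds the application has to handle SIGTERM before it is force-killed.

    The grace period starts counting when termination begins, so the preStop
    hook uses up part of it before SIGTERM is even sent.
    """
    return max(termination_grace_period_seconds - pre_stop_seconds, 0)


# Manifest above: 45 s grace period, 10 s preStop sleep.
print(sigterm_budget(45, 10))  # 35

# Default 30 s grace period with the same 10 s preStop sleep.
print(sigterm_budget(30, 10))  # 20
```

If the result is smaller than the time the application needs to drain requests and close connections, either shorten the preStop sleep or raise terminationGracePeriodSeconds.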

Command line

You can also override the default grace period when you manually delete a resource with the kubectl delete command by adding the --grace-period=SECONDS parameter, in the form kubectl delete pod POD_NAME --grace-period=SECONDS.

Continue Reading on: Kubernetes Pod Graceful Shutdown — How? — FoxuTech
