As we move toward a Kubernetes-first world, we are all experimenting more and extending our services rapidly, which means we are growing faster than before. With that growth, it becomes hard to predict how many computing resources are needed: no one has fully mastered Kubernetes yet, and growth is rarely calculated in advance. Still, it is our responsibility to keep roughly the right amount of resources available at all times, without letting them go to waste when usage is low. Kubernetes autoscaling helps fix this by maintaining resources and making sure only what is necessary is used.
Introduction
Autoscaling is a technique used in computing to dynamically adjust computational resources, such as CPU and memory, based on the incoming traffic of your application. The technique has been around since the virtual machine era, and today autoscaling is also one of the core features of container orchestrators like Kubernetes.
Let’s imagine we have an application deployed and running on Kubernetes, but we are not sure of its scaling requirements or how many resources it needs. In the end, we will pay a lot more for resources even if they are never used. This is where autoscaling helps us utilize resources efficiently in two ways. It helps by:
- Decreasing the number of pods or nodes when the load is low.
- Increasing them when there’s a spike in traffic.
Here are a few specific ways autoscaling optimizes resource use:
- Saving on cost by using only the infrastructure you actually need.
- Increasing the uptime of your workloads in cases where you have an unpredictable load.
- The ability to run less time-sensitive workloads on the free capacity that autoscaling creates in low-traffic scenarios.
In this article, let’s understand what Kubernetes autoscaling is, which methods it provides, and how autoscaling works in Kubernetes.
What Is Autoscaling?
Autoscaling was first introduced in Kubernetes 1.3. When we talk about autoscaling in the Kubernetes context, in most cases, we ultimately scale pod replicas up and down automatically based on a given metric, like CPU or RAM.
We can achieve this by using the Horizontal Pod Autoscaler (HPA). The autoscaling capabilities offered by HPA apply at the pod level. You can also autoscale your Kubernetes worker nodes using the cluster/node autoscaler, which adds new nodes dynamically. Managed Kubernetes offerings such as GKE by Google already provide such autoscaling capabilities, so you don’t have to reinvent the wheel or worry about the implementation.
In most cases with managed Kubernetes instances such as GKE, you specify a minimum and a maximum number of nodes, and the cluster autoscaler automatically adjusts the rest. Google has sort of won the Kubernetes battle among the cloud vendors by introducing Autopilot. GKE Autopilot is a hands-off approach to managed Kubernetes instances where Google manages every part (control plane, nodes, etc.) of your Kubernetes infrastructure.
Let’s discuss three different autoscaling methods offered by Kubernetes.
Horizontal Pod Autoscaler (HPA)
This method can also be referred to as scaling out. With it, Kubernetes allows a DevOps engineer, SRE, or cluster admin to increase or decrease the number of pods automatically based on your application’s resource usage. With HPA, you typically set a threshold for metrics such as CPU and memory, and the number of running pods is scaled up or down based on their current usage against the threshold you set, as shown in the sketch below.
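For illustration, here is a minimal sketch of how such a threshold could be declared with the official Kubernetes Python client. The deployment name my-app, the namespace, the replica bounds, and the 70% CPU target are illustrative values, not something prescribed by HPA itself.

```python
# Minimal sketch: create an HPA for an existing Deployment using the
# official Kubernetes Python client (pip install kubernetes).
# "my-app", "default", and the numeric targets are placeholder values.
from kubernetes import client, config

config.load_kube_config()  # use load_incluster_config() when running inside a pod

hpa = client.V1HorizontalPodAutoscaler(
    metadata=client.V1ObjectMeta(name="my-app-hpa"),
    spec=client.V1HorizontalPodAutoscalerSpec(
        scale_target_ref=client.V1CrossVersionObjectReference(
            api_version="apps/v1", kind="Deployment", name="my-app"
        ),
        min_replicas=2,   # never scale below 2 pods
        max_replicas=10,  # never scale above 10 pods
        target_cpu_utilization_percentage=70,  # scale when average CPU crosses 70%
    ),
)

client.AutoscalingV1Api().create_namespaced_horizontal_pod_autoscaler(
    namespace="default", body=hpa
)
```

Note that the autoscaling/v1 API used here only supports a CPU target; scaling on memory or custom metrics requires the autoscaling/v2 API.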
Vertical Pod Autoscaler (VPA)
This method can also be referred to as scaling up. Typically, with vertical scaling, we throw more resources such as CPU and memory at existing machines. In the Kubernetes context, the Vertical Pod Autoscaler recommends or automatically adjusts CPU and memory values, so VPA frees you from worrying about what values to use for your pods’ CPU and memory requests and limits. A sketch of a VPA object follows.
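As a rough sketch, a VPA object could be created the same way from Python. VPA ships as a CRD from the kubernetes/autoscaler project, so this assumes the VPA components are already installed in the cluster; my-app and the namespace are again placeholder values.

```python
# Minimal sketch: create a VerticalPodAutoscaler for an existing Deployment.
# VPA is a CRD, so the VPA admission controller, recommender, and updater
# must already be installed in the cluster. "my-app" is a placeholder name.
from kubernetes import client, config

config.load_kube_config()

vpa = {
    "apiVersion": "autoscaling.k8s.io/v1",
    "kind": "VerticalPodAutoscaler",
    "metadata": {"name": "my-app-vpa"},
    "spec": {
        "targetRef": {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "name": "my-app",
        },
        # "Auto" applies the recommendations to the pods; "Off" only records
        # them so you can inspect suggested requests without any restarts.
        "updatePolicy": {"updateMode": "Auto"},
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="autoscaling.k8s.io",
    version="v1",
    namespace="default",
    plural="verticalpodautoscalers",
    body=vpa,
)
```

Starting with updateMode "Off" is a common way to review the recommended requests before letting VPA restart pods automatically.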
Continue reading on https://foxutech.com/kubernetes-autoscaling-a-complete-guide/