Horizontal Pod Autoscaler (HPA): Know Everything About It

6 min read · Feb 24, 2023

In a recent post, we discussed Kubernetes autoscaling, its methods, benefits, and best practices. In this article, we look at one of those methods, the Horizontal Pod Autoscaler. Like the other methods it has its own advantages, and as far as we can tell it is widely adopted. Let’s look at it in detail.

What Is Horizontal Pod Autoscaler (HPA)?

A Kubernetes cluster is made up of one or more machines, virtual or physical, called nodes. A pod is the smallest deployable unit in Kubernetes, and your application containers run inside pods. A pod is a logical construct that needs a node to run on, and a node can host one or more pods.

Horizontal Pod Autoscaler is a type of autoscaler that can increase or decrease the number of pods in a Deployment, ReplicationController, StatefulSet, or ReplicaSet, usually in response to CPU utilization patterns. This process represents horizontal scaling because it changes the number of instances, not the resources allocated to a given container.

How Does HPA Work and What Are Its Benefits?

By default, HPA scales workloads on resource metrics such as average CPU or memory utilization across the pods. It can also use custom or externally provided metrics. After the initial setup it operates automatically: you only need to define a target metric value and the minimum and maximum number of replicas that match your requirements.

The HPA controller is responsible for checking metrics and scaling replicas accordingly by adding or removing pods. Scaling happens automatically, although you can also account for predictable fluctuations in load when you configure it. HPA works in a loop: it checks metrics, updates the replica count, and checks again.

In the first step of the HPA loop, the controller reads resource utilization (CPU, memory, and any custom metrics) through the metrics server. Next, HPA calculates the desired number of replicas by comparing the current metric value with the target value. Then the autoscaler decides whether to scale the application up or down. In the last step of the loop, HPA sets the target number of replicas on the workload.
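The calculation step can be sketched with the formula given in the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The shell function below is only an illustration. The name desired_replicas and the sample numbers are our own, and the sketch leaves out details the real controller applies, such as its tolerance band around the target and the scale-down stabilization window.

```shell
#!/bin/sh
# Illustrative sketch only: approximates the HPA replica calculation.
# Usage: desired_replicas CURRENT_REPLICAS CURRENT_METRIC TARGET_METRIC MIN MAX
desired_replicas() (
  current_replicas=$1
  current_metric=$2
  target_metric=$3
  min_replicas=$4
  max_replicas=$5

  # Integer ceiling of current_replicas * current_metric / target_metric.
  desired=$(( (current_replicas * current_metric + target_metric - 1) / target_metric ))

  # Clamp the result to the configured minimum and maximum replica counts.
  if [ "$desired" -lt "$min_replicas" ]; then desired=$min_replicas; fi
  if [ "$desired" -gt "$max_replicas" ]; then desired=$max_replicas; fi
  echo "$desired"
)

# 4 pods averaging 90% CPU against a 60% target, bounded to 2..10 replicas.
desired_replicas 4 90 60 2 10
```

With these sample values the function prints 6: four pods at 90% need 4 × 90 / 60 = 6 pods to bring the average back to the 60% target.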

HPA monitors continuously, so this loop repeats at a fixed interval. The default interval for HPA checks is 15 seconds. Use the --horizontal-pod-autoscaler-sync-period flag of kube-controller-manager to change the interval.

The autoscaling/v1 API version of the HPA only supports the average CPU utilization metric. The autoscaling/v2 API version allows scaling according to memory usage, defining custom metrics, and using multiple metrics in a single HPA object.
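To make the difference concrete, here is a minimal sketch of an autoscaling/v2 object that combines a CPU and a memory target. The workload name demo-app, the file name, and the target percentages are placeholders we made up for illustration; adjust them to your own deployment.

```shell
#!/bin/sh
# Write an example autoscaling/v2 manifest to a local file.
# "demo-app" and all numeric values are hypothetical placeholders.
cat > demo-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
EOF

# Apply it against a real cluster with:
#   kubectl apply -f demo-hpa.yaml
```

When several metrics are listed, HPA computes a desired replica count for each one and uses the largest.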

What Is the Impact of HPA on Kubernetes Resource Costs?

Running multiple workloads on a server instance can be cost-effective, but tracking your Kubernetes costs and identifying where you can save is challenging. Autoscaling lets you tightly configure scaling to reduce waste and minimize application running costs.

Application usage often changes over time, requiring more or fewer pod replicas. HPA scales your workloads automatically. It is useful for stateless and stateful applications. Combining HPA with cluster scaling helps reduce costs for workloads with frequent demand changes, decreasing the number of nodes alongside the pods.

Properly configured, the HPA controller monitors pods to determine whether the number of replicas needs to change. It does this by comparing the current metric value to the target value.

How to Use HPA Metrics

As discussed above, the Horizontal Pod Autoscaler (HPA) enables horizontal scaling of container workloads running in Kubernetes. For HPA to work, the Kubernetes cluster needs to have metrics enabled. The Metrics Server, covered in the Requirements section below, provides them.

Kubernetes HPA supports four kinds of metrics:

Resource Metric

Resource metrics refer to the CPU and memory utilization of Kubernetes pods, measured against the resource requests in the pod spec (limits are not used for this calculation). These metrics are natively known to Kubernetes through the metrics server. The values are averaged across pods before being compared with the target. That is, if three replicas of your application are running, their utilization, expressed as a percentage of the CPU or memory requests defined in your deployment spec, is averaged and the average is compared against the target set in the HPA.
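As a small worked example of that averaging, the sketch below computes average CPU utilization for pods that all share the same request. The function name average_utilization and the millicore values are made up for illustration, and the real controller additionally handles pods that are not ready or have no metrics yet.

```shell
#!/bin/sh
# Illustrative sketch: average CPU utilization as a percentage of requests.
# Usage: average_utilization REQUEST_MILLICORES USAGE_MILLICORES...
# Assumes every pod has the same CPU request and at least one usage value.
average_utilization() (
  request=$1
  shift
  total_usage=0
  pod_count=0
  for usage in "$@"; do
    total_usage=$(( total_usage + usage ))
    pod_count=$(( pod_count + 1 ))
  done
  # Total usage over total requested CPU, as an integer percentage.
  echo $(( total_usage * 100 / (request * pod_count) ))
)

# Three replicas requesting 250m each, currently using 150m, 250m and 200m.
average_utilization 250 150 250 200
```

Here the three pods use 600m out of 750m requested, so the function prints 80, and that 80% is what would be compared with the HPA target.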

Object Metric

Object metrics describe the information available in a single Kubernetes resource. An example of this would be hits per second for an ingress object.

Pod Metric

Pod metrics (PodsMetricSource in the API) describe each pod at runtime and are collected per pod. An example would be transactions processed per second in a pod. If there are multiple pods for a given PodsMetricSource, the values are averaged together before being compared against the target value.

External Metrics

External metrics are metrics gathered from sources running outside the scope of a Kubernetes cluster. For example, metrics from Prometheus can be queried for the length of a queue in a cloud messaging service, or QPS from a load balancer running outside of the cluster.
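The three non-resource metric types above could be declared roughly as follows in an autoscaling/v2 object. This is a sketch under assumptions: the metric names, the Ingress name main-route, the queue label, and the target values are hypothetical, and these metrics only resolve if a custom or external metrics adapter (for example one backed by Prometheus) is installed in the cluster.

```shell
#!/bin/sh
# Write an example HPA that uses Pods, Object and External metrics.
# All names and values are hypothetical; an adapter must serve these metrics.
cat > demo-hpa-custom.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: demo-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: demo-app
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: transactions-per-second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Object
      object:
        metric:
          name: requests-per-second
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: main-route
        target:
          type: Value
          value: "2k"
    - type: External
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: worker_tasks
        target:
          type: AverageValue
          averageValue: "30"
EOF

# Apply it against a real cluster with:
#   kubectl apply -f demo-hpa-custom.yaml
```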

Horizontal Pod Autoscaler API Versions

The autoscaling/v1 API version is stable, but it only supports autoscaling based on CPU utilization.

The autoscaling/v2 API version, stable since Kubernetes 1.23, adds support for multiple metrics as well as custom and external metrics.

You can verify which API versions your cluster supports by querying api-versions. The command below lists all the autoscaling versions.

# kubectl api-versions | grep autoscaling
autoscaling/v1
autoscaling/v2
autoscaling/v2beta1
autoscaling/v2beta2

Requirements

Horizontal Pod Autoscaler (like Vertical Pod Autoscaler) requires the Metrics Server to be installed in the Kubernetes cluster. Metrics Server is a source of container resource metrics (such as memory and CPU usage) that is scalable, can be configured for high availability, and uses few resources itself. By default it gathers metrics from the kubelets every 15 seconds, which allows rapid autoscaling.

You can easily check whether the Metrics Server is installed by issuing the following command:

# kubectl top pods

The following message will be shown if the metrics server is not installed.

error: Metrics API not available

On the other hand, if the Metrics Server is installed, you should get output showing the resource utilization of your pods.
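If you prefer a check that can be scripted, one option is to look at the APIService that Metrics Server registers. The helper below is a minimal sketch: the function name metrics_api_available is our own, and it assumes kubectl is configured for the cluster you want to inspect.

```shell
#!/bin/sh
# Succeeds when the metrics.k8s.io APIService reports Available=True.
# The function name is hypothetical; kubectl access to a cluster is assumed.
metrics_api_available() {
  status=$(kubectl get apiservice v1beta1.metrics.k8s.io \
    -o jsonpath='{.status.conditions[?(@.type=="Available")].status}' 2>/dev/null)
  [ "$status" = "True" ]
}

# Usage (requires kubectl and cluster access):
#   if metrics_api_available; then echo "Metrics API is ready"; fi
```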

Installation of Metrics Server

If you have already installed Metrics Server, you can skip this section.

Metrics Server offers two easy installation mechanisms. The first uses kubectl to apply a single file that includes all the manifests.

# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

The second option, which is preferred, is the Helm chart. The available Helm values are listed in the chart's documentation.

First, add the Metrics Server Helm repository to your local repository list as follows.

# helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/

Now you can install the Metrics Server via Helm.

# helm upgrade --install metrics-server metrics-server/metrics-server

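If your kubelets use self-signed serving certificates, add --set args={--kubelet-insecure-tls} to the command above. This flag disables verification of the kubelet certificates, so it is best kept to test clusters.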
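The same argument can also be kept in a values file instead of being passed on the command line, which is easier to track in version control. This is a sketch: the file name metrics-server-values.yaml is our own choice, and it relies on the chart's args value, the same one the --set flag above populates.

```shell
#!/bin/sh
# Write a Helm values file that passes --kubelet-insecure-tls to Metrics Server.
# The file name is arbitrary; the flag is intended for test clusters only.
cat > metrics-server-values.yaml <<'EOF'
args:
  - --kubelet-insecure-tls
EOF

# Install or upgrade with the values file:
#   helm upgrade --install metrics-server metrics-server/metrics-server \
#     -f metrics-server-values.yaml
```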

Verifying the Installation

Once the installation has finished and the Metrics Server has had some time to get ready, let’s try the command again.

# kubectl top pods -n argocd
NAME                                                CPU(cores)   MEMORY(bytes)
argocd-application-controller-0                     4m           24Mi
argocd-applicationset-controller-596ddc6c7d-d7lgl   4m           29Mi
argocd-dex-server-78c894df5b-svc87                  14m          28Mi
argocd-notifications-controller-6f65c4ccdb-5cpb8    3m           22Mi
argocd-redis-ha-haproxy-787f9b5689-rpn62            6m           71Mi
argocd-redis-ha-server-0                            13m          20Mi
argocd-repo-server-75b7c59bfb-cqtbz                 15m          26Mi
argocd-server-d86d7959d-sd98v                       16m          31Mi

Also, we can see the resources of the nodes with a similar command.

# kubectl top nodes
NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-agentpool-12792500-vmss000002   125m         3%     1233Mi          9%

You can also send queries directly to the Metrics Server via kubectl.

# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq

We can also verify our pod’s metrics from the API.

# kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/argocd/pods/argocd-server-d86d7959d-sd98v | jq

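If no metrics are returned, or an HPA later reports its current value as <unknown>, the usual causes are that the Metrics Server control loop has not run yet, that the Metrics Server is not running correctly, or that resource requests are not set on the target pod spec.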

How to Configure Horizontal Pod Autoscaling?

As an illustration of the horizontal pod autoscaling capabilities, this article will show you how to:

Create a test deployment.
Create an HPA via the command line or use the declarative approach.
Apply custom metrics.
Apply multiple metrics.
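As a rough preview of the first two steps, the command-line route could look like the sketch below. The function name create_demo_hpa, the deployment name, the image, and the numbers are placeholders of ours, not the values used in the full article. A CPU request is set explicitly because HPA needs requests to compute utilization.

```shell
#!/bin/sh
# Illustrative sketch: create a test deployment and attach an HPA to it.
# Usage: create_demo_hpa NAME IMAGE   (both arguments are placeholders)
create_demo_hpa() {
  name=$1
  image=$2
  # Create the test deployment.
  kubectl create deployment "$name" --image="$image" &&
  # Set a CPU request so that utilization can be calculated.
  kubectl set resources deployment "$name" --requests=cpu=200m &&
  # Create the HPA: target 50% CPU, between 1 and 10 replicas.
  kubectl autoscale deployment "$name" --cpu-percent=50 --min=1 --max=10
}

# Usage (requires kubectl and cluster access):
#   create_demo_hpa demo-app nginx
#   kubectl get hpa demo-app
```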

Continue reading on https://foxutech.com/horizontal-pod-autoscaler-know-everything-about-it/

Written by FoxuTech
