In a recent post, we discussed Kubernetes autoscaling, its methods, benefits, and best practices. In this article, we look at one of those methods, the Horizontal Pod Autoscaler. Like the other methods it has its own advantages, and as far as we can tell it is also widely adopted. Let’s look at it in detail.
What Is Horizontal Pod Autoscaler (HPA)?
A Kubernetes cluster is made up of one or more machines, virtual or physical, called nodes. In Kubernetes, a pod is the smallest deployable unit, and your application containers run inside pods. A pod is a logical construct and requires a node to run on; a node can host one or more pods.
Horizontal Pod Autoscaler is a type of autoscaler that can increase or decrease the number of pods in a Deployment, ReplicationController, StatefulSet, or ReplicaSet, usually in response to CPU utilization patterns. This process represents horizontal scaling because it changes the number of instances, not the resources allocated to a given container.
How Does HPA Work and What Are Its Benefits?
By default, HPA scales workloads based on resource metrics averaged across pods, such as CPU or memory utilization. It is also possible to use externally provided or custom metrics. After the initial setup it operates automatically: you only need to define a target value for the metric and the minimum and maximum number of replicas, as per your requirements or demand.
The HPA controller is responsible for checking metrics and scaling replicas accordingly by adding or removing pods. This scaling occurs automatically, but you can tune the configuration to account for predictable fluctuations in load. HPA works in a loop by checking, updating, and re-checking metrics.
In the first step of the HPA loop, the controller reads the current values of the configured metrics: CPU and memory through the Metrics Server, and custom or external metrics through their own metrics APIs. Next, HPA calculates the desired number of replicas from the ratio between the current and the target metric value. Then, the autoscaler decides whether to scale the application up or down. In the last step of the loop, HPA sets the target number of replicas on the workload.
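The calculation in the second step follows the formula from the Kubernetes documentation: desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). Below is a minimal shell sketch of that formula. The function name `desired_replicas` is ours and purely illustrative, and the sketch ignores the tolerance, the min/max bounds, and pods that are not ready.

```shell
# Sketch of the HPA replica formula:
#   desired = ceil(current_replicas * current_metric / target_metric)
# desired_replicas is a hypothetical helper, not part of kubectl.
desired_replicas() {
  awk -v r="$1" -v m="$2" -v t="$3" 'BEGIN {
    x = r * m / t
    d = int(x)
    if (d < x) d++        # round up to the next whole replica
    print d
  }'
}

# 3 replicas averaging 90% CPU against a 50% target
desired_replicas 3 90 50   # prints 6
```

With 3 replicas at 90% against a 50% target, the ratio is 1.8, so the workload would be scaled to 6 replicas.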
HPA monitors continuously, so this loop repeats at a fixed interval. The default interval for HPA checks is 15 seconds. Use the --horizontal-pod-autoscaler-sync-period flag of the kube-controller-manager to change the interval.
The autoscaling/v1 API version of the HPA only supports the average CPU utilization metric. The autoscaling/v2 API version allows scaling according to memory usage, defining custom metrics, and using multiple metrics in a single HPA object.
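To make the difference concrete, here is a sketch of an autoscaling/v2 object that combines a CPU and a memory target. The names `web` and `web-hpa`, the replica bounds, and the target percentages are placeholders chosen for illustration, not values from this article. The snippet only writes the manifest to a file so you can review it first.

```shell
# Write a sample autoscaling/v2 manifest to web-hpa.yaml.
# "web", "web-hpa" and all numbers below are hypothetical placeholders.
cat > web-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 60
    - type: Resource
      resource:
        name: memory
        target:
          type: Utilization
          averageUtilization: 75
EOF
```

After reviewing the file, it could be applied with kubectl apply -f web-hpa.yaml. When several metrics are listed, HPA computes a replica count for each and uses the largest.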
What Is the Impact of HPA on Kubernetes Resource Costs?
Running multiple workloads on a server instance can be cost-effective but tracking your Kubernetes costs and identifying where you can save is challenging. Autoscaling lets you tightly configure scaling to reduce waste and minimize application running costs.
Application usage often changes over time, requiring more or fewer pod replicas. HPA scales your workloads automatically. It is useful for stateless and stateful applications. Combining HPA with cluster scaling helps reduce costs for workloads with frequent demand changes, decreasing the number of nodes alongside the pods.
Properly configured, the HPA controller can monitor pods to determine if the number of replicas needs changing. It compares the current value to the target value.
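The comparison is not exact. By default the controller skips scaling when the ratio of the current to the target value is within a tolerance of 0.1 (10%) of 1.0; the tolerance is set by the kube-controller-manager flag --horizontal-pod-autoscaler-tolerance. A minimal sketch of that check follows. `needs_scaling` is a hypothetical helper for illustration, not a Kubernetes command.

```shell
# needs_scaling CURRENT TARGET [TOLERANCE]
# Prints "yes" when |current/target - 1| exceeds the tolerance, else "no".
# Hypothetical helper; the 0.1 default mirrors the controller's default tolerance.
needs_scaling() {
  awk -v c="$1" -v t="$2" -v tol="${3:-0.1}" 'BEGIN {
    diff = c / t - 1
    if (diff < 0) diff = -diff
    if (diff > tol) print "yes"; else print "no"
  }'
}

needs_scaling 54 50   # ratio 1.08, inside the tolerance: prints no
needs_scaling 90 50   # ratio 1.80, outside the tolerance: prints yes
```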
How to Use HPA Metrics
As discussed above, the Horizontal Pod Autoscaler (HPA) enables horizontal scaling of container workloads running in Kubernetes. For HPA to work, the Kubernetes cluster needs to have metrics enabled, typically through the Kubernetes Metrics Server, whose installation is covered in the Requirements section below.
Kubernetes HPA supports four kinds of metrics:
Resource Metric
Resource metrics refer to the CPU and memory utilization of Kubernetes pods, measured as a percentage of the resource requests set in the pod spec (not the limits). These metrics are natively known to Kubernetes through the metrics server. The values are averaged across pods before being compared with the target. That is, if three replicas are running for your application, their utilization values are averaged and that average is compared against the target utilization defined in the HPA.
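As a rough illustration of that averaging, the sketch below computes the utilization percentage for a set of replicas. It assumes every replica has the same CPU request, and the function name `avg_utilization` and the sample numbers are made up for the example.

```shell
# avg_utilization REQUEST_MILLICORES USAGE_MILLICORES...
# Prints total usage as an integer percentage of total requests.
# Hypothetical helper; assumes all replicas share the same request.
avg_utilization() {
  local request="$1" total=0 usage
  shift
  for usage in "$@"; do
    total=$((total + usage))
  done
  echo $(( total * 100 / (request * $#) ))
}

# Three replicas requesting 200m each, using 100m, 150m and 200m
avg_utilization 200 100 150 200   # prints 75
```

Here the replicas use 450m out of 600m requested, so HPA would compare 75% against the target utilization.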
Object Metric
Object metrics describe the information available in a single Kubernetes resource. An example of this would be hits per second for an ingress object.
Pod Metric
Pod metrics (the PodsMetricSource type in the API) describe a per-pod metric reported at runtime, for example transactions processed per second in a pod. When there are multiple pods, the values are collected and averaged together before being compared against the target value.
External Metrics
External metrics are metrics gathered from sources running outside the scope of a Kubernetes cluster. For example, metrics from Prometheus can be queried for the length of a queue in a cloud messaging service, or QPS from a load balancer running outside of the cluster.
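The three non-resource metric kinds can be sketched in a single autoscaling/v2 manifest. Everything named here is a placeholder: the `worker` Deployment, the `main-route` Ingress, the metric names, and the target values are illustrative only. These metrics also assume that an adapter serving the custom and external metrics APIs (for example a Prometheus adapter) is installed in the cluster. The snippet writes the manifest to a file for review.

```shell
# Write a sample manifest using Pods, Object and External metrics.
# All names, metric names and target values are hypothetical placeholders.
cat > custom-metrics-hpa.yaml <<'EOF'
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: worker-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker
  minReplicas: 1
  maxReplicas: 20
  metrics:
    - type: Pods                 # per-pod metric, averaged across pods
      pods:
        metric:
          name: transactions_per_second
        target:
          type: AverageValue
          averageValue: "100"
    - type: Object               # metric describing one Kubernetes object
      object:
        metric:
          name: requests_per_second
        describedObject:
          apiVersion: networking.k8s.io/v1
          kind: Ingress
          name: main-route
        target:
          type: Value
          value: "2k"
    - type: External             # metric from outside the cluster
      external:
        metric:
          name: queue_messages_ready
          selector:
            matchLabels:
              queue: worker_tasks
        target:
          type: AverageValue
          averageValue: "30"
EOF
```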
Horizontal Pod Autoscaler API Versions
autoscaling/v1 is the original stable API version; it only supports CPU utilization-based autoscaling.
autoscaling/v2, stable since Kubernetes 1.23, adds support for multiple metrics as well as custom and external metrics.
You can verify which API versions are supported on your cluster by querying api-versions. The following command lists all versions and filters for the autoscaling group.
# kubectl api-versions | grep autoscaling
autoscaling/v1
autoscaling/v2
autoscaling/v2beta1
autoscaling/v2beta2
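In a script, the same query can decide which apiVersion to put into a manifest. The sketch below prefers autoscaling/v2 and falls back to autoscaling/v1. The function name `hpa_api_version` is hypothetical, and the block only defines the function without calling kubectl.

```shell
# hpa_api_version: print the HPA API version to use on this cluster.
# Hypothetical helper built on the kubectl api-versions command shown above.
hpa_api_version() {
  if kubectl api-versions | grep -qx 'autoscaling/v2'; then
    echo 'autoscaling/v2'
  else
    echo 'autoscaling/v1'
  fi
}
```

Calling hpa_api_version on the cluster from the output above would print autoscaling/v2.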
Requirements
Horizontal Pod Autoscaler (and also Vertical Pod Autoscaler) requires a Metrics Server installed in the Kubernetes cluster. Metrics Server is a source of container resource metrics (such as memory and CPU usage) that is scalable, can be configured for high availability, and uses few resources itself. By default, Metrics Server gathers metrics from the kubelets every 15 seconds, which allows rapid autoscaling.
You can easily check whether the Metrics Server is installed by issuing the following command:
# kubectl top pods
The following message will be shown if the metrics server is not installed.
error: Metrics API not available
On the other hand, if the Metrics Server is installed, you should get output showing resource utilization.
Installation of Metrics Server
If you have already installed Metrics Server, you can skip this section.
Metrics Server offers two easy installation mechanisms. The first uses kubectl to apply a single file that includes all the manifests.
# kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
The second option is the Helm chart, which is preferred. The configurable Helm values are documented with the chart.
First, add the Metrics-Server Helm repository to your local repository list as follows.
# helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server/
Now you can install the Metrics Server via Helm.
# helm upgrade --install metrics-server metrics-server/metrics-server
If your kubelets use self-signed certificates, add --set args={--kubelet-insecure-tls} to the command above.
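After either installation method, the Metrics Server needs a little time before it starts serving data. A small polling loop can wait for that. This is a sketch only: `wait_for_metrics` and its default of 30 attempts with a 5 second pause are our own choices, and it relies on kubectl top nodes failing until the Metrics API is available.

```shell
# wait_for_metrics [ATTEMPTS] [DELAY_SECONDS]
# Polls "kubectl top nodes" until it succeeds or the attempts run out.
# Hypothetical helper; the defaults are arbitrary.
wait_for_metrics() {
  local attempts="${1:-30}" delay="${2:-5}" i=0
  while [ "$i" -lt "$attempts" ]; do
    if kubectl top nodes >/dev/null 2>&1; then
      echo "ready"
      return 0
    fi
    i=$((i + 1))
    sleep "$delay"
  done
  echo "timed out"
  return 1
}
```

Running wait_for_metrics 12 10, for example, would poll for up to two minutes.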
Verifying the Installation
Once the installation has finished and the Metrics Server has had some time to get ready, let’s try the command again.
# kubectl top pods -n argocd
NAME                                                CPU(cores)   MEMORY(bytes)
argocd-application-controller-0                     4m           24Mi
argocd-applicationset-controller-596ddc6c7d-d7lgl   4m           29Mi
argocd-dex-server-78c894df5b-svc87                  14m          28Mi
argocd-notifications-controller-6f65c4ccdb-5cpb8    3m           22Mi
argocd-redis-ha-haproxy-787f9b5689-rpn62            6m           71Mi
argocd-redis-ha-server-0                            13m          20Mi
argocd-repo-server-75b7c59bfb-cqtbz                 15m          26Mi
argocd-server-d86d7959d-sd98v                       16m          31Mi
Also, we can see the resources of the nodes with a similar command.
# kubectl top nodes
NAME                                CPU(cores)   CPU%   MEMORY(bytes)   MEMORY%
aks-agentpool-12792500-vmss000002   125m         3%     1233Mi          9%
You can also send queries directly to the Metrics Server via kubectl.
# kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes | jq
We can also verify our pod’s metrics from the API.
# kubectl get --raw /apis/metrics.k8s.io/v1beta1/namespaces/argocd/pods/argocd-server-d86d7959d-sd98v | jq
If these queries return empty or unknown values, either the Metrics Server control loop has not run yet, it is not running correctly, or resource requests are not set on the target pod spec.
How to Configure Horizontal Pod Autoscaling?
As an illustration of the horizontal pod autoscaling capabilities, this article will show you how to:
- Create a test deployment.
- Create an HPA via the command line or use the declarative approach.
- Apply custom metrics.
- Apply multiple metrics.
Continue reading on https://foxutech.com/horizontal-pod-autoscaler-know-everything-about-it/