Building a resilient deployment on Kubernetes - part 2: Resources and capacity planning

Dinusha Dissanayake
7 min read · Dec 31, 2020

In my previous article, Building a resilient deployment on Kubernetes - part 1: Minimum application downtime, I discussed how to keep instances healthy and maintain high availability.

In this article, I will talk about how resource utilization and capacity planning in Kubernetes help to build a resilient deployment.

(The gist of what is discussed in this article can be found in the summary section for a quick look.)


A brief overview of resources and capacity planning

Resource Planning

Any application needs a certain minimum amount of resources, such as CPU and memory, in order to work properly. Hence we need to make sure the infrastructure the application is deployed on provides the required resources.

At the same time, we need to make sure the application does not overuse resources.

Capacity Planning

Once the application is deployed on the cluster, it has a certain capacity to handle traffic. As an example, you may need 5 instances of an application at peak time. In a real-world scenario, the traffic varies over time, so a fixed number of instances is rarely ideal: resources end up either underutilized or overused. Hence we need to be able to scale the application out (expand) or in (shrink) according to the traffic. This is also known as autoscaling.

Resource planning in Kubernetes

There are two levels at which you can allocate resources.

  • At the container level, we can configure the resource allocation required by each container.
  • At the namespace level, we can set a default resource allocation, which applies when resources are not defined at the container level.

Container-level resource allocation

A pod is the smallest deployable unit in Kubernetes and consists of one or more containers. For each of these containers, you can define the minimum resources needed. This is known as the resource request. The resource request ensures that the pod will be scheduled on a node that has the requested minimum resources available. If no node satisfies the resource request, the pod will not be scheduled.

Similarly, at each container level you can define the maximum allowed resource allocation. This is known as the resource limit. The resource limit ensures that the container will not overuse resources.

Sample usage can be seen below.

apiVersion: v1
kind: Pod
metadata:
  name: nginx
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:
        memory: "256Mi"
        cpu: "500m"
      limits:
        memory: "512Mi"
        cpu: "1000m"

According to the above Pod definition, the nginx container needs at least 256Mi of memory and 500m of CPU (half a core, since CPU is measured in millicores and 1000m equals one core) in order to be scheduled. While running, it can consume at most 512Mi of memory and 1000m of CPU.

A combination of the resource request and the resource limit ensures that the pod has the minimum resources it needs and that it will not overuse resources unnecessarily.
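
To check whether a node can actually accommodate a pod's request, you can inspect the node's allocatable resources and what is already reserved on it. A minimal sketch (the node name is just a placeholder):

# List the nodes in the cluster and pick one to inspect
kubectl get nodes

# Show the node's total capacity, its allocatable resources, and the
# requests/limits already allocated to the pods running on it
kubectl describe node <node-name>

The Allocatable and Allocated resources sections in the output show how much headroom is left for new resource requests on that node.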

Namespace-level resource allocation

You can define a default resource request and a default resource limit at each namespace level using a LimitRange. This ensures that a pod is subjected to the defined resource allocation if resources are not defined at the container level.

An example configuration is shown below.

apiVersion: v1
kind: LimitRange
metadata:
  name: limit-foo
  namespace: foo
spec:
  limits:
  - default:
      memory: 1Gi
      cpu: 1000m
    defaultRequest:
      memory: 500Mi
      cpu: 500m
    type: Container

This LimitRange will be deployed in the foo namespace. Any container in that namespace without an explicit resource allocation will be given a default resource request of 500Mi memory and 500m CPU, with a default resource limit of 1Gi memory and 1000m CPU.
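
As a quick sanity check (assuming the LimitRange above is already applied and the foo namespace exists), you can create a pod without any resources section and inspect what was injected:

# Create a pod without specifying any resources
kubectl run test-nginx --image=nginx -n foo

# The defaults from the LimitRange should now appear in the pod spec
kubectl get pod test-nginx -n foo -o jsonpath='{.spec.containers[0].resources}'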

Usually, these resource allocations are defined by a Kubernetes application developer. What if he/she sets the resources unnecessarily high, or extremely low? When resources are allocated at the container level, how can we enforce maximum and minimum bounds?

This is where the min and max properties of LimitRange come in handy. These properties make sure that a pod will not be admitted if its containers are allocated unreasonably high or low resources.

An extension of the above example is shown below.

apiVersion: v1
kind: LimitRange
metadata:
  name: limit-foo
  namespace: foo
spec:
  limits:
  - default:
      memory: 750Mi
      cpu: 750m
    defaultRequest:
      memory: 500Mi
      cpu: 500m
    min:
      memory: 250Mi
      cpu: 250m
    max:
      memory: 1000Mi
      cpu: 1000m
    type: Container

This LimitRange will be deployed in the foo namespace. If resources are not explicitly allocated for a container in the Pod spec, that container will be assigned resource request and resource limit values based on the defaultRequest and default values in the LimitRange specified above.

If you try to allocate resources for a container above the values defined under max or below the values defined under min, the pod will be rejected at creation time, keeping resource allocations within the agreed bounds.
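
For example, a pod like the following (a hypothetical Pod named greedy, assuming the LimitRange above is active in foo) would be rejected with a Forbidden error, because its memory limit exceeds the 1000Mi max:

apiVersion: v1
kind: Pod
metadata:
  name: greedy
  namespace: foo
spec:
  containers:
  - name: app
    image: nginx
    resources:
      limits:
        memory: 2Gi   # exceeds the 1000Mi max in the LimitRange
        cpu: 500m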

A further step: managing resource allocation in a heterogeneous cluster

A Kubernetes cluster can be homogeneous (consisting of nodes with the same spec) or heterogeneous (consisting of nodes with different specs). A heterogeneous cluster may contain different types of instances for different types of workloads.

For example, let's say the cluster consists of 3 CPU-optimized instances and 3 memory-optimized instances. Depending on the nature of the application, you may want to make sure a certain set of pods is placed only on a certain type of instance (node).

In this example, CPU-intensive applications should be scheduled only on the CPU-optimized instances, and no other pods should be scheduled there. How can we ensure this?

Kubernetes provides taints and tolerations, along with node affinity, to schedule pods strategically and manage resources optimally. A combination of taints and tolerations with node affinity ensures that a pod is scheduled on a specific set of nodes, as the sketch below illustrates. I have discussed this in detail in one of my previous articles; if you are not familiar with it, I suggest reading it to brush up on pod placement strategies.
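
A minimal sketch of the idea, assuming the CPU-optimized nodes carry a hypothetical workload=cpu label and taint (the node name is a placeholder): the taint repels ordinary pods, while the toleration plus required node affinity on the CPU-intensive pod means it both may and must land on those nodes.

# Taint and label the CPU-optimized nodes
kubectl taint nodes cpu-node-1 workload=cpu:NoSchedule
kubectl label nodes cpu-node-1 workload=cpu

apiVersion: v1
kind: Pod
metadata:
  name: cpu-intensive-app
spec:
  tolerations:           # allows this pod onto the tainted nodes
  - key: "workload"
    operator: "Equal"
    value: "cpu"
    effect: "NoSchedule"
  affinity:
    nodeAffinity:        # forces this pod onto the labelled nodes
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: workload
            operator: In
            values: ["cpu"]
  containers:
  - name: app
    image: nginx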

Capacity planning in a Kubernetes cluster

The traffic/workload on a deployed application varies from time to time, and the application should have the capacity to handle it.

One way of doing capacity planning is to run a pre-defined set of instances large enough to handle the peak traffic. But then the resources would be underutilized most of the time. Hence, cost-wise, it's not the best option.

What if you could adjust your capacity (the number of instances handling the traffic) as the traffic varies? This is called autoscaling.

Kubernetes provides autoscaling for applications deployed on it through the Horizontal Pod Autoscaler (known as the HPA).

Horizontal Pod Autoscaler automatically scales the number of Pods in a replication controller, deployment, replica set or stateful set based on observed CPU utilization (or, with beta support, on some other, application-provided metrics). (Quoted from official Kubernetes documentation)

In the HPA definition itself, we define the target Kubernetes artefact (whether it is a deployment, stateful set or replica set, and the name of that artefact) along with the autoscaling properties.

In the HPA definition, we have to define the metric that the scaling is based on, along with the maximum and minimum number of pods that should be scheduled. The scaling metric could be a certain CPU usage, memory usage, or another custom metric such as the number of requests.

To keep this simple, I will take CPU usage as an example. To learn more about the other aspects, please refer to the official Kubernetes guide on HPA.

Let's say I want my deployment named nginx to scale up to 10 pods (max pods), with a minimum of 2 pods (min pods). If the CPU utilization across the nginx pods rises above 60% (the scaling condition), new nginx pods should be created, up to 10. The HPA can then be created using one of the following two methods.

One of the easiest ways to create a simple HPA is the kubectl autoscale command. Here you specify the Kubernetes artefact you want to autoscale, the minimum and maximum number of pods, and the scaling condition.

kubectl autoscale deployment nginx --cpu-percent=60 --min=2 --max=10

Refer to the official documentation of kubectl autoscale to find out about the other parameters of the command.

The declarative way (using a definition file) for the same scenario is shown below.

apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: nginx
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: nginx
  minReplicas: 2
  maxReplicas: 10
  targetCPUUtilizationPercentage: 60

The HorizontalPodAutoscaler will scale out the deployment up to a maximum of 10 pods to keep the average CPU utilization around the 60% target. Similarly, it will scale in, down to 2 pods, when it detects that CPU utilization is low. Note that the HPA needs a metrics source, such as metrics-server, running in the cluster to observe CPU utilization.
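
For intuition, the core calculation the HPA uses (from the official Kubernetes documentation) is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). A quick worked example under the configuration above:

current pods: 2, average CPU utilization: 90%, target: 60%
desiredReplicas = ceil(2 × 90 / 60) = ceil(3.0) = 3
# the HPA scales the nginx deployment from 2 to 3 pods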

Summary

In this article we discussed resources and capacity planning.

  • Resource planning ensures that the pods being scheduled have the resources they require in the cluster, and avoids unnecessary overuse of resources.
  • Capacity planning maintains the number of pods needed to serve the traffic at a given time.

Resource planning

  • CPU and memory should be allocated appropriately to pods for better management of resources.
  • This can be configured at the container level in the Pod definition.
  • A default resource allocation can be configured using a LimitRange at the namespace level, applied when container-level resources are not specified.
  • A LimitRange also helps to impose maximum and minimum resource allocations for each container, so that application developers cannot assign resources unreasonably.
  • In a heterogeneous cluster, you can use taints and tolerations with node affinity to schedule pods on specific nodes for optimal operation.

Capacity planning

  • Can be done using an HPA for a replica set, deployment or stateful set.
  • You have to specify the minimum and maximum number of pods along with the scaling condition.
  • The kubectl autoscale command or a definition file can be used for this.

So far, we discussed how to make the application highly available in part 1, and in part 2 (this article) we discussed how to manage resources properly and use autoscaling for better capacity planning. In a production-level deployment, these practices are crucial to achieving a resilient deployment. But they are not the only things necessary for a resilient deployment; I will discuss the rest in my next articles.

I hope you enjoyed the article and found it useful. I will see you in my next article.
