This article is part 3 of a 3-part series about my job interview experience as a DevOps Engineer.
The last task dealt with resource limits and scaling strategies. I had to think about the factors I would use to determine the scaling strategy. My Kubernetes course had not yet covered resource limits or scaling strategies, so I fired up my search engine of choice, DuckDuckGo, and read up on how resource limits and autoscaling work in Kubernetes.
Resources can easily be set in the specification of a Pod. To do this, one adds a resources object to the respective container, as follows:
...
containers:
  - name: exercise-app
    image: niklasmtj/exercise-app:v1
    resources:
      limits:
        memory: "128Mi"
        cpu: "500m"
Here you can see that the container may use a maximum of 128 MiB of RAM as well as half a CPU core.
500m stands for 500 millicpu. You can read more about this in the Kubernetes documentation: Managing Resources for Containers.
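To make the units concrete, here is a minimal sketch that converts the two values from the manifest into cores and bytes. The helper names are hypothetical and not part of any Kubernetes library, and the parser only handles the millicpu suffix and the binary memory suffixes Ki, Mi and Gi, not the full quantity syntax.

```python
# Binary (power-of-two) suffixes used for memory quantities.
BINARY_SUFFIXES = {"Ki": 2**10, "Mi": 2**20, "Gi": 2**30}


def cpu_to_cores(quantity: str) -> float:
    """Convert a CPU quantity such as "500m" or "2" to a number of cores."""
    if quantity.endswith("m"):
        # "m" means millicpu: one thousandth of a core.
        return int(quantity[:-1]) / 1000
    return float(quantity)


def memory_to_bytes(quantity: str) -> int:
    """Convert a memory quantity such as "128Mi" to bytes (binary suffixes only)."""
    for suffix, factor in BINARY_SUFFIXES.items():
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * factor
    return int(quantity)  # a plain number is already a byte count


print(cpu_to_cores("500m"))      # 0.5
print(memory_to_bytes("128Mi"))  # 134217728
```

So the limits above allow half a core and 128 * 1024 * 1024 bytes of memory.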
My search led me to the Horizontal Pod Autoscaler (HPA). I had not come across it in my course yet, as it isn't explained until the end of Part 3, and I only finished that part a few days after these assignments. However, an HPA seemed perfect for this task, as Kubernetes then handles the autoscaling of the Pods itself. Below is my reasoning for using an HPA, as well as the metrics by which the HPA scales the Pods.
The Pods will be scaled horizontally via Kubernetes’ own Horizontal Pod Autoscaler (HPA). Resource usage is a good indicator of how much load an application has to handle at a given moment. Because of that, autoscaling begins when the CPU usage of the Pods exceeds 50%. This triggers Kubernetes’ HPA, which then creates additional Pods to handle the traffic. There is also the Vertical Pod Autoscaler (VPA), which increases the resources given to a Pod to scale it vertically. In my opinion, using the number of requests as a scaling metric is also possible. BUT: applications use resources very differently. A web application that only serves a simple static file or plain text, like the Node.js app used here that prints Hello World to the page, can handle a lot more requests than a more complex one with more side effects (e.g. fetching data from a remote API). If a developer is really sure about the correlation between the number of requests and resource usage, then it would be a good idea to use it as a scaling strategy.
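It helps to see what "50% CPU usage" is measured against. The HPA compares the Pods' CPU usage with their CPU request, and when a container only declares a limit, as in the manifest above, Kubernetes uses that limit as the request. The following is a minimal sketch of that calculation; the function name and the usage numbers are made up for illustration.

```python
def cpu_utilization_percent(usage_millicores: list[float], request_millicores: float) -> float:
    """Average CPU usage across Pods as a percentage of the per-Pod CPU request."""
    average_usage = sum(usage_millicores) / len(usage_millicores)
    return 100 * average_usage / request_millicores


# Assumption: two Pods using 300m and 400m, each with a 500m request
# (the request defaults to the 500m limit because no request is set).
utilization = cpu_utilization_percent([300, 400], 500)
print(utilization)       # 70.0
print(utilization > 50)  # True, so the HPA would scale out
```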
To create an HPA with this 50% CPU target, a minimum of 1 Pod and a maximum of 10 Pods, one can use:
kubectl autoscale deployment exercise-app --namespace exercise-ns --cpu-percent=50 --min=1 --max=10 --dry-run=client -o yaml > manifests/hpa.yaml
This writes the desired HPA definition to a yaml file in the manifests directory.
The file will then look like this:
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  creationTimestamp: null
  name: exercise-app
  namespace: exercise-ns
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: exercise-app
  targetCPUUtilizationPercentage: 50
status:
  currentReplicas: 0
  desiredReplicas: 0
To test the scaling behavior, one can use Bombardier, an HTTP benchmarking tool that opens many concurrent connections to a service. The test can be run from a temporary Docker container:
docker run -ti --rm alpine/bombardier -c 500 -d 120s -l http://192.168.178.41:8081
In this example, http://192.168.178.41:8081 is my local IP address, where the app is accessible. The flags -c 500 -d 120s tell Bombardier to open 500 concurrent connections for 2 minutes, and -l prints latency statistics.
To create a Kubernetes Pod that is deleted once its job is done, one can use:
kubectl run bombardier --image=alpine/bombardier --rm -it --restart=Never --command -- bombardier -c 500 -d 120s -l http://exercise-svc.exercise-ns
This does the same, but calls the exercise service from inside the cluster through its service name and namespace (exercise-svc.exercise-ns).
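The host name exercise-svc.exercise-ns in the command above works because Kubernetes gives every Service a cluster-internal DNS name built from its name and namespace. A small sketch of that naming scheme follows; the function is hypothetical, and cluster.local is the usual default cluster domain, which a given cluster may override.

```python
def service_dns_names(service: str, namespace: str, cluster_domain: str = "cluster.local") -> list[str]:
    """Return the DNS names under which a Service is normally reachable, shortest first."""
    return [
        f"{service}.{namespace}",
        f"{service}.{namespace}.svc",
        f"{service}.{namespace}.svc.{cluster_domain}",
    ]


for name in service_dns_names("exercise-svc", "exercise-ns"):
    print(f"http://{name}")
```

Since the bombardier Pod is started without a --namespace flag, it presumably runs in the default namespace, which is why the namespace has to be part of the host name.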
A while after starting Bombardier, kubectl get hpa -n exercise-ns will show:
NAME             REFERENCE               TARGETS    MINPODS   MAXPODS   REPLICAS   AGE
rendertron-hpa   Deployment/rendertron   230%/50%   1         10        5          6m36s
This shows that 5 replicas of the Pod definition are running because the CPU utilization is at 230%, well above the 50% target. The Horizontal Pod Autoscaler works: it dynamically created 4 additional Pods to manage the incoming traffic.
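The Kubernetes documentation describes the HPA's core rule as desiredReplicas = ceil(currentReplicas * currentMetricValue / desiredMetricValue), clamped to the configured minimum and maximum. The sketch below implements just that rule, plus the default 10% tolerance around the target. It is a simplification under stated assumptions: the real controller also accounts for Pod readiness, missing metrics and stabilization windows, and the output above does not say how many replicas were running when 230% was measured.

```python
import math


def desired_replicas(current_replicas: int, current_utilization: float,
                     target_utilization: float, min_replicas: int = 1,
                     max_replicas: int = 10, tolerance: float = 0.1) -> int:
    """Simplified HPA rule: scale proportionally to the utilization ratio."""
    ratio = current_utilization / target_utilization
    # Within the tolerance band around the target, the HPA does not scale.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    desired = math.ceil(current_replicas * current_utilization / target_utilization)
    # Never go below minReplicas or above maxReplicas.
    return max(min_replicas, min(max_replicas, desired))


# Assumption: a single Pod measured at 230% against the 50% target.
print(desired_replicas(1, 230, 50))  # 5
# If 5 Pods still averaged 230%, the result would be capped at maxReplicas.
print(desired_replicas(5, 230, 50))  # 10
```

Under the first assumption, the rule yields the 5 replicas seen in the output; if the load stayed that high, further evaluations would keep adding Pods up to the maximum of 10.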
These were my experiences with, and reflections on, my first practical Kubernetes task for a job. Despite the problems at the beginning, the tasks were fair and quite solvable. Towards the end, it was really a lot of fun to solve the tasks and learn new parts of Kubernetes. Kubernetes takes so much of the configuration work out of your hands. Especially in the area of autoscaling, you as a developer or operations team have to worry much less about whether the systems can handle the upcoming traffic. I am very glad to have started the DevOps with Kubernetes course shortly before. It made me much more confident in setting up the cluster, so at least there I made quick progress, which helped after the initial complications. I definitely want to get further into Kubernetes and learn more in the DevOps area. This area still seems to offer so much opportunity to make developers’ and operations teams’ jobs easier.
The code from this post can also be found on GitHub: niklasmtj/kubernetes-exercise.
Additionally, I created an arm64 as well as an amd64 Docker image for niklasmtj/exercise-app:v1, so the example app should be usable on other devices as well.