Efficient Jenkins with ECS — Part 1
Why did we choose ECS?

This article should be interesting if you want to make your Jenkins setup more efficient. Here I will basically talk about “Why ECS for Jenkins?”. This was the first question I was asked, and the most frequently asked one too :D. Well, mostly because of the configuration problems we came across. (I will cover those details in a separate article explaining how to do the configuration without any problems, and will link it here once it is published :) )
I hope you know what Jenkins is. If not, Jenkins is basically an automation server that runs your builds and lets you monitor their status. Beyond that, Jenkins is widely used for CI/CD, which is one of the most significant parts of software development. You can start a Jenkins server and run builds on the same node where Jenkins is running, but this does not scale when there are many build jobs.
Only a limited number of jobs can run on a single machine, depending on its specs. Builds can run on bare metal (on the machine itself, using its own resources) or in a containerized environment.
To run builds on bare metal, we define a number of executors based on the machine specs, and each executor can run one build at a time. For example, if you define 4 executors on a machine, you can run 4 jobs in parallel. But when you have multiple products that use the same ports, running their jobs side by side on bare metal does not work, since port conflicts can happen.
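To make the port conflict concrete, here is a minimal sketch in Python. It is not taken from our Jenkins setup: the names try_bind and simulate_two_builds are made up for illustration, and the two "builds" are simply two sockets on the same machine that want the same port.

```python
import socket


def try_bind(port, host="127.0.0.1"):
    """Try to listen on host:port; return the socket, or None if the port is taken."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    try:
        sock.bind((host, port))
        sock.listen(1)
        return sock
    except OSError:
        # The OS refuses the bind because another process/socket owns the port.
        sock.close()
        return None


def simulate_two_builds():
    """Return True if the second 'build' hits a port conflict."""
    # "Build A" asks the OS for any free port (port 0) and starts listening on it.
    build_a = try_bind(0)
    port = build_a.getsockname()[1]
    # "Build B" is a job for another product that wants the very same port.
    build_b = try_bind(port)
    conflict = build_b is None
    build_a.close()
    if build_b is not None:
        build_b.close()
    return conflict


if __name__ == "__main__":
    print("port conflict on bare metal:", simulate_two_builds())
```

On a single machine the second bind fails because both jobs share one network stack. That is exactly the situation two executors on the same bare metal node are in.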
This is where isolated environments, AKA containers, come into play. To mitigate the drawbacks of running multiple jobs on bare metal, we can use containers so that each build runs in its own environment: no port conflicts.
As I mentioned at the beginning, the best practice is not to run builds on the master node. The master should only be responsible for running Jenkins and Jenkins-related configuration, and for scheduling jobs on the slaves. Hence, we can configure a few nodes to act as slaves and ask the Jenkins master to schedule jobs on them.
Why Amazon ECS Plugin?
Well, if we can add many slaves and run containers on those machines, that should be enough, right? Why would you need ECS?
Well, there are multiple plugins that integrate with Jenkins to make builds easier and more efficient. For example, there are plugins that support running Docker containers as slaves. But after installing such a plugin, we still need to set up a Docker environment on each slave to support it.
Imagine a Jenkins environment with multiple slave machines. Setting up a Docker environment on each slave machine is quite a bit of additional work. Also, if you are using auto scaling, having to set up a Docker environment on every new machine would be troublesome, wouldn’t it?
What if there were a way that took care of everything related to container management, so that we just had to configure a plugin?
The Amazon ECS plugin is a perfect example of this. This plugin allows you to run build jobs in an AWS ECS cluster while providing the benefits of high performance and high scalability.
For those who haven’t heard about ECS: ECS stands for Elastic Container Service (originally EC2 Container Service), a container management service based on Docker that runs on a cluster of EC2 instances. ECS takes care of everything related to container management, so we do not have to worry about it anymore.
We can define launch configurations by specifying the instance type, the number of instances we want and so on, and then use this cluster as a Jenkins slave. I am not going to describe how to create an AWS ECS cluster in this article.
Let me explain what the problems with our existing system were and how the AWS ECS plugin helped resolve them.
There were three nodes with 4 executors each. In fact, we had configured the Docker plugin to run Docker containers on these 3 nodes, which means we had to set up a Docker environment on each node. But we did not have auto scaling, so it was okay: we didn’t have to configure everything again and again.
But resources were being wasted, since most of the nodes were idle most of the time. Then there were occasions when all the nodes were occupied and more jobs were waiting in the queue for a chance to run their builds. We also wondered whether we could reduce the build time of the jobs.
This was causing us problems. In a fully occupied scenario, if something went wrong in the middle of a build and it failed, we did not know when that build would get its next chance.
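The queueing problem can be sketched with a tiny simulation. This is only an illustration under stated assumptions: the function queue_waits is hypothetical, all jobs arrive at the same moment, and the burst of 15 jobs of 5 hours each is made up. Only the 3 nodes with 4 executors each come from our real setup.

```python
import heapq


def queue_waits(job_durations, executors):
    """Return how long each job waits before it starts on a fixed pool of executors.

    Assumes every job arrives at time 0 and jobs are started in list order.
    """
    free_at = [0] * executors  # time at which each executor becomes free
    heapq.heapify(free_at)
    waits = []
    for duration in job_durations:
        start = heapq.heappop(free_at)  # earliest free executor takes the job
        waits.append(start)
        heapq.heappush(free_at, start + duration)
    return waits


# 3 nodes x 4 executors = 12 fixed slots, as in the old setup.
FIXED_EXECUTORS = 3 * 4
# Hypothetical burst: 15 jobs, 5 hours each (numbers chosen for illustration only).
burst = [5] * 15

if __name__ == "__main__":
    print("fixed pool waits (hours):", queue_waits(burst, FIXED_EXECUTORS))
    print("scaled pool waits (hours):", queue_waits(burst, len(burst)))
```

With a fixed pool, the jobs beyond the twelfth have to wait for a whole build to finish before they even start, and a failed build goes to the back of the same queue. With enough capacity for the burst, nobody waits.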
To mitigate all these problems, we searched for a solution. The first thing that came to mind was auto scaling the slave nodes using EC2. Well, that would solve the problems with the build queue. But how would we include Docker support in it? We would still have to worry about container management. It seemed easy, but it was not.
Then we found AWS ECS, which takes care of everything related to container management. And we can use auto scaling with the cluster, so that resources are not wasted and builds do not sit in the queue for long. Well, that solved most of our problems.
The only thing that remained was to reduce the build time by using suitable infrastructure, that is, a suitable instance type. Well, one of my colleagues did some research on the nature of our builds and chose an instance type that reduced the build time (from 5h 20m to 4h 45m, as I remember). Even that is a good start. Likewise, you can decide on the most suitable instance type depending on the nature of your builds. It could be a general purpose, compute optimized, memory optimized, accelerated computing or storage optimized instance.
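To put that improvement in numbers, here is a quick calculation. The two build times are the ones I remember, so treat the result as approximate. The helper to_minutes is just for illustration.

```python
def to_minutes(hours, minutes):
    """Convert an 'Xh Ym' duration to total minutes."""
    return hours * 60 + minutes


before = to_minutes(5, 20)       # build time on the old instance type
after = to_minutes(4, 45)        # build time on the chosen instance type
saved = before - after           # minutes saved per build
percent = saved / before * 100   # relative reduction

print(f"saved {saved} minutes per build ({percent:.1f}% faster)")
# prints: saved 35 minutes per build (10.9% faster)
```

So just by picking a better instance type, each build finished about 35 minutes earlier, which is roughly an 11% reduction.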
Well, I think I have told my story here based on the experience I had. I will cover the AWS ECS plugin configuration in a separate article, explaining how to do the configuration, what challenges you may come across and how to avoid them. (I will link it here once it is published :) )
Well, I hope you enjoyed reading it. Cheers!