The GitLab Continuous Integration and Continuous Delivery suite helps software teams to collaborate better and frequently deploy small manageable chunks of code into production environments. My company at that time was using a self-hosted, on-premises version of GitLab Continuous Integration and Continuous Delivery.
There are three main architectural components of GitLab Continuous Integration and Continuous Delivery that are especially relevant to this discussion:
- Control plane: This isthe layer that interacts with the end user, so APIs, web portals, and so on all fall into this category. This was owned and managed by the central infrastructure and operations team.
- Runners: Runners are the compute environments used by GitLab to run/execute the pipeline stages and the respective processes. As soon as developers commit code to their repository, a pipeline triggers and executes the pipeline stages in sequence by leveraging these compute resources. Due to the heterogeneous project requirements, each team owned and operated their own runners. Based on the technology stacks they worked with, they could decide which type of compute resources would best fit their needs. As a fallback mechanism, there was also a shared pool of GitLab runners, which could be used by teams. However, as you can imagine, these were not very reliable in terms of availability and spiky workloads. For example, if you need two cores of CPU and 1 GB RAM for your Java build immediately, the release of an urgent patch to production could be a challenge. Therefore, it was generally recommended to begin with these but to switch to custom-built, self-managed runners when needed.
- Pipelines: Lastly, if you have used Jenkins or AWS CodePipeline in the past, GitLab Pipelines are similar in terms of functionality. You can define different phases of your software delivery process in a YAML file, commit it alongside your code, and let GitLab manage your software delivery from there on.
At that time, I was supporting five to six software projects for the developers of my team. Having started with the shared pool of GitLab runners hosted on-premises, we were able to leverage the compute resources for our needs, for roughly 80-120 builds per day. However, with increasing adoption throughout the company, the resources on these runners would frequently become exhausted, leading to several pipeline processes waiting for execution. Additionally, occasional VM failures meant that all software delivery processes dependent on these shared resources across the company would come to a halt. This was certainly not a good situation to be in. The central ops team added more resources to this shared pool, but this was still a static server farm, whereby my team’s build jobs were dependent on how others used these resources.
Having learned from the issues relating to unscalable on-premises infrastructure during the database setup, as discussed previously, I decided to leverage AWS cloud capabilities this time. Discussions with the developers (customers) led to the definition of the following requirements, which were all fulfilled with AWS services out of the box:
- Flexibility to scale infrastructure up/down
- Usage monitoring for the runners in AWS
- Less operational work
Across the entire solution design, the only effort required from my side was the code to register/ de-register these runners with the control plane when the compute instances were started or stopped.
The final design (see Figure 1.4) leveraged auto -scaling groups in AWS, which is a mechanism to dynamically scale up or trim down the compute resources depending on the usage patterns:
Figure 1.4: GitLab Continuous Integration and Continuous Delivery with runners hosted in AWS
As soon as the new servers were started, they registered themselves with the GitLab control plane.