Running a pipeline in a monorepo can be very resource- and time-consuming.
In version control systems, a monorepo (monolith != monorepo) is a software development strategy where we use the same repository to store code for more than one project. Google, Facebook, Microsoft and other big tech companies all employ monorepos. At first sight, you might think that this setup has many disadvantages, such as a large number of commits, branches and tracked files, or a lack of access control. However, it also has numerous advantages, including:
- visibility
- easier package management
- consistency
As projects get more complex, the length of the .gitlab-ci.yml file grows in proportion to the contents of the repository. Besides the conceptual challenges, numerous performance issues can affect a monorepo setup. Don’t let the pipeline be one of them.
Structure and architecture
Automation makes our lives easier, so let’s use it to generate a configuration file and its content for a child pipeline. At an abstract level, a child pipeline is an automated manifestation of generated processes that execute jobs in a directed acyclic way.
A common practice is to use a for loop to collect data during iteration and save the results in an array. The dynamic pipeline generation in the following example relies on exactly that: after the iteration, the elements of the array are used to create the jobs defined in the generated pipeline.
N.B. I use Python in the examples to replace variables with values during the child-pipeline generation process, but feel free to use any other preferred programming language.
Let’s have a look at the directory structure:
```
libs
└── java
    ├── auth
    ├── payments
    └── subscription
```
Since this is an explanatory project, it does not require a lot of data; we only need enough to demonstrate the concept.
First, separate the generation and the trigger processes into two different jobs, as shown in the .gitlab-ci.yml below. The trigger job pulls in the generated configuration file as an artifact of the generator job, since that file only exists once the generation process has succeeded; strategy: depend additionally makes the trigger job mirror the status of the child pipeline it starts.
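A minimal sketch of what that parent .gitlab-ci.yml could look like, using the job and stage names mentioned in this article (the Python image tag is an assumption):

```yaml
stages:
  - child-pipeline-generator
  - child-pipeline-trigger

generator:
  stage: child-pipeline-generator
  image: python:3.10            # assumed image; any Python image works
  script:
    - python main.py            # writes child-pipeline-gitlab-ci.yml
  artifacts:
    paths:
      - child-pipeline-gitlab-ci.yml

child-pipeline-trigger:
  stage: child-pipeline-trigger
  trigger:
    include:
      - artifact: child-pipeline-gitlab-ci.yml
        job: generator
    strategy: depend            # trigger job mirrors the child pipeline's status
```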
But what’s happening under the hood?
A brief description: the generator job in the child-pipeline-generator stage calls main.py in its script block. That executes the generator() method, which collects data from get_libs() and creates the child pipeline’s configuration file, child-pipeline-gitlab-ci.yml.
get_libs() iterates through the contents of libs/java and returns a list of the entries contained in libs_path.
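A minimal sketch of what main.py could look like, assuming PipelineWriter (covered below) lives in its own module and exposes a render() method that returns the assembled YAML; both of those names are assumptions:

```python
from pathlib import Path

from pipeline_writer import PipelineWriter  # assumed module name; class covered below

LIBS_PATH = Path("libs/java")


def get_libs() -> list[str]:
    """Collect the library names by iterating through libs/java."""
    return sorted(entry.name for entry in LIBS_PATH.iterdir() if entry.is_dir())


def generator() -> None:
    """Create the child pipeline's configuration file from the collected libs."""
    writer = PipelineWriter(get_libs())
    Path("child-pipeline-gitlab-ci.yml").write_text(writer.render())


if __name__ == "__main__":
    generator()
```

For the directory tree above, get_libs() returns ["auth", "payments", "subscription"], and each element becomes one job in the generated pipeline.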
The PipelineWriter class contains all the configurations and templates that we need to generate the child pipeline, including hidden (parent) jobs and the child pipeline job template.
As you can see in the example, GitLab supports extends with multi-level inheritance. The build-{lib}-lib job inherits all the configurations defined in the .basic job.
You can think of these indented multi-line strings returned by each method as different parts of the child pipeline, which can later be put together into a valid configuration file.
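Here is a minimal sketch of such a class; only the .basic and build-{lib}-lib names match the jobs described above, while the Maven image and build command are placeholders:

```python
import textwrap


class PipelineWriter:
    """Holds the templates that make up the child pipeline."""

    def __init__(self, libs: list[str]) -> None:
        self.libs = libs

    def base_job(self) -> str:
        # Hidden (parent) job; generated jobs inherit from it via extends.
        return textwrap.dedent("""\
            .basic:
              image: maven:3.8-openjdk-11   # placeholder image
              tags:
                - docker
        """)

    def build_job(self, lib: str) -> str:
        # Child pipeline job template: build-{lib}-lib extends .basic.
        return textwrap.dedent(f"""\
            build-{lib}-lib:
              extends: .basic
              script:
                - mvn --projects libs/java/{lib} package   # placeholder build command
        """)

    def render(self) -> str:
        """Join the fragments into one valid configuration file."""
        parts = [self.base_job()] + [self.build_job(lib) for lib in self.libs]
        return "\n".join(parts)
```

For the three example libraries, render() emits .basic followed by build-auth-lib, build-payments-lib and build-subscription-lib, each extending .basic.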
Tip: Validate your generated configuration file with CI Lint before deploying it.
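You can do this in the UI, or programmatically via GitLab’s project-level CI Lint API; a minimal sketch using the requests library (endpoint and payload shape as documented at the time of writing):

```python
import requests


def lint_config(project_id: int, token: str, yaml_text: str) -> bool:
    """Ask GitLab whether the generated configuration is valid CI YAML."""
    response = requests.post(
        f"https://gitlab.com/api/v4/projects/{project_id}/ci/lint",
        headers={"PRIVATE-TOKEN": token},
        json={"content": yaml_text},
    )
    response.raise_for_status()
    result = response.json()
    if not result.get("valid", False):
        # Surface the linter's complaints before the child pipeline ever runs.
        print("\n".join(result.get("errors", [])))
    return result.get("valid", False)
```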
What it looks like in the UI
Conclusion
In this article, I showed an example of how automation makes our lives easier and increases efficiency by relying on GitLab’s child-pipeline processes. I used Python to dynamically generate a configuration file for the child pipeline and, in my opinion, this is a great way to create a more organised YAML file with less repetition.
If you dig into optimisation, you can check beforehand which services/libs, packages or directories have changed between two build processes and generate a pipeline covering only those, reducing build time and resource usage.
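A rough sketch of that idea, diffing the current commit against the previous one with git (this assumes a clone deep enough for the diff and uses GitLab’s predefined CI_COMMIT_BEFORE_SHA variable):

```python
import os
import subprocess
from pathlib import Path


def changed_libs(base_ref: str, head_ref: str = "HEAD") -> set[str]:
    """Return the libs under libs/java touched between two commits."""
    output = subprocess.run(
        ["git", "diff", "--name-only", base_ref, head_ref],
        capture_output=True, text=True, check=True,
    ).stdout
    libs = set()
    for changed in output.splitlines():
        parts = Path(changed).parts
        # Keep only paths shaped like libs/java/<lib>/...
        if len(parts) > 2 and parts[:2] == ("libs", "java"):
            libs.add(parts[2])
    return libs


if __name__ == "__main__":
    # N.B. on the first pipeline of a new branch this variable can be all zeros,
    # so a real implementation should fall back to building everything.
    print(changed_libs(os.environ["CI_COMMIT_BEFORE_SHA"]))
```

Feeding this set into PipelineWriter instead of get_libs() yields a child pipeline that only rebuilds what has actually changed.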
Additional Resources:
- Monorepos in Git | Atlassian Git Tutorial
- Parent-child pipelines | GitLab
- Keyword reference for the `.gitlab-ci.yml` file | GitLab
Head to the Infinite Lambda blog to find other insightful materials.