...

Dynamic Pipeline Generation on GitLab

Andras Kelle
October 5, 2021
Read: 3 min

Running a pipeline in a monorepo can be very resource- and time-consuming.

In version control systems, a monorepo (monolith != monorepo) is a software development strategy where we use the same repository to store code for more than one project. Google, Facebook, Microsoft and other big tech companies all employ monorepos. At first sight you might think that this setup has many disadvantages, such as a large number of commits, branches and tracked files or a lack of access control. However, it also has numerous advantages too, including:

  • visibility
  • easier package management
  • consistency

As projects get more complex, the length of the .gitlab-ci.yml increases in proportion to the content of their repository. Besides the conceptual challenges, numerous performance issues can affect a monorepo setup. Don’t let the pipeline be one of them.

Inside of a pipe

 

Structure and architecture

Automation makes our life easier so let’s use it to generate a configuration file and its content for a child pipeline. At an abstract level, a child pipeline is an automated manifestation of generated processes that execute jobs in a direct acyclic way.

A common practice is to use a for loop to retrieve data over iteration, and save results in an array. In the following example, the concept of the dynamic pipeline generation relies on that after an iteration, elements of an array can be used to create the jobs defined in the generated pipeline.

N.B. I use Python in the examples to replace variables with values during the child-pipeline generation process but feel free to use any other preferred programming language.

Let’s have a look at the directory structure:

 

[php]libs

└── java

└── auth

└── payments

└── subscription

[/php]

 

Since it is an explanatory project, it will not require lots of data; we will only need it to demonstrate the concept.

First, separate the generation and the trigger processes into two different jobs as shown in the .gitlab-ci.yml below. The trigger job should use strategy: depend on the generator because the child-pipeline-trigger will need the generated configuration file, which will be available only after the generation process succeeded and is saved as an artifact.

 

But what’s happening under the hood?

A brief description: The generator job at the child-pipeline-generator stage calls main.py in its script block. That executes the generator() method, which will collect data from get_libs() and create the child pipeline’s configurations file, the child-pipeline-gitlab-ci.yml.

get_libs() iterates through the content of lib/java and returns a list made up of the entities contained by libs_path.

 

The PipelineWriter class contains all the configurations and templates that we need to generate the child pipeline, including hidden (parent) jobs and the child pipeline job template.

As you can see in the example, Gitlab supports extends with multi-level inheritance. The build-{lib}-lib job inherits all the configurations defined in the .basic job.

You can think of these indented multi-line strings returned by each method as different parts of the child pipeline, which can later be put together into a valid configuration file.

 

Tip: Validate your generated configuration file with CI Lint before deploying it.

 

What it looks like on the UI

Dynamically generated jobs
Dynamically generated jobs

Conclusion

In this article, I showed an example of how automation makes our life easier and increases efficiency by relying on GitLab’s child-pipeline processes. I used Python to dynamically generate a configuration file for the child pipeline and, in my opinion, this is a great way to create a more organised YAML file with less repetition.

If you dig into optimisation, you can check beforehand which services/libs, packages or directories have changed (between two build processes) and generate a pipeline determined by them to reduce build time and resource usage.

Additional Resources:

  1. Monorepos in Git | Atlassian Git Tutorial
  2. Parent-child pipelines | GitLab
  3. Keyword reference for the `.gitlab-ci.yml` file | GitLab

 

Head to the Infinite Lambda blog to find other insightful materials.

More on the topic

Everything we know, we are happy to share. Head to the blog to see how we leverage the tech.

why sustainability analytics
Why Sustainability Analytics
We all like a sunny day. Kicking back in the garden with the shades on, cool drink in hand and hopefully a liberal amount of...
May 8, 2024
Data diff validation in a blue green deployment: how to guide
Data Diff Validation in Blue-Green Deployments
During a blue-green deployment, there are discrepancies between environments that we need to address to ensure data integrity. This calls for an effective data diff...
January 31, 2024
GDPR & Data Governance in Tech
GDPR & Data Governance in Tech
The increasing focus on data protection and privacy in the digital age is a response to the rapid advancements in technology and the widespread collection,...
January 18, 2024
Data masking on Snowflake using data contracts
Automated Data Masking on Snowflake Using Data Contracts
As digital data is growing exponentially, safeguarding sensitive information is more important than ever. Compliance with strict regulatory frameworks, such as the European Union’s General...
January 17, 2024
What AI is not: demystifying LLMs
Demystifying LLMs: What AI Is Not
Just a year ago, hardly anyone had heard of large language models (LLMs), the technology behind ChatGPT. Now, these models are everywhere, revolutionising the way...
January 11, 2024
Digital innovations in Ukraine
Top 9 Digital Innovations in Ukraine
Attention. Air raid alert. Proceed to the nearest shelter. Don’t be careless. Your overconfidence is your weakness. – Air raid alert app voice-over using the...
January 4, 2024

Everything we know, we are happy to share. Head to the blog to see how we leverage the tech.