
Dynamic Pipeline Generation on GitLab

Andras Kelle
October 5, 2021
Read: 3 min

Running a pipeline in a monorepo can be very resource- and time-consuming.

In version control systems, a monorepo (monolith != monorepo) is a software development strategy where we use the same repository to store code for more than one project. Google, Facebook, Microsoft and other big tech companies all employ monorepos. At first sight, you might think that this setup has many disadvantages, such as a large number of commits, branches and tracked files, or a lack of access control. However, it also has numerous advantages, including:

  • visibility
  • easier package management
  • consistency

As projects get more complex, the length of the .gitlab-ci.yml file grows in proportion to the content of the repository. Besides the conceptual challenges, numerous performance issues can affect a monorepo setup. Don’t let the pipeline be one of them.


Structure and architecture

Automation makes our life easier, so let’s use it to generate a configuration file and its content for a child pipeline. At an abstract level, a child pipeline is an automated manifestation of generated processes that execute jobs as a directed acyclic graph.

A common practice is to use a for loop to collect data while iterating and to save the results in an array. In the following example, dynamic pipeline generation relies on this idea: after the iteration, the elements of the array are used to create the jobs defined in the generated pipeline.

N.B. I use Python in the examples to replace variables with values during the child-pipeline generation process, but feel free to use any other programming language you prefer.

Let’s have a look at the directory structure:

 

    libs
    └── java
        ├── auth
        ├── payments
        └── subscription

 

Since this is an explanatory project, it will not require lots of data; we only need enough to demonstrate the concept.

First, separate the generation and the trigger processes into two different jobs, as shown in the .gitlab-ci.yml below. The trigger job should use strategy: depend and reference the generator job, because the child-pipeline-trigger needs the generated configuration file, which becomes available only once the generation process has succeeded and the file has been saved as an artifact.

 
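The original snippet is not reproduced here, but a minimal sketch of such a parent .gitlab-ci.yml could look like the one below. The stage, job and file names follow the ones used in this article; the image and the remaining details are assumptions.

    stages:
      - child-pipeline-generator
      - child-pipeline-trigger

    # Generates child-pipeline-gitlab-ci.yml and saves it as an artifact.
    generator:
      stage: child-pipeline-generator
      image: python:3.9            # illustrative image; any Python-capable image works
      script:
        - python3 main.py
      artifacts:
        paths:
          - child-pipeline-gitlab-ci.yml

    # Runs the child pipeline from the generated configuration file.
    # strategy: depend makes this job wait for (and mirror) the child pipeline status.
    child-pipeline-trigger:
      stage: child-pipeline-trigger
      trigger:
        include:
          - artifact: child-pipeline-gitlab-ci.yml
            job: generator
        strategy: depend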

But what’s happening under the hood?

A brief description: the generator job at the child-pipeline-generator stage calls main.py in its script block. That executes the generator() method, which collects data from get_libs() and creates the child pipeline’s configuration file, child-pipeline-gitlab-ci.yml.

get_libs() iterates through the content of libs/java and returns a list made up of the entries contained in libs_path.

 
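The original main.py is not shown here, but a minimal sketch under the assumptions above could look like this. The pipeline_writer module name and the basic_job()/build_job() method names are illustrative; only generator(), get_libs() and libs_path come from the description.

    import os

    # Assumed module name for the PipelineWriter class, sketched further below.
    from pipeline_writer import PipelineWriter

    # Location of the libraries; matches the directory structure shown earlier.
    libs_path = "libs/java"


    def get_libs():
        """Return the entities (library directories) contained by libs_path."""
        return sorted(
            entry for entry in os.listdir(libs_path)
            if os.path.isdir(os.path.join(libs_path, entry))
        )


    def generator():
        """Collect the libs and write the child pipeline configuration file."""
        writer = PipelineWriter()
        with open("child-pipeline-gitlab-ci.yml", "w") as config:
            config.write(writer.basic_job())
            for lib in get_libs():
                config.write(writer.build_job(lib))


    if __name__ == "__main__":
        generator()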

The PipelineWriter class contains all the configurations and templates that we need to generate the child pipeline, including hidden (parent) jobs and the child pipeline job template.

As you can see in the example, GitLab supports extends with multi-level inheritance. The build-{lib}-lib job inherits all the configuration defined in the .basic job.

You can think of these indented multi-line strings returned by each method as different parts of the child pipeline, which can later be put together into a valid configuration file.
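To make that concrete, here is a hedged sketch of such a PipelineWriter; the hidden .basic job’s image and the script content are illustrative, while the build-{lib}-lib naming and the extends relationship follow the article.

    from textwrap import dedent


    class PipelineWriter:
        """Templates that are assembled into the child pipeline configuration."""

        def basic_job(self):
            # Hidden (parent) job: the leading dot means it never runs on its own;
            # it only serves as a template for extends.
            return dedent("""\
                .basic:
                  stage: build
                  image: maven:3.8-openjdk-11   # illustrative image
                """)

        def build_job(self, lib):
            # One generated job per library; extends gives multi-level inheritance.
            return dedent(f"""\
                build-{lib}-lib:
                  extends: .basic
                  script:
                    - echo "Building the {lib} lib"
                """)

For the auth, payments and subscription libs, concatenating these strings yields build-auth-lib, build-payments-lib and build-subscription-lib jobs in the generated file, each extending .basic.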

 

Tip: Validate your generated configuration file with CI Lint before deploying it.
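If you prefer to automate that check as well, GitLab exposes a CI Lint API. A rough sketch, assuming the project-scoped lint endpoint, the requests library and a token in the GITLAB_TOKEN environment variable (verify the endpoint against the API documentation of your GitLab version):

    import os
    import requests


    def lint(path="child-pipeline-gitlab-ci.yml"):
        """Ask GitLab whether the generated configuration file is valid."""
        with open(path) as config:
            content = config.read()
        response = requests.post(
            f"https://gitlab.com/api/v4/projects/{os.environ['CI_PROJECT_ID']}/ci/lint",
            headers={"PRIVATE-TOKEN": os.environ["GITLAB_TOKEN"]},
            json={"content": content},
        )
        response.raise_for_status()
        result = response.json()
        if not result.get("valid"):
            raise SystemExit(f"Invalid CI configuration: {result.get('errors')}")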

 

What it looks like on the UI

Dynamically generated jobs

Conclusion

In this article, I showed an example of how automation makes our life easier and increases efficiency by relying on GitLab’s child pipelines. I used Python to dynamically generate a configuration file for the child pipeline and, in my opinion, this is a great way to create a more organised YAML file with less repetition.

If you dig into optimisation, you can check beforehand which services, libs, packages or directories have changed between two build processes and generate a pipeline covering only those, reducing build time and resource usage.
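As a hedged sketch of that idea, get_libs() could be swapped for something like the function below, which compares against the previous commit; the diff range and the directory layout are assumptions.

    import subprocess

    libs_path = "libs/java"


    def get_changed_libs(diff_range="HEAD~1...HEAD"):
        """Return only the libs whose files changed in the given diff range."""
        # Assumes the runner checkout contains HEAD~1 (adjust GIT_DEPTH if needed).
        changed_files = subprocess.run(
            ["git", "diff", "--name-only", diff_range],
            capture_output=True, text=True, check=True,
        ).stdout.splitlines()
        changed = set()
        for path in changed_files:
            parts = path.split("/")
            # Only files inside libs/java/<lib>/... count as a change to <lib>.
            if path.startswith(f"{libs_path}/") and len(parts) >= 4:
                changed.add(parts[2])
        return sorted(changed)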

Additional Resources:

  1. Monorepos in Git | Atlassian Git Tutorial
  2. Parent-child pipelines | GitLab
  3. Keyword reference for the `.gitlab-ci.yml` file | GitLab

 

Head to the Infinite Lambda blog to find other insightful materials.
