Dynamic Pipeline Generation on GitLab

Andras Kelle
October 5, 2021
Read: 3 min

Running a pipeline in a monorepo can be very resource- and time-consuming.

In version control systems, a monorepo (monolith != monorepo) is a software development strategy where the same repository stores code for more than one project. Google, Facebook, Microsoft and other big tech companies all employ monorepos. At first sight, you might think this setup has many disadvantages, such as a large number of commits, branches and tracked files, or a lack of access control. However, it also has numerous advantages, including:

  • visibility
  • easier package management
  • consistency

As projects get more complex, the length of the .gitlab-ci.yml grows in proportion to the content of the repository. Besides the conceptual challenges, numerous performance issues can affect a monorepo setup. Don’t let the pipeline be one of them.

Structure and architecture

Automation makes our life easier, so let’s use it to generate a configuration file and its content for a child pipeline. At an abstract level, a child pipeline is an automated manifestation of generated processes that execute jobs as a directed acyclic graph.

A common practice is to use a for loop to retrieve data over an iteration and save the results in an array. In the following example, dynamic pipeline generation relies on the fact that, after the iteration, the elements of the array can be used to create the jobs defined in the generated pipeline.

N.B. I use Python in the examples to replace variables with values during the child-pipeline generation process, but feel free to use any other programming language you prefer.

Let’s have a look at the directory structure:

lib
└── java
    ├── auth
    ├── payments
    └── subscription
Since this is an explanatory project, it does not require a lot of data; we only need enough to demonstrate the concept.

First, separate the generation and the trigger processes into two different jobs, as shown in the .gitlab-ci.yml below. The trigger job should use strategy: depend and include the generator job’s artifact, because the child-pipeline-trigger needs the generated configuration file, which becomes available only after the generation process has succeeded and the file has been saved as an artifact.
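The parent configuration could look roughly like this sketch (the stage and file names follow the article; the Python image and the main.py script path are assumptions):

```yaml
stages:
  - child-pipeline-generator
  - child-pipeline-trigger

# Runs the Python generator and saves the resulting YAML as an artifact.
generator:
  stage: child-pipeline-generator
  image: python:3.9
  script:
    - python main.py
  artifacts:
    paths:
      - child-pipeline-gitlab-ci.yml

# Triggers the child pipeline from the generated artifact; strategy: depend
# makes this job wait for the child pipeline and mirror its status.
child-pipeline-trigger:
  stage: child-pipeline-trigger
  trigger:
    include:
      - artifact: child-pipeline-gitlab-ci.yml
        job: generator
    strategy: depend
```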


But what’s happening under the hood?

A brief description: the generator job at the child-pipeline-generator stage calls main.py in its script block. That executes the generator() method, which collects data from get_libs() and creates the child pipeline’s configuration file, child-pipeline-gitlab-ci.yml.
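Following that flow, main.py might look roughly like this sketch (the hidden .basic job, the Maven image and the echo command are illustrative assumptions, not the article’s exact code):

```python
from pathlib import Path


def get_libs(libs_path="lib/java"):
    """Return the names of the library directories under libs_path."""
    return sorted(p.name for p in Path(libs_path).iterdir() if p.is_dir())


def generator(libs_path="lib/java", output="child-pipeline-gitlab-ci.yml"):
    """Render one build job per library and write the child pipeline file."""
    libs = get_libs(libs_path)
    # Start with a hidden job that the generated jobs will extend.
    parts = [".basic:\n  image: maven:3-openjdk-11\n"]
    for lib in libs:
        parts.append(
            f"build-{lib}-lib:\n"
            f"  extends: .basic\n"
            f"  script:\n"
            f'    - echo "building {lib}"\n'
        )
    Path(output).write_text("\n".join(parts))
    return libs
```

In the CI job, `python main.py` would call generator() with the defaults, producing one build job per directory found under lib/java.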

get_libs() iterates through the content of lib/java and returns a list of the directories found at libs_path.


The PipelineWriter class contains all the configurations and templates that we need to generate the child pipeline, including hidden (parent) jobs and the child pipeline job template.
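A minimal sketch of such a class could look like this (the method names, job bodies and Docker image are assumptions; only the PipelineWriter name and the .basic/build-{lib}-lib jobs come from the article):

```python
class PipelineWriter:
    """Holds the templates used to assemble the child pipeline file."""

    def basic_job(self):
        # Hidden (parent) job that the generated jobs inherit from.
        return (
            ".basic:\n"
            "  image: maven:3-openjdk-11\n"
            "  tags:\n"
            "    - docker\n"
        )

    def build_job(self, lib):
        # One build job per library, extending the hidden .basic job.
        return (
            f"build-{lib}-lib:\n"
            f"  extends: .basic\n"
            f"  script:\n"
            f'    - echo "building {lib} library"\n'
        )

    def write(self, libs, output):
        # Join the template fragments into one valid configuration file.
        parts = [self.basic_job()] + [self.build_job(lib) for lib in libs]
        with open(output, "w") as f:
            f.write("\n".join(parts))
```

Each method returns one self-contained YAML fragment, so the class stays easy to extend with new job templates.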

As you can see in the example, GitLab supports extends with multi-level inheritance. The build-{lib}-lib job inherits all the configuration defined in the .basic job.
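An excerpt of a generated child pipeline using multi-level inheritance might look like this (the hidden .base job, the Maven image and the script line are illustrative assumptions):

```yaml
# First level of the hierarchy: shared runner settings.
.base:
  tags:
    - docker

# Second level: .basic extends .base and adds the build image.
.basic:
  extends: .base
  image: maven:3-openjdk-11

# Generated job: inherits everything from .basic (and, through it, .base).
build-auth-lib:
  extends: .basic
  script:
    - mvn --projects auth package
```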

You can think of these indented multi-line strings returned by each method as different parts of the child pipeline, which can later be put together into a valid configuration file.


Tip: Validate your generated configuration file with CI Lint before deploying it.


What it looks like on the UI

Dynamically generated jobs


In this article, I showed an example of how automation makes our life easier and increases efficiency by relying on GitLab’s child-pipeline processes. I used Python to dynamically generate a configuration file for the child pipeline and, in my opinion, this is a great way to create a more organised YAML file with less repetition.

If you want to optimise further, you can check beforehand which services, libraries, packages or directories have changed between two builds, and generate a pipeline limited to them to reduce build time and resource usage.
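One way to sketch that filter, assuming you feed it the paths reported by something like `git diff --name-only` (the function name and prefix are hypothetical):

```python
def changed_libs(changed_files, libs, prefix="lib/java/"):
    """Return only the libraries that contain modified files.

    changed_files: iterable of repo-relative paths, e.g. from
    `git diff --name-only <old>..<new>`.
    """
    hit = set()
    for path in changed_files:
        if path.startswith(prefix):
            # The library name is the first path segment after the prefix.
            lib = path[len(prefix):].split("/", 1)[0]
            if lib in libs:
                hit.add(lib)
    return sorted(hit)
```

The generator could then emit jobs only for the returned libraries instead of all of them.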

Additional Resources:

  1. Monorepos in Git | Atlassian Git Tutorial
  2. Parent-child pipelines | GitLab
  3. Keyword reference for the `.gitlab-ci.yml` file | GitLab


Head to the Infinite Lambda blog to find other insightful materials.

