
How to Speed Up Terraform in CI/CD Pipelines

Nikolay Ninov
January 20, 2023
Read: 3 min

In this series of blog posts, we are going to offer various tips and tricks to speed up Terraform in CI/CD pipelines.
In the first part of the series, we are taking a look at where Terraform providers are installed locally. We are then going to use what we have learnt to optimise a Terraform automation pipeline.

Following the steps in this article requires basic Terraform and GitLab CI knowledge.

What is a Terraform provider?

Let’s start at the very beginning and quickly go through what a Terraform provider is.

Simply put, a Terraform provider is a binary written in Go that manages the interaction between Terraform and service APIs. Such services can be cloud providers, SaaS platforms and all kinds of other APIs.

For example, let's say we want to create an S3 bucket in AWS using Terraform. In order to do that, we need to install the official AWS provider, after which Terraform is going to know how to make the correct API calls to AWS so that it can create the S3 bucket.

The Terraform language is declarative: we describe which providers we want and how they should be configured, and Terraform downloads and installs them during initialisation.

Have a look at this code snippet from a provider block:
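A provider configuration of this kind could look like the sketch below, which writes it straight into a main.tf file in a scratch directory. The version constraint, the region and the scratch directory are assumptions made for illustration, not values taken from the original project.

```shell
# Sketch: write a minimal AWS provider configuration into main.tf.
# workdir, the version constraint and the region are illustrative assumptions.
workdir="$(mktemp -d)"

cat > "$workdir/main.tf" <<'EOF'
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0" # assumed constraint
    }
  }
}

provider "aws" {
  region = "eu-west-1" # assumed region
}
EOF

echo "Wrote $workdir/main.tf"
```

The required_providers block tells Terraform where to fetch the provider from, while the provider block carries its configuration.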

 

Let’s take this piece of code and put it into a main.tf file.
Now, when we run the terraform init command, we can see Terraform downloading and installing the AWS provider:

 

After initialisation, Terraform has created a hidden file (.terraform.lock.hcl) and a hidden directory (.terraform) next to the main.tf file.

We will first explore the directory.

 

As you can see, this is where Terraform has installed the AWS provider.
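To explore the directory on your own machine, a small helper like the one below may be useful. It is only a sketch: the function name list_provider_files is made up for this post, and it assumes the layout used since Terraform 0.13, where providers live under .terraform/providers/<hostname>/<namespace>/<type>/<version>/<os_arch>/.

```shell
# list_provider_files DIR: print every provider file or symlink found
# under DIR/.terraform/providers; print nothing if the directory is absent.
# (Hypothetical helper name, used for illustration only.)
list_provider_files() {
  providers_dir="$1/.terraform/providers"
  if [ -d "$providers_dir" ]; then
    find "$providers_dir" \( -type f -o -type l \) | sort
  fi
}

# Inspect the current project directory.
list_provider_files .
```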

Next, let’s quickly see what the .terraform.lock.hcl file is. Terraform itself explains it in the terraform init output: “Terraform has created a lock file .terraform.lock.hcl to record the provider selections it made above. Include this file in your version control repository so that Terraform can guarantee to make the same selections by default when you run "terraform init" in the future.”

In that file, we have all the installed providers with their versions, together with SHA256 checksums of the provider packages. When that file is present and we execute terraform init, Terraform installs exactly the provider versions recorded in it, even if a newer version is available. To move to a newer version, we have to run terraform init -upgrade.
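To check quickly which versions are pinned, we can pull the provider and version lines out of the lock file. The helper below is a minimal sketch; the name locked_versions is invented here, and it assumes the usual lock file layout of a provider "<source>" block containing a version = "<x.y.z>" line.

```shell
# locked_versions FILE: print "<provider source> <version>" for every
# provider block in a Terraform dependency lock file.
# (Hypothetical helper name, used for illustration only.)
locked_versions() {
  awk '
    /^provider "/             { gsub(/"/, "", $2); source = $2 }
    /^[ \t]*version[ \t]*=/   { gsub(/"/, "", $3); print source, $3 }
  ' "$1"
}

# Only run against the current directory if a lock file is present.
if [ -f .terraform.lock.hcl ]; then
  locked_versions .terraform.lock.hcl
fi
```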

Terraform cache

There are some disadvantages to having Terraform providers installed inside each project folder. For example, if there is a project that utilises multiple folders with the same Terraform providers, the same provider would have to be downloaded in every folder. That unnecessarily takes up space, time and bandwidth.

Another issue is that if we had a Terraform automation pipeline, we would not want to push the providers’ binaries into the version control system. This means that the pipeline will be downloading the providers on each terraform init.

To address that, we can use Terraform's provider plugin cache. To enable it, set the following environment variable to the directory where you want the providers to be installed:

export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"

Let’s see what will happen when we export that environment variable, delete the .terraform folder and execute terraform init again:
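These steps can be sketched as a short script. One detail worth knowing is that Terraform does not create the cache directory itself, so the script creates it first. The guard around terraform init is an addition of this sketch, so that nothing is deleted in a directory without Terraform configuration.

```shell
# Point Terraform at a shared plugin cache and make sure it exists,
# because Terraform will not create the directory on its own.
export TF_PLUGIN_CACHE_DIR="${TF_PLUGIN_CACHE_DIR:-$HOME/.terraform.d/plugin-cache}"
mkdir -p "$TF_PLUGIN_CACHE_DIR"

# Re-initialise from scratch, but only where there is Terraform
# configuration and the terraform binary is available.
if command -v terraform >/dev/null 2>&1 && ls ./*.tf >/dev/null 2>&1; then
  rm -rf .terraform
  terraform init
else
  echo "terraform or *.tf files not found; skipping init" >&2
fi
```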

 

We can see that we no longer download the provider but instead use it from a shared cache directory. And in the .terraform folder we now have a symlink to the cache directory:
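One way to verify the symlink yourself is to list every link under .terraform/providers together with its target. This is a sketch with a made-up helper name (show_provider_links); it assumes a Linux or macOS machine, where Terraform links to the cache rather than copying from it.

```shell
# show_provider_links DIR: for each symlink under DIR/.terraform/providers,
# print "<link> -> <target>". Prints nothing when there are no symlinks.
# (Hypothetical helper name, used for illustration only.)
show_provider_links() {
  providers_dir="$1/.terraform/providers"
  [ -d "$providers_dir" ] || return 0
  find "$providers_dir" -type l | while IFS= read -r link; do
    printf '%s -> %s\n' "$link" "$(readlink "$link")"
  done
}

# Inspect the current project directory.
show_provider_links .
```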

 

Terraform caching in GitLab CI

Finally, using the knowledge we have, let’s incorporate caching into a GitLab CI pipeline. Making use of caching is one of the first things you can do to improve your pipelines.

First things first, though. Let’s establish a baseline: how long does a terraform init take to complete without caching?

Let’s use a baseline project where we have multiple providers. You can see it takes 37 seconds to complete terraform init without caching:

 

Now let’s enable caching and repeat the same command to observe the difference:
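A comparison like this can be reproduced with a small timing helper. The sketch below is an assumption about how one might measure it, not the exact method used for the figures in this post; elapsed_seconds is a made-up name, and the actual numbers will depend on your providers and network.

```shell
# elapsed_seconds CMD [ARGS...]: run a command and print how many whole
# seconds it took (wall clock). The command's own output goes to stderr.
# (Hypothetical helper name, used for illustration only.)
elapsed_seconds() {
  start="$(date +%s)"
  "$@" >&2
  end="$(date +%s)"
  echo "$((end - start))"
}

# Compare a cold init with an init served from a warm plugin cache.
if command -v terraform >/dev/null 2>&1 && ls ./*.tf >/dev/null 2>&1; then
  rm -rf .terraform
  unset TF_PLUGIN_CACHE_DIR
  echo "without cache: $(elapsed_seconds terraform init -input=false)s"

  export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"
  mkdir -p "$TF_PLUGIN_CACHE_DIR"
  terraform init -input=false >/dev/null   # first run fills the cache
  rm -rf .terraform
  echo "with warm cache: $(elapsed_seconds terraform init -input=false)s"
fi
```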

 

Whoa, look at that! The terraform init execution time has dropped more than tenfold, from 37 seconds to a mere 3 seconds.

Let’s go ahead and implement the caching in an actual GitLab pipeline. GitLab has a cache:key:files keyword that works well with lock files: the cache key is derived from the listed files, so the same cache is reused for as long as the lock file does not change.
To make caching work in a GitLab CI pipeline, we need to set a global TF_PLUGIN_CACHE_DIR variable and use the GitLab cache:key:files keyword.

The code below only demonstrates the caching and does not constitute a complete pipeline:
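A configuration along those lines might look like the sketch below, written through a heredoc so it can be tried from a shell. The job name, the image tag and the cache path are assumptions made for illustration. The cache directory sits inside the project directory because GitLab can only cache paths located there.

```shell
# Sketch: write a caching-only .gitlab-ci.yml into a scratch directory.
# Job name, image and cache path are illustrative assumptions.
ci_dir="$(mktemp -d)"

cat > "$ci_dir/.gitlab-ci.yml" <<'EOF'
variables:
  # GitLab can only cache paths inside the project directory.
  TF_PLUGIN_CACHE_DIR: ${CI_PROJECT_DIR}/.terraform.d/plugin-cache

terraform-init:                        # assumed job name
  image:
    name: hashicorp/terraform:latest   # assumed image; pin a version in practice
    entrypoint: [""]                   # the image's default entrypoint is terraform
  cache:
    key:
      files:
        - .terraform.lock.hcl          # new cache key whenever the lock file changes
    paths:
      - .terraform.d/plugin-cache
  script:
    - mkdir -p "$TF_PLUGIN_CACHE_DIR"  # Terraform does not create it itself
    - terraform init -input=false
EOF

echo "Wrote $ci_dir/.gitlab-ci.yml"
```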

 

Here is the output of the pipeline:

 

In a nutshell

Terraform uses providers to communicate with APIs. By default, these providers are installed in a hidden directory called .terraform inside each separate Terraform project.
A .terraform.lock.hcl file records the provider versions that were selected, so that the same versions are installed on every terraform init.

We can take advantage of Terraform caching to reduce the execution time of our pipelines drastically, thus improving the performance of our CI/CD process.

 

Stay tuned for the next part of this blog series, where we are going to discuss how we optimise the handling of Terraform plans in CI/CD pipelines.

Meanwhile, browse the Infinite Lambda blog for more DevOps and DataOps insights.
