...

How to Speed Up Terraform in CI/CD Pipelines

Nikolay Ninov
January 20, 2023
Read: 4 min

In this series of blog posts, we are going to offer various tips and tricks to speed up Terraform in CI/CD pipelines.
In the first part of the series, we are taking a look at where Terraform providers are installed locally. We are then going to use what we have learnt to optimise a Terraform automation pipeline.

Following the steps in this article requires basic Terraform and GitLab CI knowledge.

What is a Terraform provider?

Let’s start at the very beginning and quickly go through what a Terraform provider is.

Simply put, a Terraform provider is a binary written in Go that manages the interaction between Terraform and service APIs. Such services can be cloud providers, SaaS platforms and all kinds of other APIs.

For example, let’s say we want to create an S3 bucket in AWS using Terraform. In order to do that, we need to install the official AWS provider, after which Terraform is going to know how to make the correct API calls to AWS so that it can create the S3 bucket.

The Terraform language is declarative: we describe which providers we want and set the proper configuration, and Terraform downloads and installs them during initialisation.

Take a look at this code snippet from a provider block:
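Here is a minimal example of such a block. The provider version constraint and the region are illustrative; pin whatever your project actually needs:

```hcl
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 4.0"
    }
  }
}

provider "aws" {
  region = "eu-west-1"
}
```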

 

Let’s take this piece of code and put it into a main.tf file.

Now, when we run the terraform init command, we can see Terraform downloading and installing the AWS provider:
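The output looks roughly like this (abridged; the exact version installed depends on the constraint in the provider block):

```shell
$ terraform init

Initializing the backend...

Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 4.0"...
- Installing hashicorp/aws v4.50.0...
- Installed hashicorp/aws v4.50.0 (signed by HashiCorp)

Terraform has been successfully initialized!
```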

 

After initialisation, Terraform has created a hidden directory (.terraform) and a hidden lock file (.terraform.lock.hcl) next to the main.tf file.

We will first explore the directory.
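Its layout follows the pattern registry host / namespace / provider / version / platform. The version and platform below are illustrative:

```shell
$ tree .terraform
.terraform
└── providers
    └── registry.terraform.io
        └── hashicorp
            └── aws
                └── 4.50.0
                    └── linux_amd64
                        └── terraform-provider-aws_v4.50.0_x5
```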

 

As you can see, this is where Terraform has installed the AWS provider.

Next, let’s quickly see what the .terraform.lock.hcl file is. Terraform itself explained it during terraform init: “Terraform has created a lock file .terraform.lock.hcl to record the provider selections it made above. Include this file in your version control repository so that Terraform can guarantee to make the same selections by default when you run "terraform init" in the future.”

That file records all the installed providers with their versions, together with SHA256 checksum hashes. Once the file exists, terraform init will always install the exact provider versions recorded in it, even if newer versions are available.
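An abridged entry from the lock file looks like this (version, constraint and hashes are illustrative):

```hcl
provider "registry.terraform.io/hashicorp/aws" {
  version     = "4.50.0"
  constraints = "~> 4.0"
  hashes = [
    "h1:...",
    "zh:...",
  ]
}
```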

Terraform cache

There are some disadvantages to having Terraform providers installed inside each project folder. For example, if there is a project that utilises multiple folders with the same Terraform providers, the same provider would have to be downloaded in every folder. That unnecessarily takes up space, time and bandwidth.

Another issue is that if we had a Terraform automation pipeline, we would not want to push the providers’ binaries into the version control system. This means that the pipeline will be downloading the providers on each terraform init.

To address that, we can use Terraform’s provider plugin cache. To enable caching, set the following environment variable to the directory where you want the providers to be installed:

TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"
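One detail worth knowing: Terraform never creates the cache directory itself, so we have to create it ourselves. A minimal shell sketch:

```shell
# Tell Terraform where to cache provider plugins.
export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"

# Terraform will not create this directory for us, so make sure it exists.
mkdir -p "$TF_PLUGIN_CACHE_DIR"

echo "$TF_PLUGIN_CACHE_DIR"
```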

Let’s see what will happen when we export that environment variable, delete the .terraform folder and execute terraform init again:
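Assuming the cache has already been populated by a previous run, the output now mentions the cache instead of a download (abridged, version illustrative):

```shell
$ rm -rf .terraform && terraform init

Initializing provider plugins...
- Finding hashicorp/aws versions matching "~> 4.0"...
- Using hashicorp/aws v4.50.0 from the shared cache directory

Terraform has been successfully initialized!
```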

 

We can see that we no longer download the provider but instead use it from a shared cache directory. And in the .terraform folder we now have a symlink to the cache directory:
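For example (paths abridged and illustrative):

```shell
$ ls -l .terraform/providers/registry.terraform.io/hashicorp/aws/4.50.0/
lrwxrwxrwx 1 user user 78 Jan 20 10:00 linux_amd64 -> /home/user/.terraform.d/plugin-cache/registry.terraform.io/hashicorp/aws/4.50.0/linux_amd64
```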

 

Terraform caching in GitLab CI

Finally, let’s put this knowledge to use and incorporate caching into a GitLab CI pipeline; making use of caching is one of the first things you can do to improve your pipelines.

First things first, though: let’s establish a baseline. How long does terraform init take to complete without caching?

Let’s use a baseline project with multiple providers. Without caching, terraform init takes 37 seconds to complete:
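A timed run might look like this (output abridged; the timing is from our test run and will vary with network speed and provider count):

```shell
$ time terraform init
...
Terraform has been successfully initialized!

real    0m37.1s
```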

 

Now let’s enable caching and repeat the same command to observe the difference:
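With a warm cache, the same run is dramatically faster (again abridged and illustrative):

```shell
$ rm -rf .terraform && time terraform init
...
Terraform has been successfully initialized!

real    0m3.2s
```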

 

Whoa, look at that! The terraform init execution time has dropped more than tenfold to a mere 3 seconds.

Let’s go ahead and implement the caching in an actual GitLab pipeline. GitLab has a cache:key:files keyword that is perfect to use with lock files: the same cache is reused until the lock file changes.
To make caching work in a GitLab CI pipeline, we need to set a global TF_PLUGIN_CACHE_DIR variable and use the cache:key:files keyword.

The code below only demonstrates the caching and does not constitute a complete pipeline:
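Here is a sketch of what that could look like. The job name, image tag and plan step are assumptions for illustration; note that GitLab can only cache paths inside the project directory, hence the use of $CI_PROJECT_DIR:

```yaml
variables:
  # Shared provider cache inside the project dir so GitLab can cache it.
  TF_PLUGIN_CACHE_DIR: "$CI_PROJECT_DIR/.terraform.d/plugin-cache"

# Reuse the same cache until .terraform.lock.hcl changes.
cache:
  key:
    files:
      - .terraform.lock.hcl
  paths:
    - .terraform.d/plugin-cache

plan:
  image:
    name: hashicorp/terraform:1.3.7
    # Override the image entrypoint so GitLab can run the script.
    entrypoint: [""]
  script:
    - mkdir -p "$TF_PLUGIN_CACHE_DIR"
    - terraform init
    - terraform plan
```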

 

Here is the output of the pipeline:

 

In a nutshell

Terraform uses providers to communicate with service APIs. These providers are installed in a hidden directory called .terraform inside each separate Terraform project.
The .terraform.lock.hcl file pins the provider versions so that the same versions are installed every time.

We can take advantage of Terraform caching to drastically reduce the execution time of our pipelines, thus improving the performance of our CI/CD process.

 

Stay tuned for the next part of this blog series, where we are going to discuss how to optimise the handling of Terraform plans in CI/CD pipelines.

Meanwhile, browse the Infinite Lambda blog for more DevOps and DataOps insights.
