In this series of blog posts, we are going to offer various tips and tricks to speed up Terraform in CI/CD pipelines.
In this first part, we look at where Terraform installs providers locally and then use what we have learnt to optimise a Terraform automation pipeline.
Following the steps in this article requires basic Terraform and GitLab CI knowledge.
What is a Terraform provider?
Let’s start at the very beginning and quickly go through what a Terraform provider is.
Simply put, a Terraform provider is a plugin binary, usually written in Go, that manages the interaction between Terraform and a service's API. Such services can be cloud providers, SaaS platforms and all kinds of other APIs.
For example, let's say we want to create an S3 bucket in AWS using Terraform. To do that, we need to install the official AWS provider; once it is installed, Terraform knows which AWS API calls to make to create the S3 bucket.
The Terraform language is declarative: we describe which providers we need and how to configure them, and during initialisation (terraform init), Terraform downloads and installs them.
Take a look at this code snippet from a provider block:
Let’s take this piece of code and put it into a main.tf file.
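A minimal provider configuration could look like the sketch below, which writes it straight into main.tf. The `~> 5.0` version constraint and the `eu-west-1` region are illustrative assumptions, not the values from the original project.

```shell
# Write a minimal main.tf that declares the official AWS provider.
# The version constraint and region are illustrative assumptions.
cat > main.tf <<'EOF'
terraform {
  required_providers {
    aws = {
      source  = "hashicorp/aws"
      version = "~> 5.0"
    }
  }
}

provider "aws" {
  region = "eu-west-1"
}
EOF
```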
Now, when we run the terraform init command, we can see Terraform downloading and installing the AWS provider:
After initialisation, Terraform has created a hidden directory, .terraform, and a hidden file, .terraform.lock.hcl, next to the main.tf file.
We will first explore the directory.
As you can see, this is where Terraform has installed the AWS provider.
Next, let’s take a quick look at the .terraform.lock.hcl file. Terraform itself describes it in the terraform init output: “Terraform has created a lock file .terraform.lock.hcl to record the provider selections it made above. Include this file in your version control repository so that Terraform can guarantee to make the same selections by default when you run "terraform init" in the future.”
The file lists every installed provider with its selected version, together with SHA-256-based checksums that Terraform uses to verify the provider packages. Whenever this file is present, terraform init installs exactly the recorded versions, even if a newer version is available, unless we explicitly ask for an upgrade with terraform init -upgrade.
Installing Terraform providers inside each project folder has some disadvantages. For example, if a project is split into multiple folders that use the same Terraform providers, every folder downloads its own copy of each provider, which wastes disk space, time and bandwidth.
Another issue appears in Terraform automation pipelines: we do not want to commit provider binaries to the version control system, so the pipeline has to download the providers again on every terraform init.
To address this, we can use Terraform’s provider plugin cache. To enable it, set the TF_PLUGIN_CACHE_DIR environment variable to the directory where Terraform should store the cached providers:
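A minimal sketch, assuming a Unix-like shell and the cache location `$HOME/.terraform.d/plugin-cache` (a common choice, not necessarily the path used in the original; any writable directory works). Terraform does not create the cache directory itself, so we create it first.

```shell
# Directory where Terraform keeps downloaded providers.
# The exact path is an illustrative assumption.
export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"

# Terraform will not create the cache directory on its own.
mkdir -p "$TF_PLUGIN_CACHE_DIR"
```

To make the setting permanent instead of exporting it in every shell, the same directory can be configured with `plugin_cache_dir = "$HOME/.terraform.d/plugin-cache"` in the Terraform CLI configuration file (`~/.terraformrc` on Linux and macOS).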
Let’s see what happens when we export that environment variable, delete the .terraform folder and run terraform init again:
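We can see that Terraform no longer downloads the provider into the project folder but uses the copy in the shared cache directory instead (with an empty cache, the first run downloads the provider once, into the cache). In the .terraform folder, we now have a symlink to the cache directory: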
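The commands behind this experiment can be sketched as follows, assuming the main.tf from earlier is in the current directory and an illustrative cache path. According to the Terraform documentation, Terraform uses symbolic links to the cache when possible rather than copying the provider; `find .terraform -type l` lists every link it created, wherever it sits in the tree.

```shell
# Enable the provider plugin cache (the path is an illustrative assumption).
export TF_PLUGIN_CACHE_DIR="$HOME/.terraform.d/plugin-cache"
mkdir -p "$TF_PLUGIN_CACHE_DIR"

# Remove the per-project provider installation and initialise again.
rm -rf .terraform
terraform init -input=false

# Show each symbolic link under .terraform and the target it points to.
find .terraform -type l -exec ls -l {} \;
```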
Terraform caching in GitLab CI
Finally, let’s apply what we have learnt to a GitLab CI pipeline, since caching is one of the first things you can use to speed up a pipeline.
First things first: let’s establish a baseline by measuring how long terraform init takes to complete without caching.
We use a baseline project with multiple providers. Without caching, terraform init takes 37 seconds to complete:
Now let’s enable caching and run the same command again:
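To reproduce a comparison like this on your own project, a rough sketch could look like the one below. It assumes bash (for the `time` keyword), no `plugin_cache_dir` set in the CLI configuration file, and an illustrative cache path. Because the first init after enabling the cache still downloads each provider once into the cache, the cache is warmed with an untimed run before the second measurement.

```shell
#!/usr/bin/env bash
# Compare terraform init with and without the provider plugin cache.
# The cache location is an illustrative assumption.
CACHE_DIR="$HOME/.terraform.d/plugin-cache"

# 1) Baseline: no cache, fresh .terraform directory.
unset TF_PLUGIN_CACHE_DIR
rm -rf .terraform
time terraform init -input=false > /dev/null

# 2) Enable the cache and warm it with one untimed run.
export TF_PLUGIN_CACHE_DIR="$CACHE_DIR"
mkdir -p "$TF_PLUGIN_CACHE_DIR"
rm -rf .terraform
terraform init -input=false > /dev/null

# 3) Timed run: providers are now taken from the cache.
rm -rf .terraform
time terraform init -input=false > /dev/null
```

The absolute numbers will depend on how many providers the project uses and on network speed.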
Whoa, look at that! The terraform init execution time has dropped more than tenfold, to a mere 3 seconds.
Let’s go ahead and implement caching in an actual GitLab pipeline. GitLab’s cache:key:files keyword is a natural fit for lock files: the cache key is derived from the listed files, so the pipeline keeps reusing the same cache until the lock file changes.
To make caching work on a GitLab CI pipeline, we need to set a global TF_PLUGIN_CACHE_DIR variable and use the GitLab cache:key:files keyword.
The code below only demonstrates the caching and does not constitute a complete pipeline:
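A hedged sketch of the caching part of a .gitlab-ci.yml, written out with a heredoc so it can be pasted at the repository root. The job name `init`, the image tag and the cache path `.terraform-plugin-cache` are illustrative assumptions. GitLab only caches paths inside the project directory, which is why TF_PLUGIN_CACHE_DIR points below `$CI_PROJECT_DIR`; the directory is created in `before_script` because Terraform will not create it.

```shell
# Write a minimal .gitlab-ci.yml fragment that caches Terraform providers.
# Job name, image tag and cache path are illustrative assumptions.
cat > .gitlab-ci.yml <<'EOF'
variables:
  # GitLab can only cache paths inside the project directory.
  TF_PLUGIN_CACHE_DIR: "$CI_PROJECT_DIR/.terraform-plugin-cache"

cache:
  key:
    files:
      # A new cache is used whenever the lock file changes; adjust the
      # path if the Terraform code lives in a subdirectory.
      - .terraform.lock.hcl
  paths:
    - .terraform-plugin-cache

init:
  image:
    name: hashicorp/terraform:1.5.7
    # The image's entrypoint is terraform; clear it so GitLab can run scripts.
    entrypoint: [""]
  before_script:
    # Terraform does not create the cache directory on its own.
    - mkdir -p "$TF_PLUGIN_CACHE_DIR"
  script:
    - terraform init -input=false
EOF
```

As far as we know, from Terraform 1.4 onwards a cached provider is only reused if the lock file already records its checksum for the runner's platform; running `terraform providers lock -platform=linux_amd64` locally (adjusted to your runners) adds those checksums.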
Here is the output of the pipeline:
In a nutshell
Terraform uses providers to communicate with APIs. By default, these providers are installed in a hidden directory called .terraform inside each separate Terraform project.
The .terraform.lock.hcl file records the selected provider versions and their checksums, so every terraform init installs the same versions.
We can take advantage of Terraform’s provider plugin cache to drastically reduce the execution time of our pipelines, thus improving the performance of our CI/CD process.
Stay tuned for the next part of this blog series, where we discuss how we optimise the handling of Terraform plans in CI/CD pipelines.
Meanwhile, browse the Infinite Lambda blog for more DevOps and DataOps insights.