Terraform For Beginners

C:\Dave\Storey
Published in ITNEXT
17 min read · Sep 1, 2021

Introduction

So, you’ve heard of Terraform, right? But you don’t actually know what it is or how it works? Well, fellow readers, strap yourselves in for my beginner’s guide.

In this post, I will explain the basic concepts of Terraform, how you use it and how it works, and enlighten you as to why it has gained such a strong following over the last few years. I plan to lightly touch on the basic syntax of Terraform and the configuration language it uses, without going too deep into the more advanced topics. I will follow up this blog with a more in-depth look at the technology in the future.

When I started writing this article I imagined it would just be a small simple thing, but as I started I realised that Terraform is actually very complex, and the article grew as a result. So if you don’t have a lot of time and want to just digest the summary, please feel free to skip down to the bottom where you will find the TL;DR version 🙂

What is Terraform and why all the fuss?

Ok, so chances are you have found this blog because someone in your company has said “We should use Terraform for this” or because you’ve come across a git repo that says “You can use these Terraform scripts to deploy this solution” and not wanted to put your hand up and ask what they are talking about. Fear not dear reader, I was definitely in your shoes 12+ months ago and struggled to break free from the confusion.

Terraform in a nutshell

In its most basic form, Terraform is an application that converts configuration files written in HCL (HashiCorp Configuration Language) into real-world infrastructure, usually in Cloud providers such as AWS, Azure or Google Cloud Platform.

This concept of taking configuration files and converting them into real resources is known as IaC (Infrastructure as Code) and is the new hotness in the world of Software Engineering. The reason it is so hot right now is that this code can live alongside your app code in repos, be version controlled and be easily integrated into your CI/CD pipelines.

Code in Resources Out… Simples! (Not Quite)

As many of you will probably have seen before, the concept of IaC is nothing new. People have been trying to automate their Cloud deployments for over a decade, so why has Terraform become so popular?

To put it simply, Terraform is a state-driven Cloud Platform provisioning engine. It leverages abstraction tooling (known as providers and backends) so that the code we write can be interpreted and translated into consistent, deterministic, Cloud-provider-specific CRUD API calls, removing a lot of leg work and stress from us.

High level view of how Terraform works

Infrastructure As Code

Ok, so you’ve got the basic elevator pitch, but what does this actually mean? Simply put, IaC aims to do the following:

  • Removes the need to have separate teams for managing infrastructure provisioning and development of application code.
  • Allows us to use modern source control tooling (such as git) to save, author and review changes to our infrastructure, and gives us a good overview of the history/evolution of our infrastructure.
  • Enables us to perform continuous integration and continuous deployment (CI/CD) of application and infrastructure, meaning that both elements can be deployed hand in hand.
  • Allows us to easily keep multiple environments in sync without having to do manual updates.
  • Removes the need for custom infrastructure provisioning scripts and/or tooling built in-house.
  • Reduces human error, because automated deployments take manual steps out of the process.

It is important to note that IaC is not a new concept. In fact, it has been around in different guises since the early days of Cloud Platforms (AWS has CloudFormation and Azure has ARM Templates).

A simple example

As a system developer, I have realised that I need to have some cloud storage provisioned. Nothing fancy, but I need a place to dump files and a place to read files from. Let’s take a look at what this looks like conceptually when we are working with Azure and “physical” cloud infrastructure:

Conceptual system architecture

So I’m using Microsoft Azure terminology here, but hopefully this isn’t massively confusing for anyone not familiar with this platform. Basically I need a Resource Group (because in Azure things live in Resource Groups), a Storage Account and a Storage Container. With all these things in place I will then be able to safely store my files inside said container in the cloud 🙂

Now let’s convert this conceptual model into Terraform HCL and see what it looks like:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2.64"
    }
  }
}

# The azurerm provider needs a features block, even an empty one.
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_storage_account" "example" {
  name                     = "examplestoracc"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

resource "azurerm_storage_container" "example" {
  name                  = "content"
  storage_account_name  = azurerm_storage_account.example.name
  container_access_type = "private"
}

And if we annotate our conceptual design with the HCL, we get:

Breaking it down

Ok so I know I said I wouldn’t go too deep into explaining the syntax of the HCL, but I think a little overview would likely help explain things.

Let’s take our earlier example of HCL for our storage account, and break it down step by step:

terraform {
  required_providers {
    azurerm = {
      source  = "hashicorp/azurerm"
      version = "~> 2.64"
    }
  }

  # Left empty here; the settings must then be supplied at init time.
  backend "azurerm" {
  }
}

# The azurerm provider needs a features block, even an empty one.
provider "azurerm" {
  features {}
}

resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"
}

resource "azurerm_storage_account" "example" {
  name                     = "examplestoracc"
  resource_group_name      = azurerm_resource_group.example.name
  location                 = azurerm_resource_group.example.location
  account_tier             = "Standard"
  account_replication_type = "LRS"
}

Looking at this syntax straight away you can easily see some similarities with languages such as JSON. But what does each part (or block) mean?

  • terraform : Allows you to configure some behaviours of Terraform itself, such as which Providers you wish to use and the versions that it needs to download. Remember: you cannot use resources from Providers without first instructing Terraform that you want to use those Providers, much like NuGet package dependencies in C# code or npm packages in JavaScript. For more information see the docs here
  • backend : Allows us to specify which Backend we want Terraform to use. Backends determine two key things, where our State is stored, and where our operations are executed. Essentially if we are working with Azure and want to persist our State into the cloud too, we can do so using the azurerm backend. If we don’t specify a backend, then the default one is used which is often referred to as local. See the official docs here
  • provider : Providers are plugins that Terraform uses to interface with different cloud providers. Some Providers enable you to add additional configuration in these blocks. See the official docs here
  • resource : A keyword within HCL to indicate that this is a resource you want to have provisioned. Official docs here
  • "azurerm_resource_group" : This is the “type” of the “resource” you want. When it runs, Terraform will determine which “Provider” contains this “type” and execute the correct CRUD action upon it. (I will cover these CRUD operations in more detail later in this article.)
  • "example" : This is the name/identifier of this resource within the Terraform script. Please do not confuse this with the name of the resource you will see in the Azure Portal. It is best to think of this in the same way you would think of variable names in languages such as Go or C#. Notice how I can give both of my resources the same “example” identifier; this is because Terraform allows duplication provided that the combination of resource type and identifier is unique.
  • name and location : These are attributes or properties of the “resource”. These determine how the “Provider” will provision things in the cloud. If you have used Azure I am sure you will be aware that name and location are common things you set on resources. Here name dictates the name of the resource that you will see in the Azure Portal.
  • resource_group_name = azurerm_resource_group.example.name : Ok, now things are getting interesting; this is where the true power of Terraform starts to show. Here we are passing the “output” of one resource into another: the resource_group_name attribute of our azurerm_storage_account has a dependency on the output of the azurerm_resource_group named example. Very clever, right?
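
To make the backend bullet a little more concrete, here is a minimal sketch of what a filled-in azurerm backend might look like. The argument names are the ones the azurerm backend accepts, but every value below (resource group, storage account, container and key) is a made-up placeholder, and the storage account has to exist before terraform init is run.

```hcl
terraform {
  backend "azurerm" {
    # All four values are hypothetical placeholders.
    resource_group_name  = "tfstate-resources"
    storage_account_name = "tfstateexample"
    container_name       = "tfstate"
    key                  = "example.terraform.tfstate"
  }
}
```

Leaving the block empty, as in the example above, is also allowed; the missing values then have to be supplied when running terraform init, for instance with the -backend-config option.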

Ok so this hopefully gives you a nice overview of how you can provision resources and how we can go about chaining the output of one resource to another. This “chaining” of resources is actually a very important concept in Terraform as it allows the engine to build its graph of resources. By clearly defining dependencies between resources, it will ensure the engine knows to provision them in the correct order (Example: It won’t try and do the storage account before the resource group it has to go inside), and will help Terraform translate resources into its State.
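
As a rough sketch of the two ways a dependency can end up in the graph, consider the containers below. Both the "logs" and "archive" containers are hypothetical additions of mine, not part of the example above. The first picks up its dependency simply by referencing an attribute; the second hard-codes the account name, so Terraform has nothing to infer from and needs depends_on to be told about the ordering.

```hcl
# Implicit dependency: referencing an attribute of the storage account
# is enough for Terraform to order the two resources correctly.
resource "azurerm_storage_container" "logs" {
  name                  = "logs"
  storage_account_name  = azurerm_storage_account.example.name
  container_access_type = "private"
}

# Explicit dependency: the account name is a plain string here, so the
# ordering has to be declared by hand with depends_on.
resource "azurerm_storage_container" "archive" {
  name                  = "archive"
  storage_account_name  = "examplestoracc"
  container_access_type = "private"

  depends_on = [azurerm_storage_account.example]
}
```

The first form is generally the one to reach for, since the dependency cannot drift out of step with the value being used.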

WAT State??? Where did that come from??

Ok, so if you take nothing else away from this blog post, please remember that at its heart Terraform is nothing more than a state driven workflow engine. It is this state engine that allows Terraform to function and also what makes it so popular with developers. It is therefore imperative that we understand just what state is, and how it works.

Ok, let’s imagine you are in a restaurant and when the waiter comes over to your table, you order the following dishes:

  • Soup
  • Steak
  • Side of chips
  • Ice cream

In Terraform terminology this is your “desired state”: in order for you to be satisfied, you want all 4 of the items you’ve ordered. So the kitchen gets the order and starts cooking, but do you care how they do it? If you’re like me, you simply assume they will interpret your order correctly and follow their recipes, and that eventually the food will arrive at your table to your specification, ready for you to consume.

Well Terraform is exactly the same, and in Terraform terminology, when our “desired state” gets manifested into reality, it becomes known as “actual state”.

Pfffft this State thing seems easy enough… right?

Ok, so I hate to sound like a stuck record, but it’s really important that you understand just how important the concept of state is to Terraform. To be perfectly honest, getting your head around this concept is the key to cracking Terraform as a technology.

When we tell Terraform to do a deployment, it will do a sequence of steps:

  1. It will parse our HCL configuration/code files.
  2. Using the information in our HCL, Terraform will build up a graph of all the resources we want to provision (desired state) and figure out any dependencies between them to try and decide a logical order they need to be created in.
  3. Terraform will next inspect its State to better understand what it has and hasn’t deployed (if it is our first deployment, the State will be empty). This is known as perceived state. It is perceived state because there is a disconnect between what Terraform “thinks” exists and what “actually” exists.
  4. Terraform next performs a logical delta between our desired state, and what it knows to be our perceived state. It then decides which CRUD actions it needs to perform, and the order to perform them in, in order to bring our perceived state in-line with our desired state.
  5. Terraform next performs all necessary operations to achieve the desired state. The result of this operation will be that resources will likely start to appear in our Azure subscription and this then becomes known as actual state.
  6. Terraform updates the state to reflect what it has done.

Let’s examine this flow graphically (diagram annotated with numbers corresponding to the steps above):

So as you can see, State persistence enables Terraform to make decisions between executions of configuration.

But why do we differentiate between “perceived state” and “actual state” I hear you cry?

Let’s imagine you work in an office, and you have a desk and chair you sit at every day. And let’s also imagine that this is a shared office, other people come and go day in and day out.

Now if I asked you “is your chair at your desk?”… what would your answer be?

Those among you who said “yes of course, if it’s my chair I’d know it is there because that’s where I left it” have unfortunately just fallen into the common pitfall most do when working with Terraform; there is no way to know if the perceived state matches actual state without first checking it.

In the example of the desk and chair, there’s no way to know (without being there and looking) that someone else in the office hasn’t “removed” your chair and not returned it. Likewise in Terraform, there is a chance that someone has “removed” your resource, so Terraform must first check whether our resource exists in State (AKA has been created previously) and then check with Azure to ensure that the actual state matches what it expects to find.

Please note: The term commonly given to this chain of operations to bring desired and actual state into alignment is: Reconciliation; Terraform reconciles desired state into actual state.

It’s a State of Mind

Ok so we have provisioned all our resources, job done right? Time passes, and we need to redeploy for some reason, but let’s imagine that during the time that passed, someone deleted our resource group 😱 What happens when we re-run our Terraform?

Well as we have already discussed, hopefully Terraform is clever enough to work around this:

  1. Terraform parses our HCL configuration.
  2. It sees our desired state is to have a Resource Group.
  3. It checks in our State to see if there is an entry for a Resource Group with the state identifier example (as mentioned before, notice that the resource is identified as “example”, not the name attribute “example-resources”, just like with variables in other languages).
  4. Terraform sees that the State contains an entry (perceived state), and so next goes to Azure to query the actual state of the Resource Group.
  5. Azure reports back a 404, which our Provider will interpret as the Resource Group not existing 😱
  6. Terraform now performs the delta between “Desired state” and “Actual state” and realises that the necessary action to perform is to Create it.
  7. Terraform performs the necessary actions to create the Resource Group.
  8. Terraform updates its State as necessary.

Notice that the key difference this time is that the existence of something within the State changed the way Terraform functioned. Because it “perceived” the resource to exist, and so knew that there should be a resource in the cloud, it went to the Cloud Provider (Azure) and interrogated the “actual state” of that resource. It then decided what to do based on the result.

This same chain of events will always take place if Terraform discovers it already has the resource we want inside its State. But this is not to say that the result will always be to Create. If, for example, someone had changed a setting on the Resource Group (its tags, say), then the result would have been for Terraform to perform an Update on the resource to bring actual and desired states into alignment.

State can be a double edged sword

So if State is the brains of the Terraform operation, what happens when it becomes corrupt? Let me tell you friends, if I had £5 for every time I’ve seen people delete their State and then try to do something with Terraform… I’d be a very wealthy man by now.

Ok, so let’s explore this. Assume the exact same setup as before: we’ve deployed our Resource Group and then for some reason we delete all our State (easily done, trust me). What happens this time?

“Well obviously nothing happens” I hear you all cry… but again you’d have fallen into my trap 😉

In fact what does happen this time is:

  1. Terraform parses our HCL configuration/code files.
  2. Terraform builds up a graph of all the resources we want.
  3. Terraform checks the State and finds it empty.
  4. Terraform does its logical delta between our desired state and what is in its State, and decides it needs to create a resource (remember, Terraform hasn’t queried Azure because the state file didn’t exist/was empty).
  5. Terraform attempts to create a resource group named “example-resources”… BANG!!! 💥
  6. We now have a situation where Terraform errors, because Azure already has a resource group with that name and we cannot create another one with the same name.

Key thing to remember here peeps:

Terraform does not know anything about provisioned resources in your Cloud Provider, unless it’s in the State
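
If you do end up in this situation, the usual way out is to tell Terraform about the existing resource rather than let it try to create a duplicate. As a rough sketch, assuming Terraform v1.5 or newer (which postdates this article), an import block can do that. The subscription ID below is a placeholder, not a real one.

```hcl
# Adopt the Resource Group that already exists in Azure into State,
# attaching it to the resource block identified as "example".
import {
  to = azurerm_resource_group.example
  # Placeholder subscription ID; substitute your own.
  id = "/subscriptions/00000000-0000-0000-0000-000000000000/resourceGroups/example-resources"
}
```

On the next plan, Terraform should report that it will import the Resource Group into State instead of creating it. Older versions offer the terraform import command for the same purpose.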

It’s all a load of CRUD

Ok, I think I’ve gone over State, and its importance, enough now. Time to look at how Terraform decides which actions to perform. As I’ve mentioned previously, Terraform is just a State engine: it makes its decisions based on its current understanding of the world (State) and the real world. When all is said and done, there are only 3 actions Terraform can perform that change anything: Create, Update or Delete. Together with Read (the querying of actual state we saw earlier), these are usually known as CRUD.

So what situations cause which operation to be performed?

Create

  • A desired resource doesn’t exist in our Terraform State.
  • A desired resource exists in Terraform State, but ends up not actually existing in our Cloud.

Update

  • A desired resource exists in Terraform State, but is configured differently in our Cloud.

Delete

  • A resource gets removed from our desired state, but still exists in our Terraform State.
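
As a small illustration of the Update case, imagine the Resource Group from earlier is already deployed and we then add a tags argument to it. The tag key and value are invented for the example. Because tags can be changed on an existing Resource Group, the plan should show an in-place update rather than a create or delete.

```hcl
resource "azurerm_resource_group" "example" {
  name     = "example-resources"
  location = "West Europe"

  # Newly added argument (hypothetical values): desired state now
  # differs from actual state, so Terraform plans an Update.
  tags = {
    environment = "dev"
  }
}
```

Not every change can be applied in place. Changing name or location on this resource forces Terraform to destroy and recreate it, which the plan output flags as a replacement.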

It’s the Circle of Terraform Life

Ok so now we have got our heads around state and CRUD (see I told you they were important) we can take a closer look at how we execute our Terraform.

There are 3 distinct steps that need to happen when we run our Terraform on our command line. Let’s walk through them here:

Initialisation

$ terraform init

The first command we need to run is init . This command is used to initialise a working directory containing Terraform configuration files and instructs Terraform to interrogate the HCL files, determine the Providers needed, download them, and initialise a State if it doesn’t already exist. It is perfectly safe to run init multiple times, and in fact you will likely need to do so if you add new providers or change any settings within your Terraform block.

For more detailed info on the init command you can check all the docs here.
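
The versions that init downloads are driven by the constraints in the terraform block. The sketch below spells out what the ~> operator used in this article means; the required_version line is an extra assumption of mine and not part of the original example.

```hcl
terraform {
  # Assumption: also constrain the Terraform CLI version itself.
  required_version = ">= 1.0"

  required_providers {
    azurerm = {
      source = "hashicorp/azurerm"
      # "~> 2.64" accepts 2.64 and any later 2.x release, but never 3.0.
      version = "~> 2.64"
    }
  }
}
```

Recent Terraform versions also record the exact provider version that init selected in a .terraform.lock.hcl file, which is worth committing so that everyone on the team gets the same version.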

Planning

$ terraform plan

The next command we run is plan . This command instructs Terraform to parse our HCL files, build its graph of our resources, check its state and attempt to come up with an execution plan to perform. It is perfectly valid for our init to succeed but our plan to fail. This is because init doesn’t really concern itself with whether our resources are valid or whether they exist. Only when Terraform starts to interpret our “desired state” will we start to find most of the errors in our files, such as invalid or missing resource arguments.

The output of the plan command will be a complex list of operations that Terraform has decided to perform. It is highly recommended that you review these changes to ensure they match with those you expected to see as destructive operations can be costly if they are done in error. Think of this as a “dry run” of all your Terraform.

It is also worth noting that the plan can be saved to a file (terraform plan -out=tfplan, where the file name is up to you) and handed to the next step (terraform apply tfplan), which ensures there can be no discrepancies between what was planned and what was applied.

For more detailed info on the plan command the documentation can be found here for your convenience.

Applying

$ terraform apply

Simply put this operation tells Terraform to execute all its planned operations. This will usually cause Terraform to redo its plan (unless a plan file is provided to the command) and then present you with an “Are you sure?” prompt. This is your point of no return. If you say yes then Terraform is going to start really provisioning things for you.

This is probably the simplest of all the commands, but it does have some interesting abilities, such as “auto approval” and the ability to increase or decrease the parallelisation of the execution.

As with the other commands, the full docs are available here

A Final Word Of Warning!

Remember dear reader that Terraform uses the identifiers we specify in our HCL to identify resources in State.

So if I had and executed the HCL below:

resource "azurerm_resource_group" "example" {
name = "example-resources"
location = "West Europe"
}

… then it means that upon successful completion, there should be an entry in my State file for an azurerm_resource_group resource with the identifier example.

A common trap people fall into with Terraform is that they see resource identifiers that they don’t like, or that contain typos, and think “oh I’ll clean this up”… but please be very very careful!

If I were to change the above code to be:

resource "azurerm_resource_group" "my_resource_group" {
name = "example-resources"
location = "West Europe"
}

Note the change from example to my_resource_group. Well, now I’m in a very dangerous situation. Next time my Terraform runs, it will see this resource, look in its State and NOT find any entry for my_resource_group (because the old entry is named example), meaning that it will decide to create a new Resource Group. At the same time, the old example entry is still in State but no longer in my HCL, so Terraform will decide to delete the old one!

This is even more dangerous because Resource Group is a top level object in Azure, meaning it would also cause everything belonging to that Resource Group to also be destroyed and recreated!!!! This could be absolutely catastrophic for a production environment.

So my tip here is this friends:

If you are renaming resource identifiers, make sure the change is really really really really necessary!

Note: There are ways around this, and ways to import state etc. but that goes beyond the simple nature of this beginner guide.
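
For the curious, one of those ways around it looks roughly like this. Assuming Terraform v1.1 or newer (released after this article was written), a moved block tells Terraform that the entry in State has a new identifier, so the rename is treated as a rename and not as a delete plus a create.

```hcl
resource "azurerm_resource_group" "my_resource_group" {
  name     = "example-resources"
  location = "West Europe"
}

# Maps the existing State entry onto the new identifier.
moved {
  from = azurerm_resource_group.example
  to   = azurerm_resource_group.my_resource_group
}
```

Older versions can achieve the same with the terraform state mv command. Either way, run plan afterwards and check that it reports nothing to destroy before you apply.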

In Summary (TL;DR)

  • Terraform is a state driven engine that allows us to provision cloud infrastructure easily and consistently.
  • Terraform utilises code known as HCL (HashiCorp Configuration Language).
  • HCL uses the keyword resource to define “resources” we wish to have provisioned in our Cloud.
  • HCL allows us to use the configuration/output from one resource as the input for configuration/attributes of another resource.
  • Terraform interfaces with different Cloud technologies using Providers.
  • When Terraform runs, it will parse the HCL files and build a graph of the resources we want, known as Desired State.
  • By “chaining” resources together, we enable Terraform to work out the dependencies between objects in its graph.
  • Terraform stores knowledge about all the resources it has provisioned previously in a file known as its State file.
  • The contents of the State file are known as Perceived State — the state Terraform left the environment the last time it was run using our HCL files.
  • Terraform uses Backends to determine how State should be persisted.
  • There are many different Backends that can be used depending on the Cloud Service Provider we use and how we want Terraform to persist State.
  • When Terraform wants to provision a resource, and that resource exists in its Perceived State, it will interrogate the Cloud Provider to determine the Actual State.
  • If there are discrepancies between desired, perceived and actual states, Terraform will determine the corrective action required to bring actual in line with desired.
  • Terraform has 3 distinct lifecycle stages: init, plan and apply.
  • If you rename a resource identifier, Terraform will act as if it’s a brand new resource it has never seen before, and will destroy the resources relating to the old identifier.

Final Thoughts

As I’ve mentioned countless times in this article, if you take nothing else away from it, remember:

Everything Terraform does revolves around State. If you don’t understand how Terraform manages and maintains State, you don’t understand how Terraform works.

So all that’s left is for me to say thanks for your time, and I hope that something in this article helps future generations of Terraform developers.

Software engineer & lover of all things code. Too much to learn and so little time. Currently working at Trainline London.