
Testing IaC — Why should we do it?

Marcelo Schirbel Gomes
6 min read · Sep 16, 2020

Hi guys, how are you? I hope you’re all doing fine.

It has been a while since I wrote in here, so please, bear with me. I’ve been studying a lot recently and doing several labs and PoCs. And today I’d like to share one with you.

I wanted to test my Terraform code, just to check if everything would go up according to what was planned.

So I was looking around the Internet and found Terratest. I really encourage you to read up and study it; it’s a great tool and it’s easy to get the hang of.

In this first part, I want to discuss my Terraform code, the way I did it and why I did it. I’d like to make some considerations and talk about IaC in general.

In the second part, I’ll cover the testing and how we could create a pipeline in order to achieve IaC as a whole development process.

The GitHub repository is right here, you can take a look if you want, share it, be inspired by it. Just let your imagination flow.

The Architecture

From the beginning I wanted to try something new, and I found this Python library, which is amazing. It lets you design your system using code and then renders a PNG out of it. I tried it for about 15 minutes and was able to come up with the image in my GitHub. The code is pretty straightforward and simple:

I really liked this one because I’m not a good designer, but I can write some code. It’s not the best of my work, but it’s understandable.

The Network

Let’s start from the ground up! Our network is divided into public and private. In the public part, we’ll have our load balancer, which will be responsible for balancing the application load.

In the private part, we have our application layer, and our database.

I decided to use a module for our VPC just to make things easier to understand. And the cool part is that we don’t need to pin the code to a specific AWS region; it works in any region, which will be explained in the testing section.
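The VPC block isn’t reproduced here, but wiring the community module region-agnostically can be sketched like this (CIDRs, names, and the subnet layout are illustrative); the `aws_availability_zones` data source is what removes the hardcoded region:

```hcl
# Illustrative sketch of a region-agnostic VPC using the community module.
data "aws_availability_zones" "available" {}

module "vpc" {
  source  = "terraform-aws-modules/vpc/aws"
  version = "~> 2.0"

  name = var.tag_name
  cidr = "10.0.0.0/16"

  # No hardcoded region: the AZs come from whatever region the provider targets.
  azs             = data.aws_availability_zones.available.names
  public_subnets  = ["10.0.1.0/24", "10.0.2.0/24"]
  private_subnets = ["10.0.101.0/24", "10.0.102.0/24"]

  tags = local.common_tags
}
```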

The Load Balancer

For the load balancer, we have this amazing module, which abstracts most of the original Terraform resources in a simple and clean way. The most complicated part was the security group, because I found the module documentation not so clear on computed ingress/egress rules. But I managed to work it out.
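For anyone hitting the same wall, this is roughly what computed rules look like in the community security-group module (the names and the app security group are my own placeholders); the explicit count is required because the referenced IDs are only known at apply time:

```hcl
# Illustrative ALB security group with a "computed" egress rule.
module "alb_sg" {
  source  = "terraform-aws-modules/security-group/aws"
  version = "~> 3.0"

  name   = "alb-sg"
  vpc_id = module.vpc.vpc_id

  ingress_cidr_blocks = ["0.0.0.0/0"]
  ingress_rules       = ["http-80-tcp"]

  # Computed rules reference values unknown until apply time, so the module
  # also needs an explicit count of how many such rules exist.
  computed_egress_with_source_security_group_id = [
    {
      rule                     = "http-80-tcp"
      source_security_group_id = module.app_sg.this_security_group_id
    }
  ]
  number_of_computed_egress_with_source_security_group_id = 1
}
```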

The Application Code

We have here a Flask API, written in Python. Our app connects to a database and returns, in JSON:

  • the database version
  • the region it was deployed in
  • a unique id
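A minimal sketch of such an endpoint, with the database lookup stubbed out so it stays self-contained (the environment variable names are assumptions, not the repo’s exact code):

```python
# Illustrative sketch of the Flask API described above; env var names and the
# stubbed database lookup are assumptions, not the repo's exact code.
import os
import uuid

from flask import Flask, jsonify

app = Flask(__name__)

@app.route("/")
def index():
    # In the real app the version would come from a "SELECT VERSION()" query
    # against the RDS endpoint; it is stubbed here to keep the sketch runnable.
    return jsonify(
        database_version=os.environ.get("DB_VERSION", "unknown"),
        region=os.environ.get("AWS_REGION", "unknown"),
        unique_id=os.environ.get("UNIQUE_ID", str(uuid.uuid4())),
    )

if __name__ == "__main__":
    app.run(host="0.0.0.0", port=80)
```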

To set up our environment, I decided not to use a configuration management tool, like Ansible, but to do it with a userdata script. I’ve had some trouble with those scripts in the past, but I’ve found the right way to do it, which is to use a .tpl file, like below:

data "template_file" "user_data" {
  template = file("./scripts/userdata.tpl")

  vars = {
    rds_username = module.db.this_db_instance_username
    rds_endpoint = element(split(":", module.db.this_db_instance_endpoint), 0)
    rds_database = module.db.this_db_instance_name
    rds_password = random_password.rds_password.result
    unique_id    = var.unique_id
  }
}

In the .tpl file we just write a plain Bash script as usual; Terraform substitutes the values from the vars map into the ${...} placeholders.
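For illustration, the template could look something like this (the actual commands are assumptions, not the repo’s script); each `${...}` placeholder matches a key in the vars map:

```shell
#!/bin/bash
# userdata.tpl -- illustrative; Terraform fills in the ${...} variables.
yum update -y
yum install -y python3

# Expose the rendered values to the application as environment variables.
cat >> /etc/environment <<EOF
RDS_USERNAME=${rds_username}
RDS_ENDPOINT=${rds_endpoint}
RDS_DATABASE=${rds_database}
RDS_PASSWORD=${rds_password}
UNIQUE_ID=${unique_id}
EOF
```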

Another thing I’d like to point out is that I left a public key for SSH, but in fact we don’t need it: we are going to connect to our instances using SSM, a faster and more secure way to do it.

The Application Scalability

To scale our app up, if needed, I used an auto scaling group. I found the module pretty well documented except for one thing: tagging. Tagging was really difficult in this module, so I had to use another one just for it.

It turns out the tag format the ASG module expects doesn’t match the map I was building. Many people faced the same issue, so the community came up with a module:

module "asg_tags" {
  source  = "rhythmictech/asg-tag-transform/aws"
  version = "1.0.0"

  tag_map = merge(local.common_tags,
    {
      ResourceGroup = var.tag_name
    }
  )
}

The Database

For our database, I chose a MySQL instance, because it’s simple and fast. Here is something I’ve wanted to try for a while: our RDS master password is random. We generate the password during terraform apply and save a connection string in the SSM Parameter Store. With this, we now have a secure way to store our secrets, and we never even have to see them.

Here is the way to do it:

resource "random_password" "rds_password" {
  length  = 16
  special = false
}

resource "aws_ssm_parameter" "rds_connection_string" {
  name = "/rds/rds_connection_string"
  type = "SecureString"

  value = jsonencode({
    "URL"      = module.db.this_db_instance_address
    "PORT"     = module.db.this_db_instance_port
    "DATABASE" = module.db.this_db_instance_name
    "USER"     = module.db.this_db_instance_username
    "PASS"     = random_password.rds_password.result
  })

  key_id = data.aws_kms_key.pass_kms_key.id
}

data "aws_kms_key" "pass_kms_key" {
  key_id = "alias/aws/ssm"
}

Tagging Resources

Personally, I consider tagging one of the most important subjects when we’re talking about public cloud.

With the right tags, we don’t face issues when dealing with cost management, or resource management.

In our module I decided to have mandatory tags and optional ones. In any resource I just do a merge of them.

The mandatory tags are in the locals:

locals {
  common_tags = map(
    "Name", var.tag_name,
    "Owner", var.tag_owner,
    "Env", var.tag_env
  )
}

And then we just merge it like this:

tags = merge(
  local.common_tags,
  map(
    "Resource", "alb"
  )
)

It’s simple and scalable.

SSM Management

For the last part (and also my favourite), let’s talk about SSM. I use SSM extensively in my daily work because it’s simple to learn and easy to use.

AWS has been doing a great job adding features to this service and now I’m here to use a little bit.

The first thing we need in order to use SSM is to have the agent installed. If you are using any of the official AWS images, the agent comes pre-installed, so you don’t have to worry.

The second thing is to make sure you have the right policy in your instance profile. The policy is AmazonSSMManagedInstanceCore. I’ve attached it in the iam.tf file.
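Attaching that managed policy in Terraform can be sketched like this (the role name is an assumption, not necessarily what the repo’s iam.tf uses):

```hcl
# Illustrative: attach the AWS-managed SSM policy to the instance role.
resource "aws_iam_role_policy_attachment" "ssm_core" {
  role       = aws_iam_role.app.name # assumed role name
  policy_arn = "arn:aws:iam::aws:policy/AmazonSSMManagedInstanceCore"
}
```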

The last thing is to make sure your instance has an outbound security rule that allows the agent to communicate with AWS. I used the easy way, which is to allow HTTPS/443 to anywhere. If you are considering a more secure setup, you might want to create a VPC endpoint for the SSM service and reference it in your security group.
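A sketch of that more secure variant, assuming the VPC module outputs used earlier and a hypothetical endpoint security group; note that Session Manager needs the ssmmessages and ec2messages endpoints in addition to ssm:

```hcl
# Illustrative VPC endpoints for a fully private SSM setup.
data "aws_region" "current" {}

resource "aws_vpc_endpoint" "ssm" {
  for_each = toset(["ssm", "ssmmessages", "ec2messages"])

  vpc_id              = module.vpc.vpc_id
  service_name        = "com.amazonaws.${data.aws_region.current.name}.${each.value}"
  vpc_endpoint_type   = "Interface"
  subnet_ids          = module.vpc.private_subnets
  private_dns_enabled = true
  security_group_ids  = [module.endpoint_sg.this_security_group_id] # assumed SG
}
```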

There is also another thing I’ve done; it’s not mandatory, but it’s certainly nice to have. I created a Resource Group, which allows me to manage a bunch of instances using a common tag (remember how I said tagging was important?).
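A Resource Group like that can be declared in Terraform roughly as follows (the tag key mirrors the ResourceGroup tag set in the asg_tags example above):

```hcl
# Illustrative resource group that collects instances by the ResourceGroup tag.
resource "aws_resourcegroups_group" "app" {
  name = var.tag_name

  resource_query {
    query = jsonencode({
      ResourceTypeFilters = ["AWS::EC2::Instance"]
      TagFilters = [
        {
          Key    = "ResourceGroup"
          Values = [var.tag_name]
        }
      ]
    })
  }
}
```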

With this, we can easily connect and manage our instances using SSM Session Manager or Run Command.
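As a hedged illustration, connecting through Session Manager or broadcasting a command from the CLI looks something like this (the instance ID and tag value are placeholders):

```shell
# Open an interactive shell on an instance -- no SSH, no open port 22.
aws ssm start-session --target i-0123456789abcdef0

# Or run a command on every instance carrying the ResourceGroup tag.
aws ssm send-command \
  --document-name "AWS-RunShellScript" \
  --targets "Key=tag:ResourceGroup,Values=my-app" \
  --parameters 'commands=["uptime"]'
```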

OK, this is cool. This code is fun. But…

What now?

Well, I was wondering the same thing. Being able to write good code is nice, but it’s useless if that code fails in production. This is why we need testing. Our code should be tested extensively and must cover most scenarios. What if we want to deploy this in another AWS region, for example? Will it work?

In cases like this, we need to test if our code can support those changes and be able to adapt.

This is why I used Terratest: to be certain that my code wouldn’t fail, I tested it. And in the next part of this article you’ll find out how I did it.

Guys, thanks for reading this article, I’ve been waiting so long to write something like this. The last part is here, in which I talk about testing our Terraform code.

If you like it, leave a Clap and share with your friends.

See ya!
