Over the past couple of months, I’ve been falling in love with Terraform. For those of you who haven’t used it, it allows you to keep a description of your infrastructure in a source repository and build out environments quickly and simply on various cloud computing platforms.
Up until recently, I was running Terraform exclusively against AWS and the results were astounding – 34 objects (Security Groups, Instances, Load Balancers, Virtual Private Clouds etc.) built in under two minutes. It’s now got to the point where I can stand up a multi-node ELK stack in AWS in under 30 minutes.
When I was recently set the target of replicating that stack in Microsoft Azure, I thought “awesome, this will be easy!”. The fact that I am even writing this blog post should tell you that it wasn’t quite as straightforward as I thought it would be. However, here is my experience for you all to laugh at – after all, if you learn from other people’s mistakes, it’s cheaper…
The differences between AWS and Azure
The first and most significant difference is that, as far as I can tell, what AWS calls a “Virtual Private Cloud” is known as a “Virtual Network” in Azure. You create a Virtual Network, divide it into Subnets and then assign Hosted Services to those Subnets. Instances are then assigned to Hosted Services – one Instance per Service. This strikes me as a little strange (why not just assign the instances to a subnet?); however, this is probably a misunderstanding on my part.
The second difference that I found between the two environments is that you appear to be able to upload SSH keys only via the web interface, at the point that you create an instance. This is not such an issue with Windows instances, but with Linux instances it is a real pain: SSH keys are effectively tied to each machine, unlike AWS, where a key can be used across multiple instances. Terraform also does not at present appear to be able to upload keys to Azure in the same way that it does for AWS, leaving us with password-based access for all our nodes, with the passwords stored in plain text in terraform.tfstate.
The third and final difference that I’ll concentrate on in this article is how you access your instances in Azure vs AWS. My standard approach is to build a Bastion Host, and this approach still works; you just need to be careful how you configure it.
Solving the problems
As far as the terminology differences are concerned, Mani Chandrasekaran has written a fantastic post on the subject which helped me work out where I was going wrong. The Terraform for setting up the Azure equivalent of a VPC is as follows:
The above provisions a single instance into a security group inside a subnet nested in a Virtual Network.
It uses a variable (ssh_user_password) to set the password of the instance so you can log in, which solves the issue of SSH key distribution as best we can at the current time. You can either specify the value at run time or export it in the environment variable “TF_VAR_ssh_user_password” before running terraform apply.
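As a minimal sketch of the environment-variable route: Terraform reads any environment variable named TF_VAR_<name> as the value of the variable <name>. The password value below is a placeholder of my own, not a real credential.

```shell
# Placeholder value for illustration only; substitute your own password.
export TF_VAR_ssh_user_password='change-me-please'

# Terraform will now use it as var.ssh_user_password on the next run:
#   terraform apply
#
# Alternatively, skip the export and pass the value at run time:
#   terraform apply -var 'ssh_user_password=change-me-please'
```

Either way, the password still ends up in terraform.tfstate in plain text, so treat the state file as sensitive.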
If you don’t already know, Terraform has the ability to create a directed dependency graph of your infrastructure so you can visualise the system before you apply it. The output of the above (with TF_VAR_azure_settings_file and TF_VAR_ssh_user_password set to an appropriate value) is as follows:
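For anyone wanting to produce a graph like this themselves, here is a rough sketch. It assumes both terraform and Graphviz's dot are on your PATH; terraform graph emits the graph in DOT format and dot turns it into an image. The helper name render_graph and the file names are hypothetical.

```shell
# Render the dependency graph of the Terraform configuration in the
# current directory as a PNG.
# Assumes terraform and Graphviz's dot are both installed.
# render_graph is a hypothetical helper; its argument is the output file.
render_graph() {
  terraform graph | dot -Tpng > "${1:-graph.png}"
}

# Usage, from the directory containing the .tf files:
#   render_graph elk-azure.png
```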
Now that we know that all the components do what we want and we can see how they all link together, we can look at adding an additional instance that we can SSH to via the Bastion Host:
Again, we’re using the variables to set the instance password and we’re passing around “internal” variables to associate things correctly, however our graph now looks like this:
And we can only SSH to the web instance via the jump box. Port 80 on the web instance is still available via the Hosted Service, and running nmap against the system shows that it is the only port exposed.
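A rough sketch of how this access pattern can be exercised from a workstation, assuming OpenSSH and nmap are installed. The user name, host names, address and function names are all placeholders of mine rather than values from the stack above.

```shell
# Hypothetical names: replace the user and hosts with your own.
# BASTION is the jump box's public address.
BASTION='ubuntu@bastion.example.com'

# Open an SSH session to an internal host by tunnelling through the bastion.
# ssh -W forwards stdin/stdout to host:port; %h and %p expand to the target.
ssh_via_bastion() {
  ssh -o ProxyCommand="ssh -W %h:%p $BASTION" "$@"
}

# List the open TCP ports on a public address. -Pn skips host discovery,
# so hosts that ignore ping are still scanned.
scan_public_ports() {
  nmap -Pn "$1"
}

# Usage:
#   ssh_via_bastion ubuntu@10.0.1.5
#   scan_public_ports web.example.com
```

Putting the ProxyCommand in a function keeps the bastion address in one place; the same setting can live in ~/.ssh/config if you prefer.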
Things currently missing from Terraform for Microsoft Azure
There are a few things missing from Terraform, however this is to be expected given that the Azure provider is very new. The ones that stand out the most are as follows:
- Ability to upload and assign SSH keys as part of an instance definition – the API allows this and the SDK appears to as well; however, Terraform does not at this time
- Management of Traffic Manager load balancers etc. – this is in progress and relies on the upstream SDK merging changes
Other than that, I’ve not found Terraform lacking when it comes to Azure so far, and being able to build out my infrastructure in Azure using the same tools I use for AWS is awesome 🙂
If you’ve solved any of the problems that I’ve found above, please let me know in the comments below!