Terraform Cloud is convenient, especially when you have many teams, microservices and projects all needing their own infrastructure as code (IAC), along with SSO-regulated access control and the need to run IAC from certain locations inside protected private networks. While Terraform itself can be annoying and difficult at times, it is still the leading tool for many IAC use cases. Once you get beyond a small amount of IAC and it starts to be used in a democratized way across your organization, structuring and monitoring its use becomes important, especially for answering those pesky ‘who has access to X’ questions from compliance and security.
I used to work on a team where we were the only people using Terraform, and we all just ran it on a single server, switching to the same user to run things. Every once in a while we would ping in Slack: ‘hey who is running an apply – can I kill your lock?’ This gets old fast, but in a small setup with the state file in a safe place like S3 with versioning, it runs just fine. It is also ‘free’. I put the word free in quotes here as a reminder that almost every open source project you can use has a cost, in time and headaches, that hopefully you aren’t dealing with if you pay a vendor for the same service. In the case of Terraform Cloud, however, you still have the same issues you run into when using Terraform for ‘free’, but you get to share those issues easily with everyone in your org by pasting your run link in Slack and praying that somehow, someone else has run into the same issues you have and can decipher the cryptic error messages on your screen.
Enter the SaaS business model. Now I’m generally a proponent of managed services that save time, as I personally don’t want to spend all my time troubleshooting the Kubernetes control plane, or figuring out what caused a Mongo cluster to get hours behind on replication in prod. I like to get things done, help people solve their problems and generally help make things better for the business and better for the people I interact with every day. Sure, it can be fun to solve a pesky networking issue that has been plaguing your cluster for months, but there’s something to be said for having a hand in getting something new rolled out, whether it is a feature for the customers or a feature/automation that helps the internal org move faster. One of the biggest parts of being engaged is believing that the work you do ‘matters’. Managed services can help out on this front, to a degree.
Now Terraform Cloud is generally a solid product. We use it constantly, and according to something I saw recently from our rep we are managing more than 35K ‘resources’ with it. This can be a bit misleading, as what that number means in reality is ‘resource blocks’ of Terraform code referenced in corresponding state files. Some physical/real infrastructure resources will use up more than one resource block of code. In the current version of the AWS provider, for example, a single S3 bucket can end up being more than 10 different blocks of ‘resources’, because each aspect of the bucket configuration is defined by its own resource type.
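To make the ‘one bucket, many resources’ point concrete, here is a rough sketch of how you might tally managed resources from a state file yourself. It assumes the version 4 JSON layout that `terraform state pull` prints, and the sample state is made up for illustration. How the vendor actually counts billable resources may differ, so treat the total as an approximation rather than a bill.

```python
import json
from collections import Counter

# Hypothetical, trimmed-down state for a single S3 bucket (illustration only).
SAMPLE_STATE = """
{
  "version": 4,
  "resources": [
    {"mode": "managed", "type": "aws_s3_bucket", "name": "logs", "instances": [{}]},
    {"mode": "managed", "type": "aws_s3_bucket_versioning", "name": "logs", "instances": [{}]},
    {"mode": "managed", "type": "aws_s3_bucket_public_access_block", "name": "logs", "instances": [{}]},
    {"mode": "managed", "type": "aws_s3_bucket_lifecycle_configuration", "name": "logs", "instances": [{}]},
    {"mode": "data", "type": "aws_caller_identity", "name": "current", "instances": [{}]}
  ]
}
"""


def count_managed(state: dict) -> Counter:
    """Count resource instances per type, skipping data sources."""
    counts = Counter()
    for res in state.get("resources", []):
        if res.get("mode") != "managed":
            continue  # data sources are read-only lookups, not managed infrastructure
        # count/for_each expand one block into several instances
        counts[res["type"]] += len(res.get("instances", []))
    return counts


counts = count_managed(json.loads(SAMPLE_STATE))
total = sum(counts.values())
for resource_type, n in sorted(counts.items()):
    print(f"{resource_type}: {n}")
print(f"total managed resources: {total}")
```

One real bucket shows up here as four ‘resources’, and that is before adding encryption, policies, logging or notifications.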
The downside of managed services, of course, is possible lock-in, and unexpected changes to pricing or to the product itself. From my understanding HashiCorp is generally full of brilliant folks, and I’m sure that they noticed that a number of companies were using Terraform Cloud (TFC) to manage a ton of resources while having only a few users. Under a user-based pricing model that is a cheap way to run things, but most companies who ‘need’ a product like TFC need it to manage the IAC runs and access of their whole developer user base. We currently have ~500 workspaces and hundreds of users in many teams managed automatically through our SSO provider. So TFC for us was already on the ‘expensive’ list. You know the list I’m talking about, right? The one that gets passed around every year or every quarter where you hear grumbling from upper management about costs, and you aren’t sure if a vendor contract will be renewed again this time? Generally after a few meetings defending the use of the product and the costs, things settle down and you wait for the same thing to happen again a year later.
So without any warning we discover that our contract will suddenly go from the current user-based pricing to their new ‘Resources Under Management’ (RUM) pricing model. I’m sure for some people there isn’t much of a difference, but for us the costs balloon to over 3x the current costs. This moves our usage of TFC from the ‘expensive but necessary’ list to the ‘next year’s migration project’ list. Immediately. Looking around online for other people’s reactions turns up a few amusing things:
- Reddit thread: One user comment stands out the most to me here: “The pricing on Terraform literally just jumped to almost same as we are paying for the AWS resources it manages.” https://www.reddit.com/r/Terraform/comments/13jgzc5/terraform_new_pricing/
- Medium: “The fundamental problem seems to be that the pricing model is okay for small and medium businesses who are just starting off, but it just does not scale.” – https://medium.com/@DiggerHQ/navigate-terraform-clouds-updated-pricing-strategy-by-moving-to-these-alternatives-84cde063ec3
Now I’m all for the freedom of companies to price their products in a way that enables them to prosper, but changing pricing models on an existing customer base that has to spend significant money and time to move to other, cheaper products tends to leave a bad taste in my mouth. Remember the Unity game engine fiasco recently? Look it up. Things didn’t go well when some folks realized the new pricing would put them out of business. Not a fair comparison at all to our TFC pricing change here, but it’s an example of something that seems to happen over and over nowadays. Someone starts a business, gets a ton of subscribers and a large chunk of market share and then changes their pricing model once they feel secure enough in doing so. This is the sort of thing I worry about with vendors, especially ones like AWS. I also cannot be the only person who keeps swearing off using Uber and Lyft after discovering their pricing model now makes taxis seem cheap at times.
Another tactic being used in our TFC pricing model change here is the ‘get them in the door cheap on a pricing model that will never decrease’ idea.
Let’s unpack this a bit. Why does it matter that the pricing went from user-based to resource-based? I’ll tell you why. You will almost never get rid of more IAC in a month than you add unless the sky is falling. Let’s say you already have a bunch of infra and you want it all managed with IAC. Each month you will be adding more and more resources to your list of things managed using TFC. Let’s say you are a small company just starting out. The first 500 resources are free. You start to add some IAC to your TFC account and pretty soon you are up to thousands of ‘resources’. This in itself isn’t much, but if you continue to grow it gets to be pricey. The hope is that larger companies with many users will get pulled in by the super low starting costs, and as all those users do their jobs and each add more code, the pricing will just keep going up and up and up over time.
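The arithmetic behind that ‘up and up and up’ can be sketched in a few lines. The 500 free resources come from the paragraph above. The per-resource rate and the growth numbers are placeholders I made up for illustration, not real quotes, so plug in your own.

```python
FREE_RESOURCES = 500     # free tier mentioned above
RATE_CENTS = 10          # HYPOTHETICAL price per billable resource per month, in cents


def monthly_cost_cents(resources: int, rate_cents: int = RATE_CENTS,
                       free: int = FREE_RESOURCES) -> int:
    """Cost for one month: only resources above the free tier are billed."""
    return max(0, resources - free) * rate_cents


def project(start: int, added_per_month: int, months: int) -> list[tuple[int, int, int]]:
    """Return (month, resource count, cost in cents), assuming steady linear growth."""
    rows = []
    for month in range(months + 1):
        resources = start + month * added_per_month
        rows.append((month, resources, monthly_cost_cents(resources)))
    return rows


# Hypothetical small company: starts at 400 resources and adds 300 a month.
for month, resources, cents in project(start=400, added_per_month=300, months=12):
    print(f"month {month:2d}: {resources:5d} resources -> ${cents / 100:,.2f}/month")
```

The shape matters more than the numbers: the bill starts at zero and only moves one way, because the resource count only shrinks if you delete infrastructure or stop managing it with code.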
This won’t happen in a vacuum, however. Companies will be forced to become picky about what they use IAC for, which in some ways goes counter to the reason you started to use IAC in the first place. Remember all those complicated setups in AWS that used to be fully configured via code? Now you will be spending hours on an incident call trying to figure out why one of them stopped working. Or worse, you will start using CloudFormation. (Hey that’s a disturbing idea – I can reduce the number of resources I use in TFC by just deploying CloudFormation using Terraform)
All jokes aside, this also seems eerily similar to the cloud services business model, doesn’t it? How many companies started out using cloud services because of the convenience, flexibility and scalability, yet discovered years later that they are stuck paying enormous monthly sums to maintain their infrastructure in the cloud? Every little piece of it costs money, and it adds up, like a financial torture device, over time. How many times have we heard the argument about how much cheaper the cloud is due to ‘reasons’, only to discover later that we are seemingly paying the full up-front cost of a real server in 8 months when using one in the cloud? I haven’t seen any stories yet about companies moving from the cloud back to on-prem and discovering, to their surprise, that they have increased their overhead costs. It seems to be a common theme. Start small in the cloud, and if you get big later, once your infrastructure footprint becomes stable, realize huge savings by moving back to your own servers. Terraform Cloud now seems to work the same way.
The difference here is size, complexity and the fact that Terraform itself is/was open source. With the recent fork to OpenTofu, and the need for something like Terraform Cloud, I wouldn’t be at all surprised if someone creates a self-hosted clone of TFC and some companies start to use it instead. Only time will tell. In the meantime, we will be evaluating replacement solutions, hoping that whatever we end up using doesn’t also suddenly change underneath us, requiring yet another large and complex migration project. Maybe some of the companies offering replacement services can do the migrations for free to get our business?