Paddy at HashiConf 2019 by HashiCorp

Today's My Last Day at HashiCorp

Getting right to the point, today is my last day at HashiCorp. I’m leaving my position as Engineering Lead of the Terraform Plugin SDK team for new challenges and adventures. After five years of working on Terraform, it’s time for me to turn my attention to new problems. I don’t want to talk too much about what’s next–it feels gauche to celebrate the future while we’re celebrating the past–but I do want to spend some time looking back. I’ve done that before, but that was four years ago, and a lot has happened since then, so I wanted to spend some time fondly reminiscing and celebrating the achievements, both my own and those I played even a small part in.

I already posted a recap of my first year: my role in the creation of our Pride sticker; my talk at HashiDays London; being in a marketing video with my friend Dana; building the bot that handled the migration of Terraform providers out of the hashicorp/terraform repository and into their own repositories.

The last four years haven’t been much different than that first year. I was privileged to be part of a great many initiatives, some of which I led, but none of which I did on my own or even did most of the work on. I said in my first post:

One of the best and worst things about working at HashiCorp is that you’re surrounded by smart, kind people who care about what they do. It’s hard, because it’s intimidating. It feels hard to measure up. But it also means you’re surrounded with supportive people who will help you to feel like you measure up.

and I stand by this assessment. The achievement I’m the most proud of over the last four years is the people I’ve had the privilege and honor of working with.

terraform-provider-google

With the brilliant engineers of HashiCorp and the Google Cloud Graphite team, I continued to lead HashiCorp’s efforts on the Google Cloud Provider team. While the Cloud Graphite team did most of the work, I still got to serve as a partner in rolling out Magic Modules, making the Google provider the first machine-generated official provider for Terraform. I got to give a talk with Dana about it at HashiConf 2018. Magic Modules also allowed us to introduce the beta provider, which consumed beta APIs. That was another first–normally, official providers (with some exceptions) would only support stable APIs, so the providers themselves could be stable.

We made it through Terraform 0.12 together, and managed the difficult engineering task of having multiple long-running branches and working to hit moving targets. We tried to coordinate version 2 of the Google provider with the release of Terraform 0.12, before decoupling the two. We moved to weekly releases, and eventually automated our changelog generation, creating an approach that has subsequently been adopted by other HashiCorp projects with some success.

To help power a feature Google was looking for, I was given patient tutelage on how Terraform core works so I could build provider metadata into Terraform, my first significant contribution to the core functionality of Terraform. This learning laid the foundation of my 2019 HashiConf talk, which was about the social and technical underpinnings that made Terraform 0.12 such a difficult upgrade.

We released version 3 of the Google provider together, which was the first release of a provider to have a beta release period before its final release.

Building a Better Terraform Provider Developer Experience

In the wake of the version 3 release, Paul Tyng approached me about joining the Terraform Plugin SDK team to find a path to a better Terraform provider developer experience. I accepted, and moved over to that team in early 2020, just before the world changed. In my absence, the team at HashiCorp responsible for the Google provider and the team at Google working on it have continued to excel and push the boundaries of provider development, and I’m incredibly proud to have worked on that codebase with them. Doing so made me a better engineer.

I started on the Plugin SDK team just in time to meet up with my teammates at HEX. I spent that time bending their ears about my cockamamie scheme to rebuild the Terraform provider developer experience from the ground up. The problem was, essentially, that SDKv1 (the latest version shipped at the time) was built on the abstractions of Terraform 0.11 and below. It didn’t incorporate any of the learnings of Terraform 0.12 or the versions that had come after, and the abstractions interacted in weird ways that were impossible to predict. It was no longer the best solution to the problem we had. But we couldn’t just throw it out and start over; there were over 1,000 providers built on it, and more were being built on it every day. Some of those providers had thousands of resources and data sources, hundreds of thousands of lines of code. We needed to find a way to change the abstractions, making it more approachable for new provider developers, without leaving our entire ecosystem behind us.

We were inspired by webservers. I think it was when I was reading about GitHub’s upgrade of their Rails version that it struck me: large projects are built on frameworks every day, and those frameworks manage to evolve and upgrade, because they don’t force developers to do all the upgrade work in one massive deploy. Developers can move between the old and new versions piecemeal, using something like nginx to split traffic between them. And we asked “why couldn’t Terraform do that?”, and we decided it could. And that became the cornerstone of our strategy for redefining the provider developer experience.

SDKv2

The first step was to ship our ongoing project, version 2 of the Plugin SDK. Version 1 of the Plugin SDK was all about splitting the Plugin SDK out of the hashicorp/terraform repo, putting it into its own repository. But the test framework for providers still had a dependency on Terraform to run tests, and so we had to keep manually porting over changes from the hashicorp/terraform repository to an internal, vendored copy of the code. It was a pain, and we wanted to stop doing that. We also wanted to let providers run tests against different versions of Terraform, not just whatever not-actually-a-release version of Terraform was vendored inside the test framework. So version 2 became about dropping that reliance on hashicorp/terraform and rewriting the underpinnings of our test framework to puppet around actual Terraform binaries when running tests. We also took the opportunity to introduce contexts into many function signatures.

This was a technically complicated project; I got to build some more new functionality into Terraform, got to dig into the inner workings of go-plugin, and got distressingly familiar with Terraform’s test framework.

terraform-plugin-go & terraform-plugin-mux

In the wake of shipping v2 of the Plugin SDK, we worked to ship bug fixes restoring prior behaviors when we found unintended deviations from how the test framework worked in version 1. I also turned my attention to what eventually became terraform-plugin-go. We needed a protocol-level binding to the Terraform protocol, something with no abstractions of its own that just surfaced the protocol itself. This gave us the bedrock that SDKs could be built on and all agree on the Go types that represent protocol types and endpoints. We released it during a special HashiCorp Live alongside terraform-plugin-mux, our solution for combining providers built on separate SDKs into a single logical provider. Embarrassingly, I didn’t do a good enough job including my team in the design and implementation of these projects until the very end, and that was a mistake I was keen not to repeat. I was happy with the code we had shipped; I was unhappy with how we had gotten there.
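To make the idea of "a single logical provider" a little more concrete, here is a minimal, self-contained sketch of the routing concept: several provider servers are combined behind one front, and each request is handed to whichever server owns that resource type. This is an illustration only, not the actual terraform-plugin-mux or terraform-plugin-go API. The `ProviderServer` interface, `staticServer`, `Mux`, `NewMux`, and the `example_*` resource names are all hypothetical, and the real protocol has many more endpoints than the single one shown here.

```go
package main

import (
	"errors"
	"fmt"
)

// ProviderServer is a hypothetical, heavily simplified stand-in for a
// provider implementation built on some SDK.
type ProviderServer interface {
	// ResourceTypes lists the resource types this server implements.
	ResourceTypes() []string
	// ApplyResource handles a request for one resource type.
	ApplyResource(resourceType string) (string, error)
}

// staticServer is a toy server that just reports who handled the request.
type staticServer struct {
	name  string
	types []string
}

func (s staticServer) ResourceTypes() []string { return s.types }

func (s staticServer) ApplyResource(resourceType string) (string, error) {
	return fmt.Sprintf("%s handled by %s", resourceType, s.name), nil
}

// ErrUnknownResource is returned when no server owns a resource type.
var ErrUnknownResource = errors.New("no server implements this resource type")

// Mux presents several servers as one, routing by resource type.
type Mux struct {
	routes map[string]ProviderServer
}

// NewMux builds the routing table. In this sketch, a resource type may be
// owned by exactly one server, so overlapping servers are rejected.
func NewMux(servers ...ProviderServer) (*Mux, error) {
	routes := make(map[string]ProviderServer)
	for _, s := range servers {
		for _, t := range s.ResourceTypes() {
			if _, exists := routes[t]; exists {
				return nil, fmt.Errorf("resource type %q is served by more than one server", t)
			}
			routes[t] = s
		}
	}
	return &Mux{routes: routes}, nil
}

// ApplyResource forwards the request to the server that owns the type.
func (m *Mux) ApplyResource(resourceType string) (string, error) {
	s, ok := m.routes[resourceType]
	if !ok {
		return "", fmt.Errorf("%w: %s", ErrUnknownResource, resourceType)
	}
	return s.ApplyResource(resourceType)
}

func main() {
	// One resource still lives on the old SDK, one has been migrated.
	legacy := staticServer{name: "legacy SDK", types: []string{"example_instance"}}
	framework := staticServer{name: "new framework", types: []string{"example_bucket"}}

	mux, err := NewMux(legacy, framework)
	if err != nil {
		panic(err)
	}
	for _, t := range []string{"example_instance", "example_bucket"} {
		out, err := mux.ApplyResource(t)
		if err != nil {
			panic(err)
		}
		fmt.Println(out)
	}
}
```

The point of a design like this is that a provider can migrate one resource at a time: moving a resource from one SDK to another only changes which server claims it, and callers still see a single provider.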

Designing the Framework

terraform-plugin-go was purposefully low-level, a level of abstraction far below and far harder to work with than SDKv2. As 2021 kicked off, our major project was to envision, design, and ship a provider developer experience at the same level of abstraction and ease-of-use as SDKv2, but built on the new Terraform abstractions that terraform-plugin-go was built around and exposed. I struggled for months to figure out how we were going to come up with a design that the entire team felt ownership of; I didn’t want people to merely feel like they had the opportunity to offer feedback, I wanted them to feel like it was their idea. I wanted everyone to feel ownership of the idea.

I kept thinking about building a five or ten year codebase in terms of what we were shipping and which artifacts should be prioritised higher than others. I made the controversial assertion that the code was the least important thing we were shipping. More important than the code, I argued, was shipping a mental model of provider development that developers could wrap their heads around, something I felt like we had lost by that point for SDKv2. I wanted us to keep thinking about what concepts we were creating, and how we’d teach them to provider developers. But even more important than that, I argued, was shipping a team that could steward this project. If you want a ship to sail for ten years, you don’t worry about each plank, you build a crew that can care for it. And I felt that was our biggest risk, that we’d build something only one person truly understood the context and tradeoffs of, and then the longevity of the project would hinge on their judgment and presence. That’s an unkind amount of pressure to put on someone.

So the first thing we did when building the framework was to figure out how we were going to design together. We were used to coming up with ideas, fleshing them out, pitching them to each other, and incorporating feedback. But doing that means an idea has a clear owner. After discussing with some of my coworkers, most notably Pam Selle, we came up with a process: we’d create “design documents”, documents laying out the problem and recording all the context we could think of. Every bit of nuance that informed a decision should land there, as well as every possible way to solve the problem that we could think of, and no discussion of a decision could happen until we all agreed that we had no more context or possible solutions to offer. That helped us design together, weighing the concerns and possible solutions together, making our arguments from a shared context we had all already agreed on. It also helped future maintainers; you can still go back and see why we made the decisions we made, and more importantly, what context we considered as we made those decisions. You can see which context no longer applies, you can see which bets and hopes and predictions didn’t pan out, and which did. You can see which fears were founded and unfounded. My hope is that future maintainers will be able to use this context to reevaluate our decisions over time, and even when the circumstances around us change, they’ll be able to maintain the software we wrote, because they’ll know what we were trying to achieve when we wrote it.

Shipping the Framework

We worked really hard throughout 2021 to ship terraform-plugin-framework, and I’m really proud of how we shipped it. We quietly shipped a version 0.1.0 that served almost as a proof of concept; we set the criteria for a release as “a provider can technically be built with this”. If you could technically build a provider and have a resource do something, anything, it was ready to ship. Didn’t need to be useful, didn’t need to be ready, we just needed to be able to evaluate our decisions by using them to build something. And we shipped it, and it became real, and it hit me: we were doing this, and it was good. My years-old hypothesis that a much better provider developer experience was possible and we could ship it was being validated. I still remember where I was sitting, what I was doing when that realization hit.

We didn’t make a big deal about v0.1.0. Didn’t really make a huge splash anywhere about it. That was reserved for v0.2.0, which we shipped at the inaugural HashiTalks: Build in a session introducing the framework. At this point, we wanted to have a compelling project that could build at least some useful providers. We shipped v0.2.0 along with the talk, and shipped a new section of the docs in partnership with our Education team at the same time. And my favorite detail, the Education team shipped a new Learn guide that offered a step-by-step introduction to building a new provider in the framework. Biting off more than I could chew, I also did a talk for HashiTalks: Build about Terraform’s type system and another on designing Terraform-friendly APIs, two talks I’m proud of.

Since then, we’ve been iterating on the framework. We added more features, smoothed out rough spots, fixed bugs. And this has grown from a demonstrative prototype of what the Terraform provider developer experience could be into a codebase that could compete with SDKv2 for building production Terraform providers. And we have real-world providers being built on it, which is extremely exciting and gratifying. I’m incredibly proud of what we’ve shipped, how we’ve shipped it, and the impact it has had.

We’re not at the end of the roadmap we defined back in 2020 for a new provider developer experience, but we’ve shipped some substantial pieces of it. And I’m confident that the team will continue to ship improvements and build amazing things, so I’m incredibly optimistic about the future of Terraform provider development. I wish I could’ve been there to ship more of it, but I’m excited about what’s next for me, and I’m excited to see what the team does with me out of the way.

Moving On

I’ve been working on Terraform for HashiCorp for about half my career now. I’ve learned so much from the problems, opportunities, and people I encountered here. I’m a better engineer and, I think, a better coworker for my time here. It’s sad, and I leave it behind with a heavy heart. Thanks for five years of growth and opportunity, HashiCorp. And thanks for all the support you’ve given me over the years, Terraform ecosystem.

Happy Terraforming.