#227 - Infrastructure as Code: Delivering Dynamic Systems for the Cloud Age - Kief Morris

04-Aug-2025 56 mins Kief Morris

included in DevOps Infrastructure Cloud Automation

“If you talk to business and product leaders, they are often frustrated to find that infrastructure and environments are a bottleneck and a friction point. We need to align what we build with infrastructure to business needs and value.”

How has Infrastructure as Code changed in the last five years? Explore the key shifts and how to align your infrastructure to real business value.

In this episode, Kief Morris, a Distinguished Infrastructure Engineer at Thoughtworks, returns to discuss the third edition of his book “Infrastructure as Code.” He shares fresh insights on designing and delivering dynamic systems for today’s cloud-driven world. Kief explores the evolution of IaC, practical methods for modern teams, the next generation of tools, and lessons learned in recent years. Learn how to align infrastructure with business needs and manage today’s growing infrastructure complexities.

Key topics discussed:

How the “Infrastructure as Code” book has evolved across three editions
Why infrastructure decisions must align with business value
How IaC and the toolchain have evolved over the last few years
Handling the growing complexity of modern infrastructure
The rise of platform engineering and internal developer platforms
Terraform vs. OpenTofu: which one should you use?
Balancing governance, speed, and innovation in the cloud era
The current limitations and role of AI in managing infrastructure

Timestamps:

(02:39) Updates in the Last Five Years
(04:13) Infrastructure as Code Definition
(05:58) The Practice of Infrastructure as Code
(06:32) The Differences Between the Book Editions
(10:21) Aligning Infrastructure to the Business Value
(15:03) Handling the Growing Infrastructure Complexities
(19:10) The Tools and New Inventions in IaC
(24:11) Terraform vs OpenTofu
(27:38) Orchestrating Infrastructure Changes Using IaC
(30:35) Platform Engineering
(33:06) Internal Developer Platform Key Success Factor
(37:15) Key Considerations of Building Teams with Infrastructure Skills
(41:56) Infrastructure Compliance and Governance
(45:53) Using AI for Infrastructure as Code
(50:31) Using AI for Troubleshooting and Root Cause Analysis
(51:50) 3 Tech Lead Wisdom

_____

Kief Morris’s Bio
Kief Morris is the author of the O’Reilly book Infrastructure as Code, and is a Distinguished Infrastructure Engineer at Thoughtworks, based in London. He works with clients and project teams around the world to explore, shape, and share better ways of working with cloud and infrastructure architecture.

Kief started out as a developer and systems administrator in the dot-com boom days, then worked with a series of digital scaleups applying infrastructure automation before DevOps was a thing. He joined Thoughtworks in 2010 as the wider industry was discovering Infrastructure as Code, DevOps, and Cloud, which gave him the opportunity to bring what he had learned in the previous fifteen years to enterprise clients in many industries and many countries.

He wrote the book Infrastructure as Code (now on the third edition) to share these ideas with a wider audience, which has given him a platform to meet and learn from an ever-growing variety of people and organizations.

Follow Kief:

LinkedIn – linkedin.com/in/kiefmorris
Twitter – x.com/kief
BlueSky – bsky.app/profile/kief.com
Personal Website – kief.com
Infra as Code Website – infrastructure-as-code.com
📚 Infrastructure as Code – https://www.oreilly.com/library/view/infrastructure-as-code/9781098150341/

Mentions & Links:

🎧 #5 - Infrastructure as Code - Kief Morris – https://techleadjournal.dev/episodes/5/
Terraform Stacks – https://www.hashicorp.com/en/blog/terraform-stacks-explained
Infrastructure as code – https://en.wikipedia.org/wiki/Infrastructure_as_code
CustomResourceDefinitions (CRD) – https://kubernetes.io/docs/tasks/extend-kubernetes/custom-resources/custom-resource-definitions/
Terraform Automation and Collaboration Software (TACOS) – https://itnext.io/spice-up-your-infrastructure-as-code-with-tacos-1a9c179e0783
ClickOps – https://en.wiktionary.org/wiki/ClickOps
Containers – https://www.docker.com/resources/what-container/
Serverless – https://en.wikipedia.org/wiki/Serverless_computing
Control loop – https://en.wikipedia.org/wiki/Control_loop
Internal developer platform – https://www.redhat.com/en/topics/devops/what-is-an-internal-developer-platform
Resume-driven development – https://ieeexplore.ieee.org/document/9402191/
CI/CD pipeline – https://en.wikipedia.org/wiki/CI/CD
AI coding assistants – https://blogs.oracle.com/ai-and-datascience/post/ai-code-assistants-explained-tailored-developers
Chef – https://www.chef.io/
Puppet – https://www.puppet.com/
Ansible – https://www.redhat.com/en/ansible-collaborative
Terraform – https://developer.hashicorp.com/terraform
CloudFormation – https://aws.amazon.com/cloudformation/
Cloud Development Kit – https://aws.amazon.com/cdk/
Pulumi – https://www.pulumi.com/
OpenTofu – https://opentofu.org/
Atlantis – https://www.runatlantis.io/
Spacelift – https://spacelift.io/
Terrateam – https://terrateam.io/
Pulumi ESC – https://www.pulumi.com/docs/esc/environments/
System Initiative – https://www.systeminit.com/
Prometheus – https://prometheus.io/
CloudWatch – https://aws.amazon.com/cloudwatch/
Amazon Relational Database Service (RDS) – https://aws.amazon.com/rds/
YAML – https://yaml.org/
Kubernetes – https://kubernetes.io/
Kratix – https://www.kratix.io/
Adam Jacob – https://www.linkedin.com/in/adamjacob/
Neal Ford – https://nealford.com/
Birgitta Böckeler – https://www.thoughtworks.com/profiles/b/birgitta-bockeler
James Lewis – https://www.csis.org/people/james-andrew-lewis
Weaveworks – https://github.com/weaveworks
HashiCorp – https://en.wikipedia.org/wiki/HashiCorp
ConfigHub – https://www.confighub.com/
Humanitec – https://humanitec.com/
Thoughtworks – https://www.thoughtworks.com/

Quotes

Infrastructure as Code Definition

When I describe it, I say it’s about using tools, techniques, and practices from software engineering and applying them to managing infrastructure. That’s probably still accurate.
There’s a lot of interesting stuff going on now if you look at automated infrastructure management. There’s a lot of experimentation with different ways that aren’t necessarily directly about code or are used in different ways.
It is about using code as the means for specifying your infrastructure. That’s probably a fair definition.
I’m tending to talk more now and use the phrase “infrastructure automation” and “infrastructure orchestration” to be more all-encompassing of some of the other techniques and things that are emerging.

The Practice of Infrastructure as Code

Most organizations doing infrastructure, especially those using cloud infrastructure, are using infrastructure as code tools. How effectively they’re using it is a different matter, but I think it is pretty much the default these days.

The Differences Between the Book Editions

The first edition was in 2016, and the second was in 2020. In the first edition, servers were a big part of it; the subtitle of the book was Managing Servers in the Cloud. Chef, Puppet, and Ansible were the main things, even though tools like Terraform and CloudFormation had emerged and people were starting to use them.
In the second edition, servers were still in there, but a lot more of the emphasis was on what you use to orchestrate multiple cloud resources—including servers in the mix, but also all the other things around it. So, Terraform, CloudFormation, CDK, and Pulumi were the tools that were a big part of the focus.
When I first undertook the third edition, I thought it probably wouldn’t need to change that much, but I found that there was a lot more to talk about than I’d expected.
One of the things was thinking about how to align what we build with infrastructure to business needs and value and bringing that more into it. Infrastructure, cloud, and all this stuff has become a lot more pervasive. Previously, it was something that maybe one part of the business was doing, like startups. Now, it’s become so much more pervasive that I think it’s become really important.
It used to be everybody building as much as they can and getting everything digital—digital transformations and all that. Now, more recently, there’s been a lot of thinking about, how do we rein it in a bit? How do we consolidate all the different stuff that maybe we’ve chucked up on the cloud? So that’s been one theme.
Another theme was the design of infrastructure and how to design more complex infrastructure estates using code. And the third is what you could call orchestration.
This is thinking about if you use Terraform or OpenTofu, you have these TACOS—this idea of things like Atlantis, Spacelift, and Terrateam, and loads of different things out there to help you manage, deploy, and update your infrastructure code.
It used to be that we would write loads of custom scripts to manage things. We might use a tool like Terraform, but we would be writing Makefiles, shell scripts, or Python scripts to run the tool and pull everything together, including configuration.
We’re moving towards an aspiration to standardize that, so a team shouldn’t have to hand-roll all of that stuff. But I don’t think we’re quite there yet. So I talk a bit about some of the tools out there and the different angles and aspects of orchestration they focus on.
Also, things like team structures and team topologies are visual ways of representing team shapes and interactions. That’s important to think about so it all comes together. How do you deploy and deliver it? How do you organize your teams to use it? How do you design it?

Aligning Infrastructure to the Business Value

It’s often very disconnected because it seems quite distant. And people on both sides—business leadership, technology leadership, and people working on infrastructure—are like, “Well, it’s plumbing, right?” It’s basic stuff, and it doesn’t really matter what runs on it; you just build the same servers, databases, and networking structures, and then let somebody else deploy their applications onto it.
Or in the case of business folks, they think, “This is not what our business is. Our business is not infrastructure. It’s whatever it may be, and so we just need stuff that works.” That gap, that feeling on both sides of “we don’t really need to think about this too much,” leads to mismatches.
That manifests as frustration if you talk to business, technology, and product leaders; they are often frustrated to find that infrastructure and environments are a bottleneck and a friction point. Teams don’t have the environments they need quickly enough, or the environments are very messy, or the costs run out of control.
I find a lot of companies looking to expand geographically think, “We’re going to deploy into all these different regions. We use the cloud, so we can just use that.” And then that becomes a big pain point where it becomes a spaghetti of code managing all these different environments and places. Stuff isn’t updated consistently, and you end up having to hire more people to keep stuff running. So it feels like, why didn’t the cloud help? Why didn’t the cloud magically make it so we can just have infrastructure when and how we need it?
The cloud vendors have been very clever. They’ve worked out what the undifferentiated stuff is: servers, subnets, and all these basic-level things. That’s the stuff that you can put behind an API and tell somebody to go and use this API, and the cloud vendors don’t need to worry about what you’re going to use it for exactly.
But the stuff on top of that—how do you put it together? How do I run my application? How do I make updates and configuration changes to the things underneath it?
This is why platform engineering is such a big topic and a recurring theme with new names. We had DevOps, then SRE, then platform engineering, and now developer experience. All of these are ways of pointing at the need to figure out what to put in between—how to make those raw primitives from cloud vendors useful for getting work done without people having to spend loads of time rolling and maintaining things by hand. It’s that gap.
For me, the key is to think about what we are trying to do as a business. Because that means you have to think about things like: Do we expand by growing more infrastructure for each customer? Do they need dedicated infrastructure, depending on the business model? Do we need to think about consolidation?
Thinking about those needs, and then thinking about the day-to-day things a software team has to do. How often do they have to build and deploy new services versus just making changes to existing services? How often are they building entirely new products? Those differences in what people need to do then affect how you are going to design your infrastructure and where you are going to focus. Are you going to focus on being able to spin up new environments for new services very quickly, or is it more about optimization or other things?

Handling the Growing Infrastructure Complexities

When you’re looking at the infrastructure and how to design it, you look at the workloads you’re going to be running and what their requirements are.
One of the ideals is that maybe developers can build their own infrastructure. This is one of the motivations behind tools like CDK and Pulumi, where it’s like, “Let’s give them programming languages they’re comfortable with to work with infrastructure.”
Where that falls down is that the language isn’t really the challenge. It’s maybe a little off-putting to see YAML or some declarative language, but the really challenging thing is understanding the nuts and bolts of the infrastructure. I’ve got to set up a VPC, subnets, network routes, and all these kinds of things.
And that becomes figuring out how to put all that together. Even something as simple as an S3 bucket has zillions of options; you can configure it completely differently depending on what you’re going to use it for. Are you storing sensitive data? Are you doing analytics on it? Is it just storing static content? For each of these different use cases, you would want to configure and tune it differently.
You also need to think about the security policies you should have, governance, and so on. So there’s a lot of complexity. For a developer to have to think about that, it’s going into a lot of detail that maybe you don’t want to think about at that point in time.
One of the things we need to focus on is those interaction modes or abstractions. As a developer, I know I need an S3 bucket. You can probably answer a few questions like, “What am I going to use it for?” “I’m going to use it to store static files for website hosting.” That’s a pretty clear use case. Once you know that, you can then roll out what needs to go into the S3 bucket. Or a few questions about what kind of data might be on there—is it personalized data, is it regulated data, and what are the access patterns?
Providing something for developers to specify in terms that make sense to them and their use case is really useful. Then this stuff will be provisioned for you accordingly. That’s really useful, with the caveat that abstractions can become very dangerous when they hide or stop you from accessing something. It’s about headspace or cognitive context. It’s not about hiding stuff from people, but just saying, “When I need it, I want to be able to get at it. When I don’t need it, I don’t want to have to think about it and spend a lot of time on it.”
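The idea of letting a developer describe a bucket by its use case can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the use-case names, the `BucketSettings` fields, and the preset values are all hypothetical, and a real platform would translate them into actual cloud resources. The `overrides` argument reflects the caveat above that the details should stay reachable when someone needs them.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class BucketSettings:
    """Illustrative subset of the low-level options a platform team decides on."""
    public_read: bool
    encrypted: bool
    versioned: bool
    access_logging: bool


# Hypothetical use cases mapped to defaults chosen by the platform team.
PRESETS = {
    "static-website": BucketSettings(
        public_read=True, encrypted=False, versioned=False, access_logging=False
    ),
    "analytics": BucketSettings(
        public_read=False, encrypted=True, versioned=False, access_logging=True
    ),
    "sensitive-data": BucketSettings(
        public_read=False, encrypted=True, versioned=True, access_logging=True
    ),
}


def bucket_for(use_case: str, **overrides) -> BucketSettings:
    """Return settings for a use case; overrides keep every detail accessible."""
    if use_case not in PRESETS:
        raise ValueError(f"unknown use case: {use_case!r}")
    return replace(PRESETS[use_case], **overrides)


# The developer answers one question and tweaks one detail when needed.
website = bucket_for("static-website")
website_encrypted = bucket_for("static-website", encrypted=True)
```

The abstraction narrows what a developer has to think about without hiding anything: any field can still be set explicitly.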

The Tools and New Inventions in IaC

Terraform and, for some folks, OpenTofu, are still often the go-to for a lot of people. Something like CDK tends to be platform-specific, so if you’re working with AWS, you might go that way unless you have skills with other tools.
Pulumi is quite interesting, not just because it lets you use different languages to write your infrastructure code, but also because of the toolset and services they’re creating around it, like ESC for environment management, some automation things, and stuff they’re doing with development portals.
That goes back to what I was saying before about orchestration, and that’s what we need to work on next. As much as thinking about the specific tool you’re going to be coding in, you have to think about abstraction, like I was just saying. How are you going to provide that to people?
Being able to create a package, like an S3 bucket package with useful parameters that can be referred to even in application code, is really handy, and then having the ecosystem around it.
There are some things being worked on as next-generation, probably post-code. You have System Initiative, which is led by Adam Jacob who made Chef. Some of the folks behind Weaveworks, Kubernetes, and Google have created a startup called ConfigHub.
Essentially, the premise of these things is that code is painful to work with: storing it in Git repositories and all that. Especially with the lag between what’s in your code and what’s reality, and these interim things like state files. When you run the tool, it creates a desired state and does a comparison, and things can get out of sync.
The premise of this new generation of tools is to look at the model—the data structures that represent the real infrastructure—and work with that. It’s almost like taking a Terraform state file and making it live.
Let’s make that the thing. Then you can update the state file or compare it with reality and decide what to do when they deviate. You can work with it through any interface, including code that updates that model, or visual interfaces. Then you can write code that reacts, so it becomes a little more event-driven.
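Comparing a model with reality and deciding what to do when they deviate can be sketched as a simple diff. This is a toy illustration, not how System Initiative, ConfigHub, or Terraform work internally; the resource identifiers and attributes are made up for the example.

```python
def diff_models(desired: dict, actual: dict) -> dict:
    """Compare two {resource_id: attributes} maps and classify the drift."""
    to_create = sorted(desired.keys() - actual.keys())
    to_delete = sorted(actual.keys() - desired.keys())
    to_update = sorted(
        rid for rid in desired.keys() & actual.keys()
        if desired[rid] != actual[rid]
    )
    return {"create": to_create, "update": to_update, "delete": to_delete}


# Hypothetical model of what we want versus what is really running.
desired = {
    "bucket/assets": {"versioned": True},
    "queue/orders": {"fifo": True},
}
actual = {
    "bucket/assets": {"versioned": False},
    "vm/legacy": {"size": "small"},
}

drift = diff_models(desired, actual)
print(drift)
```

Whether each difference is fixed automatically, flagged for review, or accepted into the model is a separate decision that this sketch leaves open.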
Adam talks about “infrastructure as graph,” I think he said at one point, which is interesting because it’s about the relationships between the different elements. That’s really cool and can be really powerful. And it’s going to completely change how we think about building these things.
It still comes down to, how am I going to wire together all my low-level infrastructure resources into something useful?
It’s quite related to the “infrastructure as data” concept where you’re using the controller in Kubernetes, defining your infrastructure code as CRDs and those kinds of things, so you can just refer to it. And I quite like that as a model conceptually because I can deploy my application and say, “Oh, by the way, my application needs a database instance, it needs some message queues,” and then the system works out, “Okay, now I’ll go and provision that for you.”
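The model described here, where an application declares what it needs and the system provisions it, can be sketched as one pass of a control loop. The sketch assumes an in-memory `FakeCloud` in place of a real provider API, and `AppSpec` is a hypothetical stand-in for a custom resource; it is not the Kubernetes controller API.

```python
from dataclasses import dataclass, field


@dataclass
class AppSpec:
    """What the application declares (hypothetical stand-in for a custom resource)."""
    name: str
    needs: list  # for example ["database", "message-queue"]


@dataclass
class FakeCloud:
    """In-memory stand-in for a real provider API."""
    resources: set = field(default_factory=set)

    def provision(self, resource_id: str) -> None:
        self.resources.add(resource_id)


def reconcile(spec: AppSpec, cloud: FakeCloud) -> list:
    """One control-loop pass: provision whatever the app declares but lacks."""
    created = []
    for need in spec.needs:
        resource_id = f"{spec.name}/{need}"
        if resource_id not in cloud.resources:
            cloud.provision(resource_id)
            created.append(resource_id)
    return created


cloud = FakeCloud()
spec = AppSpec(name="checkout", needs=["database", "message-queue"])
first_pass = reconcile(spec, cloud)   # provisions both resources
second_pass = reconcile(spec, cloud)  # nothing left to do
```

Running the same pass twice changes nothing the second time, which is the property that lets such a loop run continuously.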

Terraform vs OpenTofu

I can certainly see a case for sticking with Terraform, particularly if you’re in that ecosystem with the HashiCorp tools.
The licensing change was a bit unsettling for a lot of folks working out what it means for them. One of my concerns as a consultant was what this would mean for my clients. Most of my clients aren’t affected because they’re essentially end users; they’re not building a service to provide Terraform hosting. The licensing changes were really aimed at those TACOS services.
My concern as an end user is, what does the ecosystem of tools look like? Am I going to have the ecosystem of tools available to me, or are those vendors not going to provide as much support for Terraform because of the potential cost or liability?
I’m not up on exactly where OpenTofu is in terms of adoption or how widely it’s used. It’s supported by most of the vendors of these third-party tools that were originally the Terraform ecosystem. They’ve diverged and forked—not just the core tool, but the ecosystem is potentially forking and diverging there.
There’s a large ecosystem of vendors and people behind it, which is quite promising. It’s something to experiment with and see how well it suits you.
It will be interesting to see how it does diverge. There was some mixed messaging from the OpenTofu folks originally about whether it would be a drop-in replacement for Terraform. But inevitably, there’s a desire for new features that Terraform wasn’t going to support. For example, things around encrypting secrets—they want you to use the hosted platform, so the OpenTofu folks are like, “Well, we’ll just build it into the client.” So there are a lot of things like that where it is going to start to diverge more and more.
And I think HashiCorp is also building a lot of features into Terraform that integrate with their cloud platform. They really want you to be using the cloud platform, so they’re adding things like Terraform Stacks.
I think people are going to have to pick a horse. Originally, there was this idea that we could just use OpenTofu or switch back and forth as needed, and it would all be compatible. But you can’t do that in the long run. The compelling thing for OpenTofu users will be the things that they can offer more inexpensively than they could get from Terraform.

Orchestrating Infrastructure Changes Using IaC

One thing that’s useful to distinguish is the actual deployment. A lot of the tools out there that help deploy your infrastructure code tend to be very focused on just the deployment part, maybe directly running a plan and then a deploy, which I think is useful and important.
The GitOps thing, or infrastructure as data, where a control loop in your tool keeps comparing the code with what is running and corrects drift right away, is one part of the equation.
The other part is, how do you deliver changes to your infrastructure code across environments? Whether those are software testing and QA environments, staging environments, or the environment you use to test the infrastructure code itself before you release it to developers.
We’ve struggled in the infrastructure industry and the infrastructure as code field with how to make that work nicely so we don’t end up with separate copies of infrastructure code for each environment, copying and pasting changes from one repository to another, or even merging branches. Often, it becomes messy.
That’s the part that’s still a little tricky. Some of the orchestration tools that promote changes across environments are addressing that and providing different ways to do it, but it’s still very much in flux. There isn’t the one right way to do it or a pattern that most people agree on. And for most of the clients I’ve worked with, that’s where things are messy.
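One way to avoid separate copies of the code per environment is to keep a single versioned definition and vary only parameters, then promote a version from one environment to the next. The sketch below is just one possible shape, since, as noted above, no pattern is widely agreed on. The environment names, parameters, and version numbers are hypothetical.

```python
# Hypothetical promotion order and per-environment parameters.
ENVIRONMENTS = ["test", "staging", "production"]

ENV_PARAMS = {
    "test": {"instance_count": 1, "instance_size": "small"},
    "staging": {"instance_count": 2, "instance_size": "medium"},
    "production": {"instance_count": 6, "instance_size": "large"},
}


def render_stack(code_version: str, env: str) -> dict:
    """Combine the shared definition (identified by version) with env parameters."""
    return {"env": env, "code_version": code_version, **ENV_PARAMS[env]}


def promote(deployed: dict, env: str) -> dict:
    """Copy the version running in the previous environment into `env`."""
    index = ENVIRONMENTS.index(env)
    if index == 0:
        raise ValueError("the first environment receives new versions directly")
    source = ENVIRONMENTS[index - 1]
    return {**deployed, env: deployed[source]}


deployed = {"test": "1.4.0", "staging": "1.3.2", "production": "1.3.2"}
deployed = promote(deployed, "staging")
staging_stack = render_stack(deployed["staging"], "staging")
```

The same code version moves through every environment unchanged; only the parameter set differs, so nothing is copied and pasted between repositories or branches.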

Platform Engineering

For me, it’s about providing the things that software teams, developers, and also people who do support operations of software need in order to get their jobs done.
It’s about empowering them. This is where we go back to abstraction, because one of the things a platform can do when it goes wrong is disempower people and make it difficult for them to get their job done. So it really is about empowering as the number one thing.
And then there’s also the thing of sharing implementations and resources. Not every team needs to build their own monitoring toolset. Not every team needs to build their own databases from the low level. What can we share?
And all this has to be in a way that is useful and helpful. It’s not like, “Well, you’re not allowed to use a different tool because we are providing this one, even though it doesn’t do what you want in the way that you need it to work.” It is about what’s truly helpful.
That’s the reason why platform engineering is such a big topic and there’s so much out there about it—because it’s hard. It’s hard to do that well.
One of the big traps is trying to do too much. “I’m going to provide this platform that will do all things for our developers.” It’s too ambitious, and you end up building only a small piece and not having enough, and then people turn elsewhere. Or it’s too restrictive and people can’t get what they need done. There are a lot of traps with that.
The way to succeed at it is to think about it in terms of how to enable other people to build platform services. Rather than saying one team needs to build all the services, it’s like, how can different teams—like the database team that’s an expert on that, and another monitoring team that’s focused on that—build stuff? And how can we provide them with the things to easily connect those things together nicely and manage the things so those teams don’t necessarily need to build everything from scratch themselves? That’s the interesting challenge.

Internal Developer Platform Key Success Factor

The key thing is really focusing on the user. This is where “platform as a product” comes into play.
A pitfall is thinking, “Okay, I’m going to build this platform or this component, and I’m just going to go away and build it because I know what it should be.” “I’m gonna go and build the Kubernetes cluster to rule all the Kubernetes clusters.” But it’s important to really understand how the teams are using it, what they’re using it for, what their journeys are, and what’s common.
Oftentimes, people will optimize for things that don’t happen that often. You’ll see things like, “To create a new server application, we’re going to have the template project. You can go to your developer portal, click a button, and you get a new repository and a new pipeline.” And then you go and talk to the teams, and it’s like, “Yeah, we do that maybe twice a year, but we have these other things that are painful for us.” Let’s look at what’s painful for the teams and what they’re doing very often and try to solve that.
The teams that are most successful in doing the developer platform thing are the ones with really close relationships with the development teams. Often, this stuff comes out of the development teams. They might be embedded with the development teams, working on stuff that just that team is using, and then over time, you realize, “Here’s something that maybe another team needs.”
So you kind of evolve it. It’s reusability rather than starting out by saying, “Here’s a platform service that we’re going to provide on demand,” which is a very ambitious thing to build.
Actually, their first iteration is just, “Well, here’s some documents and outlines of how to build it.” Like, “How do I use RDS to create my database?” We don’t have to necessarily create a wrapper around the RDS service as a platform service, but we can just say, “Here’s how to do it and here’s some example code maybe that you can use and modify.” That’s the first step.
Thinking about those progressions and the evolution of a service’s maturity and how much more polished you can make it as it gets adopted and used more, and you’ve learned more about how teams are using it—that’s important.
It’s that classic incremental, iterative delivery. It’s hard to do with infrastructure and platform services because even if you take a simple application and look at the vertical slice of platform and infrastructure you need for that, it tends to be like an iceberg. You need to build a lot of stuff just to get there.
Monitoring is one of my favorite examples. We’re starting out and we want to use Prometheus, but building a Prometheus cluster and all that is going to take a lot of work. So we might just start with CloudWatch or whatever we can get out of the box. And yes, it’s not as nice to use, but it’ll get us up and running quickly, and the development team can start working.
And then over time, as we get to it, we put that on our backlog, and we’re going to reach a point where it’s like, “Okay, now we really need to do the fuller-featured Prometheus or whatever it may be at that time,” because we’re starting to hit the limits of what we can do. But it’s really hard and challenging to identify that thin slice and to resist the temptation of, “Man, I really want to mess with Prometheus. I want to learn how to set it up.”

Key Considerations of Building Teams with Infrastructure Skills

The answer is that it depends on where the organization is, the size of the team, what they are doing, and their maturity. In the early days, for smaller organizations, it makes sense to have them combined. If you have one “two-pizza” development team working on the software, having a separate team building infrastructure might not be the right approach.
They shouldn’t be developing a reusable platform that is going to serve the needs of a lot of teams. You just need the minimum infrastructure for this application right now. Build it in a way that you can evolve it and replace and swap things out. Use good design and good implementation.
This is where pipelines and automated tests have really helped to make it easier later on as things start to scale to three, four, or five teams. You can’t really replicate infrastructure specialists across all of those teams, and you also have to think about how you can share knowledge and expertise across those teams better. So that’s where you say, “Okay, maybe let’s split some of those into another team.”
It’s really important to think about what the interaction model is there. The best models are ones where you look at what the development team is trying to get done and how they can just get on with it and not have to wait for somebody else or put things in somebody else’s backlog.
What are we doing a lot of now? What are we having to wait for a lot of now? How can we make that easier for people to manage themselves? That’s where you start getting into the more sophisticated things of having something that’s a little more self-service for these tasks.
It isn’t necessarily that one team is building all these things, but it’s like, how does a developer specify “my application needs this?” Teams might be using a platform or framework like Kratix or Humanitec, which are ways to let different teams build and share platform services. Or they might be doing something more in-house.
As an application goes through its pipeline, it’s useful for it to be able to trigger the infrastructure actions it needs. You wouldn’t necessarily want to be sharing static environments; you might want to be able to spin up a minimum set of infrastructure to test what you need to test at that point. Having a way of triggering that from the pipeline becomes useful.
To my mind, a good, mature way of working in pipelines for this is that the infrastructure components have their own pipeline, and you treat it almost like a library or a tool. Here’s the component that builds my S3 bucket, and maybe you have a couple of different components for different types of S3 buckets doing different things.
So you have a pipeline for that infrastructure code to test it in different scenarios and then stamp it and put it into a repository. Then the application teams can pull it into their applications or their pipelines. They then have the ability to provision their own infrastructure and not be held up.
One of the big concerns that often comes up around this is governance. You don’t want all of your development teams building their own S3 buckets because they might do it wrong. They might forget to set the encryption option or just not do it in the best way.
So saying, “Okay, we’re going to have an expert team provide something that has the stamp of approval,” is quite powerful, and then we can validate and show that to compliance, regulators, and auditors if we need to.

Infrastructure Compliance and Governance

There are a couple of places where that can fit in. One is if you have reusable components, you can have tests and validation on that to ensure it passes these checks.
Tests can be run in the application pipelines as changes are going through, again, making sure nothing can happen that shouldn’t and that things are in place. Also, recording who made changes can be really useful there.
You also have the things that you build into your infrastructure at runtime, whether it’s monitoring or other mechanisms that can detect something’s been deployed and raise a flag, like, “Hey, I’m able to access a port I shouldn’t be able to access. What’s going on there?” There are those different layers of protections.
To this day, a lot of people still exercise governance and control through manual inspection and gatekeeping of changes. The trick for these people is to understand how to use automation tools like policy as code.
How do you write policy checks? How do you make sure that those are implemented and acting as gates in pipelines and production deployments? How do you make sure the right things are in place and running so that you don’t need to inspect every change to be confident?
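As a rough illustration of what a policy check acting as a pipeline gate could look like, here is a minimal Python sketch. It is not taken from the episode or from any particular policy-as-code tool: `BucketConfig`, `check_bucket_policy`, and `pipeline_gate` are hypothetical names, and the three rules (encryption on, no public access, change author recorded) are assumptions chosen to mirror the concerns mentioned above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BucketConfig:
    """Hypothetical, simplified description of a storage bucket."""
    name: str
    encrypted: bool
    public_access: bool
    changed_by: str  # who proposed the change, kept for audit purposes


def check_bucket_policy(config: BucketConfig) -> list[str]:
    """Return a list of policy violations; an empty list means the config passes."""
    violations = []
    if not config.encrypted:
        violations.append(f"{config.name}: encryption must be enabled")
    if config.public_access:
        violations.append(f"{config.name}: public access is not allowed")
    if not config.changed_by:
        violations.append(f"{config.name}: change author must be recorded")
    return violations


def pipeline_gate(configs: list[BucketConfig]) -> bool:
    """Act as a pipeline gate: pass only if no config violates a policy."""
    all_violations = [v for c in configs for v in check_bucket_policy(c)]
    for violation in all_violations:
        print(f"POLICY VIOLATION: {violation}")
    return not all_violations
```

A pipeline step could call `pipeline_gate` on every proposed change and fail the build when it returns `False`, so the check runs on each change instead of relying on a person inspecting it.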
These things get you to the point where you actually have more control, and it’s much more rigorous. I just heard the other week about an organization where people were saying, “Well, we can’t automate this task because it’s too sensitive and related to compliance. We can’t trust the automation.” You trust humans instead, you know?
It’s an interesting thing because it kind of flips that trade-off. Rather than saying, “We’re going to go slow and carefully inspect everything to get good governance,” or “In order to go fast, we have to get rid of governance,” you can actually have both.
The automation can run very fast on every change, tell you much more quickly if something is wrong, and tell you where things have gone wrong, so it’s easier to fix. So it can be more rigorous and faster. And that’s the secret sauce that people need to realize is an advantage rather than something to be scared of.

Using AI for Infrastructure as Code

There’s interesting stuff. There’s a whole category of how you can use it on the operations side for troubleshooting and monitoring. But for the actual building and automating of infrastructure, there are a couple of angles.
One is code assistance, as we have with regular programming. Being able to have something advising you, doing auto-complete, and coming up with solutions as you are working in code can be helpful, with all the same caveats as software development. I don’t think you want to vibe-code your infrastructure; you need to understand what’s going on there. You can have something that generates some code, but you really need to understand what it’s doing.
It comes back to that knowledge thing. If you’re using AI to create infrastructure for you, or even create infrastructure code for you, having that knowledge is important to get the levels right.
Looking at this and the tools that do these things, I went back to the book to double-check the principles I talk about in the early chapters. I thought, “Oh, I have to rethink these in light of how you might be using AI,” but a lot of it actually holds up.
The core principles are that you need to be able to do things repeatedly, you need consistency across environments, and there must be transparency so you can understand what’s going on.
This is where I’m less keen on something that is like, “Make me an S3 bucket,” and it does it, and then the S3 bucket is there. I want to see how it’s done, and I also want to know that it can be repeated. I have my S3 bucket in my dev environment; now I want to put it in the QA, staging, and production environments. I don’t want it to come out differently every time, even with the same prompt.
Even if you’re reusing and promoting your prompt, it can still do different things in those different environments.
My picture of how I see it being used is similar to what you might do with a developer portal. Let’s say you have a descriptor as an application developer that says, “I need a secure S3 bucket that I’m going to store personal data in and do analysis on.” It’s at that level of what I care about.
Then something provides that, and that thing is repeatable. So you built a component—maybe you used an AI assistant to help you write the code for it—that will create the S3 bucket. But every time that is executed to create an instance of that bucket, it’s done in a deterministic way. So I know in every environment it’ll build it the same way with the same options.
And for the person using it, say a developer, it might give them coaching. “I’m an assistant that knows your application code base, and I see that you’re storing something. Maybe you should use an S3 bucket or consider a different storage option. Here’s the library of things that have been provided to help you pick between them and understand how to use them.” But you still have those different layers of abstraction and that interface.
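The repeatable component described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the `APPROVED_PROFILES` catalogue, the use-case names, and the `render_bucket` and `fingerprint` helpers are all hypothetical, and a real component would call an infrastructure tool rather than return a dictionary. The point it shows is that the same descriptor always expands to the same settings, in every environment.

```python
import json

# Hypothetical catalogue of approved settings, keyed by the use case a developer declares.
APPROVED_PROFILES = {
    "personal-data-analytics": {"encrypted": True, "public_access": False, "versioning": True},
    "static-website": {"encrypted": True, "public_access": True, "versioning": False},
}


def render_bucket(app: str, use_case: str, environment: str) -> dict:
    """Deterministically expand a high-level descriptor into concrete bucket settings."""
    if use_case not in APPROVED_PROFILES:
        raise ValueError(f"no approved profile for use case {use_case!r}")
    settings = dict(APPROVED_PROFILES[use_case])  # copy so the catalogue is never mutated
    settings["name"] = f"{app}-{use_case}-{environment}"
    return settings


def fingerprint(settings: dict) -> str:
    """Serialise settings with sorted keys so identical inputs give identical output."""
    return json.dumps(settings, sort_keys=True)
```

Because the function has no randomness and no hidden inputs, rendering the same descriptor for dev, QA, staging, and production differs only in the bucket name, which is the property a free-form prompt cannot guarantee.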

Using AI for Troubleshooting and Root Cause Analysis

It could be helpful for sure. As with anything with this stuff, it’s about the data—making sure it has the data available and then getting the prompts and the queries right. And it’s a skill you have to learn how to use well.
I definitely know it’s being used for analyzing logs and metrics to answer questions. It’s still human-in-the-loop.

3 Tech Lead Wisdom

Think about what your role is now and what it is that you need to be doing
- As I’ve changed and grown in my career and roles—first becoming a team leader, then managing multiple teams and team leaders, and now doing a lot of advisory work for teams—it’s really important to think about what your role is now and what it is that you need to be doing, because it is very different.
- That first step to becoming a tech team lead is very different because it’s not up to you to code all the things anymore. You want other people to do that, and you want to get out of their way. And that’s always the hard thing: how to get out of the way of the things that you’re comfortable doing, that you see people doing, and maybe they’re not doing it right or in the way that you would want. How do you step back from that?
Caring about the quality of the code and coaching your developers on that
- And then as you get into managing people who are themselves team leads, again, that becomes different. The things that you’re doing as a team lead, like caring about the quality of the code and coaching your developers on that, you do less of as you move up. You’re more focused on the business—what business problems are we having to solve? How do we go about doing that? How do we coordinate across teams?
- When you’re in an advisory role like me, it’s a bit like I’m stepping aside and noticing things, and it varies a lot by situation. This team needs my help with these aspects, but then another team I’m with has those aspects down pretty good. Maybe one team needs more guidance on making technology or architecture decisions and designing the overall solution because they have less experience with that.
Work on how we communicate what’s going on and bridge that gap
- And now I’m with a team where they’ve got that down, but there are a lot of challenges with stakeholders, so I need to work on how we communicate what’s going on and bridge that gap.
- So that one answer encompasses different things: letting go of what you used to be doing, being attuned to what’s needed now, and where you can bring your strengths and experiences into the mix.

Transcript

[00:01:27] Introduction

Henry Suryawirawan: Hello, guys. Welcome back to another new episode of the Tech Lead Journal podcast. I’m very excited today to have another repeat guest. Today, I have Kief Morris with me. If you are a long-time Tech Lead Journal listener, you probably still remember Kief actually appeared in episode number five. That was almost like five years ago. So it’s been a long, long time since our last conversation. And Kief is back today to talk about the same book that we talked about last time, Infrastructure as Code. But you know what, it’s in its third edition now.

So when we talked back then, it was in the first edition. The second edition is kind of like up and coming. And now we are in the third edition. So hopefully we are gonna cover a lot of things that have changed in the infrastructure as code world and what are the things, the important things that we can learn from the new book. So Kief, welcome back to the show. Really excited to have you.

Kief Morris: Yeah, thanks a lot, Henry. I’m glad to be here. It was fun being in the early days of the podcast, so it’s really, it’s really cool to be back now that it’s been going strong for so long.

[00:02:39] Updates in the Last Five Years

Henry Suryawirawan: Right. Kief, maybe if you can, uh, just give us a little bit of, I don’t know, like updates, what have you been up to in the last four to five years since our last conversation? Anything interesting that you have been working lately?

Kief Morris: Sure. Um, so yeah, so I’m still at Thoughtworks, and still in London. So I think my role has kind of evolved a bit as the company has grown. I’m now what’s called a distinguished engineer. So basically my role, I get involved in projects. I get involved in talking to potential clients, existing clients, people outside, partners. Basically, it’s all about, for me, it’s all about exploring ideas, finding out about how people are doing things, what they’re learning, and sharing that around. So yeah, I’ve really been enjoying that, ‘cause it’s just given me the opportunity to get involved with a lot of different teams and different situations.

Henry Suryawirawan: Right. So now your title is kind of like Distinguished Infra Engineer, right? Distinguished Engineer. I know that in some companies they also have these kind of titles. In your view, what actually defines a Distinguished Engineer?

Kief Morris: I think it depends on the company, obviously in the organization. Um, it’s mostly, so the other distinguished engineers are people like Neal Ford and, um, Birgitta Böckeler and James Lewis and a few others. And it would tend to be people who are sharing information externally, whether through books or speaking or, you know, combinations of things. So I think, for us, that’s a big part of it is, it is kind of helping to kind of gather and synthesize ideas and, you know, new ideas as much as possible and sharing it, right? It’s all about, yeah, sharing ideas, learning and sharing.

[00:04:13] Infrastructure as Code Definition

Henry Suryawirawan: Right. It is, again, my pleasure to have another Thoughtworker in the show. So, Kief, let’s start to go discuss about the topics today, right? So infrastructure as code. I know that we talked about it last time, but maybe let’s start with the definition again, right? Because, uh, maybe things have changed. Maybe you have a new definition. Maybe if you can define what is infrastructure as code today.

Kief Morris: Yeah, so I think usually when I describe it, I say it’s something about using tools and techniques and practices and so on from software engineering and applying them to managing infrastructure. Now that’s probably still accurate. I think there’s a lot of interesting stuff going on now if you look at infrastructure management and automated infrastructure management, that kind of as code paradigm, there’s a lot of kind of experimentation on trying different ways that aren’t necessarily directly about code or used in different ways.

So I think there’s something about saying that, yeah, it is about using code as the means for specifying your infrastructure. And yeah, I think that’s, that’s probably a fair definition. I’m, I’m tending to kind of talk a little bit more now and use the phrase infrastructure automation and infrastructure orchestration just to kind of be a little bit more all encompassing of, of maybe some of the other kind of techniques and things that are emerging.

Henry Suryawirawan: Yeah, so when we speak about code, right? So, so many people would associate it to like programming language, you know, write kind of like scripts and things like that. I, I think in the last few years we can see the tools in infrastructure as code mostly are kind of like a yaml, you know. Some, some form of DSL and things like that. So maybe that’s probably where, you know, some of the analogy of code maybe breaks. But although recently there are a lot of tools also written in programming languages. Things like Pulumi, CDK, and all that, right. So probably we’ll talk about the tools later on.

[00:05:58] The Practice of Infrastructure as Code

Henry Suryawirawan: In your experience consulting with clients, you know, you’ve seen different parts of the world, is infrastructure as code now widely adopted or is it still something that is hard for people to practice?

Kief Morris: I think it is. I tried to do a little bit of research on this, but it’s a little bit hard to find concrete numbers. I would say most organizations that are doing infrastructure, especially those using cloud infrastructure are using infrastructure as code tools. Now how they’re using it, how effectively they’re using it is a different matter. But, um, you know, I think it is pretty much the default these days.

[00:06:32] The Differences Between the Book Editions

Henry Suryawirawan: Right. So if you can elaborate a little bit more. You have written three editions of the book, right? Maybe what are some of the major changes? Maybe starting from the first to the second to the third. What are the key themes that you see changing such that you want to write a new edition?

Kief Morris: Yeah. So the first edition was 2016, second edition was 2020. And the first edition, it was like servers was a big part of it. And that was like the subtitle of the book was Managing Servers in the Cloud. And so Chef and Puppet and Ansible and these kind of things were the main thing, even though things like Terraform and CloudFormation had emerged and people were starting to use those. And that was, you know, that was part of, what I talked about in that first edition. Servers had like, you know, multiple chapters covering servers.

And then in the second edition, servers were still in there, but a lot more of the emphasis now is on what are these things you use to orchestrate, you know, cloud type, like multiple resources, including servers in the mix, but like all the kind of other things around it. So again, Terraform, CloudFormation, CDK, Pulumi, those were the kind of tools that were a big part of the focus.

So the third edition, it’s funny, so when I first kind of undertook to do the third edition, I was thinking, well, it probably doesn’t need to change that much, you know, some refresh and updating and all that. But I found that actually there was a lot more to talk about than I’d expected. I think one of the things was around kind of thinking about how to align what we build with infrastructure to kind of business needs and business value and thinking a little bit more about that and bringing that more into it. It’s become… Infrastructure and infra, you know, cloud and all this stuff has become a lot more pervasive. Whereas previously it was kind of something that maybe a, you know, one part of the business was doing or like, you know, startups and these kind of things. And now, it’s become so much more pervasive that I think it’s become really important. And also with kinda like the slowing of growth and so on where it used to be everybody going and building as much as they can and getting everything digital, you know, digital transformations and all that. And now, I think more recently there’s been a lot of thinking about, well, how do we kind of rein it in a bit? How do we consolidate all of the different stuff that maybe we’ve chucked up on the cloud? So that’s been one theme.

Another theme was design of infrastructure and how to design more complex infrastructure estates using code. And the third is what you could call kind of orchestration. So this is thinking about if you use Terraform or OpenTofu, you have, uh, these TACOS, this idea of things like, I dunno, Atlantis and Spacelift and Terrateam and loads of different things out there to help you to kind of manage your infrastructure code and deploy it, and update it and all of that.

So I think that’s been a big shift and we’re still kind of underway, I would say in that, it used to be that we would write loads and loads of custom scripts to manage, like to, you know. So we might use a tool like Terraform, but we would be writing scripts, Makefile, shell scripts, Python scripts, whatever, to run the tool to kind of pull everything together and configuration and all of that. And so I think we’re moving towards a little bit more, I think an aspiration to standardize that. So an aspiration that as a team you shouldn’t have to hand roll all of that stuff. But I don’t think we’re quite there yet. So I talk a bit about things like, so what are some of the tools out there and the different angles and aspects of orchestration that they focus on.

Also things like team structures, team topologies, I, you know, make a reference to and use their kind of, um, language and kind of visual ways of representing team shapes and interactions. Cause I think, again, that’s important to think about so it all kind of comes together. How do you deploy it and deliver it? How do you organize your teams to use it? How do you design it? So yeah, so that’s been kind of overall the theme of this third edition is all of those things and how they kind of interrelate.

[00:10:21] Aligning Infrastructure to the Business Value

Henry Suryawirawan: Right. Thanks for summarizing that. So definitely very interesting, those key themes, right? Hopefully we can get to cover most of them, if not all of them, right? So I think one of the interesting things that you just mentioned is about… the theme about aligning infrastructure as code or infrastructure in general to the business value.

So I think probably this is so rarely spoken about. Typically, people talk about application service, you know, architecture, microservice, monolith to the business value, domain-driven design and all that. But infrastructure, kind of like maybe lesser talked about with the business value. So maybe tell us a little bit what kind of, I don’t know, like misalignment that happens in the industry or what kind of business value that is not driven yet by infrastructure as code.

Kief Morris: Yeah, I think it’s often very disconnected, right? Because it seems quite distant. So… and I think there’s an aspiration we tend to want to think about. And people on both sides of like say business leadership, technology leadership, and then people working on infrastructure are kinda like, well, it’s plumbing, right? It’s, you know, it’s basic stuff that doesn’t really matter what it is that runs on it, you just kind of build the same, you know, servers, databases, networking structures, and then let somebody else come in and deploy their applications onto it. Or in the case of like business folks, it’s just like, this is not what our business is, right? Our business is not infrastructure. It’s whatever it may be. And so we just need to have stuff that works.

And so I think that kind of gap, that kind of feeling of both sides that like, well, we don’t really need to think about this too much, leads to mismatches. And that kind of manifests as if you talk to kind of business leaders and technology leaders and say product leaders, often they’re very frustrated by finding that infrastructure and environments and all that kind of stuff is kind of a bottleneck. It’s kind of a friction point. It’s like, well, teams don’t have the environments they need quickly enough, or the environments are very messy. Or the costs run outta control.

Yeah, or things like expansion is a thing. So like I find that a lot of companies are looking to expand geographically and they think we’re gonna deploy into all these different regions. Well, we use cloud so we can just use that. And then that becomes a big pain point where it becomes a kind of spaghetti of code managing all these different environments and different places and stuff isn’t updated consistently, and you end up with more people… having to hire more people to keep stuff running. And so it kind of feels like, you know, why didn’t cloud help? You know, why didn’t cloud magically make it so we can just have infrastructure when we need it and how we need it? And I think, for me, it’s just that there is that kind of piece in between of like, well, how do you assemble?

So I think the cloud vendors have been very clever about how they’ve worked out what is the undifferentiated stuff, you know, servers, you know, subnets and, you know, all these kind of like basic level things. Like those are the stuff that you can put behind an API and tell somebody go and, and use this API, and the people building it, being the cloud vendors, don’t need to worry about what you’re gonna use it for exactly. But the stuff on top of that, then, well, how do you put it together? How do I run my application? How do I make updates and configuration changes to the things underneath it?

I think this is why we get, you know, platform engineering is such a big topic and why I think it’s a kind of a recurring theme with new names. We had DevOps and then we had SRE and then we have platform engineering and we have developer experience. And all of these things, I think, are ways of kind of pointing at, we need to figure out what come, you know, what to put in between, how to make that, those raw, you know, primitives that we get from the cloud vendors useful for getting work done without people having to spend loads and loads of time rolling things by hand and maintaining it by hand. So I think it’s that gap.

And so, for me, the kind of key is to think about, well, what are we trying to do as a business? What, you know, is it again, is it geographical expansion? Because that means do you have to think about something? Do we expand by growing, you know, more infrastructure for each customer? Do they need dedicated infrastructure depending on the business model or not? Do we need to think about consolidation? Is it that we’ve grown so much and we have, you know, 50 different Kubernetes clusters built by different teams in different ways and we need to think about how do we kinda like, you know, rein that in?

So I think thinking about what those kind of needs are, and then thinking about what happens, what’s the kind of day-to-day things of like, what does a, you know, a software team have to do? How often do they have to build new services and deploy them versus just making changes to existing services? How often are they building entirely new products? And so those differences in what people need to do then affect how are you gonna design your infrastructure? Where are you gonna focus? Are you gonna focus on being able to spin up new environments for new services very quickly? Or is it more kind of optimization or other things?

Henry Suryawirawan: Yeah, definitely very interesting and, makes sense, right? Because, again, it’s kind of like applying engineering practices such that you can kind of like scale much, much faster, right? Rather than having to do it manually one by one or even like error prone, ClickOps, those kind of stuff, right? So I think definitely makes sense, right? So investing in automation, you know, kind of like codifying the infrastructure, definitely is very, very important for business.

[00:15:03] Handling the Growing Infrastructure Complexities

Henry Suryawirawan: So I was kind of like a little bit laughing when you said that, how come even though we have cloud now, things are still kind of like complex. And I think the trend these days, like there are so many advancements, you know, maybe from cloud itself, like it used to be like infrastructure as a service, right, so like VMs and all that. Now comes the cloud native era, like containers, Kubernetes, serverless. But I feel it’s still complex if you kind of like wanna build like a more distributed architecture kind of systems, right? Especially when you have multiple teams working together. Security, compliance and all that comes into place. So with all these new advancements, maybe tell us a little bit more, what kind of complexities are we dealing with? It seems like the technology is supposed to make life easier for us, but how come it’s still kind of like challenging for us, even though we also apply infrastructure as code and things like that?

Kief Morris: I think it’s in how it’s presented. So where I think it’s useful to start, when you’re kind of looking at infrastructure and how to design it, is you look at the workloads you’re gonna be running on it and how those are structured and what their requirements are, and so on. And so I think, what often happens is if you’re looking at that level, and like, so one of the kinda ideals, um, that I think we have is, well, maybe developers can build their own infrastructure, right? And so this is kind of one of the motivations behind tools like CDK and Pulumi where it’s like, well, let’s give them programming languages that they’re comfortable with to be able to work with infrastructure.

I think where that kind of falls down is that the language isn’t, I don’t think, really the challenge. It’s one of the things that’s maybe a little bit off-putting to see, like you say, YAML or whatever, or some declarative language. But I think the really challenging thing is understanding, you know, those nuts and bolts of the infrastructure. I’ve gotta set up a VPC and subnets, network routes and all these kind of things. And that becomes, you know, figuring out how to put all that together, even something as simple as an S3 bucket. There are zillions of options on an S3 bucket. You can configure it completely differently depending on what you’re gonna use it for, right? Are you gonna be storing sensitive data? Are you gonna be doing analytics on it? Are you gonna be, is it just storing static content to serve, you know? So each of these kinda different use cases you would want to kind of configure and tune it differently. You also need to think about how to kind of, the security policies you should have and governance and so on.

And so there’s a lot of complexity. And so for a developer to have to think about that, it’s going into a lot of detail that maybe you don’t wanna have to think about at that point in time. So I feel like one of the things that we need to focus on is those kind of interaction modes or those, say, abstractions. So as a developer, I know I need an S3 bucket. Probably a few questions you can answer as to like what am I gonna use it for? I’m gonna use it to store static files for website hosting. Okay, that’s it. Pretty clear use case. Once you know that, you can then kind of roll out, okay, here’s what needs to go into the S3 bucket. Or a few questions about, again, what kind of data? Is it personalized data, is it regulated data and so on that might be on there? What are the access patterns?

So I think providing these things, you know, providing something for developers to be able to just kind of like specify in terms that make sense to them and that make sense to the use case, you know, that they care about. And then say, fine, you know, this stuff will be provisioned for you accordingly. I think that’s really useful. With the caveat by the way that when we talk about abstractions, that can become very dangerous when there’s something that hides or, like, stops you from accessing it. So to my mind, it’s about kind of headspace or, you know, cognitive context. So it’s like, okay, I’m working on my application. I care about this level of stuff. At some point, it might be, okay, now I’m finding the performance isn’t what I need it to be. I need to be able to then go and look down and drill into that. I need to be able to do that. So it’s not about hiding stuff from people, but just kind of saying when I need it, I want to be able to get at it. When I don’t need it, I don’t want to have to think about it and spend a lot of time on it.

Henry Suryawirawan: Yeah, thanks for adding that plug, right, because abstraction can be dangerous. In my experience last time also, like for example, there was a Kubernetes platform, but we were not exposed to any of the kubectl, you know, or any kind of a Kubernetes native way of working. So I think that kinda like defeats the purpose, right? But I think an abstraction with some kind of like an opinionated way is something that probably can help organizations, especially when they grow larger, right? Because if everyone is doing the same thing in different flavors, definitely, it’s gonna be difficult.

[00:19:10] The Tools and New Inventions in IAC

Henry Suryawirawan: So you talk a little bit about the tools itself, right? So obviously when we hear about infrastructure as code, many people associate, you know, things like Terraform, maybe now kind of like CDK or Pulumi are kind of like into the trend as well. But what do you think are kind of like some of the go-to tools that people are using these days for infrastructure as code? And what are the, some of the new inventions in infrastructure as code that people should know about?

Kief Morris: Yeah. Okay. This is, this is an interesting one, yeah. So I think Terraform and, for some folks, OpenTofu, are still often the go-to for a lot of people. Probably, the one I’m most familiar with. We look at things like CDK, that’s gonna tend to be like platform specific. So if you’re working with AWS, you might go that way unless you’ve got kind of skills on the other tools. I think Pulumi is quite interesting, not just because of the languages and the fact that it lets you use different languages to write your infrastructure code. But I think the kind of tool set and services that they’re creating around it, so things like ESC for environment management, and some of the automation things and some of the stuff they’re doing with kind of development portals. That goes back to what I was saying before about orchestration and I think that’s where a big, that’s what we kind of need to work on next, right?

So I think as much as thinking about the specific tool, you know, that you’re gonna be coding in, if you think about like what I was just saying about abstraction, so how are you gonna provide that to people, right? What options do you have? So being able to kind of create a package, S3 bucket package with the parameters that’s useful or that can be referred to even in code, application code, I think that’s really handy. And then having the ecosystem around it. So that’s one of the reasons why, and I’ve, you know, I’m talking about Pulumi. I talked with those guys a lot and I’m doing a couple of events with them. So that’s kinda like, I guess, a disclaimer of yeah, I, that approach is I think quite useful and quite interesting.

And then I think there’s some kind of things coming out that are being worked on as kind of next generation, probably a post-code. So you have System Initiative, which is led by Adam Jacob, who made Chef, and a few other people. There’s a similar-ish one, you know, conceptually: some of the folks behind Weaveworks and Kubernetes itself and Google have created a startup called ConfigHub, which I know fewer details of. The System Initiative stuff, I have had a chance to play with. It’s, you know, they’ve got an open beta. But essentially, the premise of these things is like, code is kind of painful to work with, storing it in your Git repositories and all that kind of stuff. And that kind of, especially that kind of lag between, okay, you’ve got what’s in your code, what’s the reality, and then these kind of interim things that you might have state files or you might have, you know, when you run the tool then it kind of creates a desired state and does comparison and things can get out of sync.

And so I think the premise of this kind of new generation of tools is to say, let’s look at the model, the data structures that represent, you know, the real infrastructure and work with that. It’s almost like if you take, say, a Terraform state file and you say, let’s make that live. Let’s make that the thing. And so then you can kind of update the state file, you know, or compare it with what’s in the reality and decide what you wanna do when they deviate. And then you can kind of work with it through whatever interfaces, including code. You can write code that then updates that model. Visual interfaces. So certainly with System Initiative, the thing that you see there is a kind of drag-and-drop interface. And then you can write code that reacts so it becomes a little bit more event driven. And so Adam talks about graph, I think he said infrastructure as graph, I think he said at one point. Which is interesting because it’s those relationships between the different elements.
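
Editor's sketch: the idea of comparing a model with reality and deciding what to do when they deviate can be illustrated in a few lines of Python. This is only a toy under stated assumptions, not how System Initiative, ConfigHub, or Terraform work internally; the `Drift` type, the `find_drift` function, and the resource names are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Drift:
    """One attribute where the model and the real infrastructure disagree."""
    resource: str
    attribute: str
    desired: object
    actual: object


def find_drift(desired: dict, actual: dict) -> list[Drift]:
    """Compare a desired model against observed reality, attribute by attribute."""
    drifts = []
    for name in sorted(desired.keys() | actual.keys()):
        want = desired.get(name, {})
        have = actual.get(name, {})
        for attr in sorted(want.keys() | have.keys()):
            if want.get(attr) != have.get(attr):
                drifts.append(Drift(name, attr, want.get(attr), have.get(attr)))
    return drifts


# Hypothetical model and observed state; the resource names are made up.
model = {"bucket-logs": {"encrypted": True, "versioning": True}}
reality = {"bucket-logs": {"encrypted": False, "versioning": True}}

for d in find_drift(model, reality):
    print(f"{d.resource}.{d.attribute}: model={d.desired!r} reality={d.actual!r}")
```

What happens next (update the model, or change the infrastructure back) is the decision the conversation refers to; the sketch only surfaces the difference.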

And so I think all that’s really cool. And I think, you know, that can be really powerful. And it’s gonna change completely how we think about, you know, building these things. I think it still needs to, so in itself, that still comes down to, well, how am I gonna wire together all my low level resources of infrastructure into something useful? And so I think it’s really important still to have, and this is when every time I see Adam I kind of go on about this, needs to have that kind of, you know, whether it’s a component model or whatever it is, to help people who understand the guts, you know, the nuts and bolts to build these kind of components that then somebody else can make use of. You know, again, the developers can say, gimme a couple of those and here’s the parameters I want. So I think it’s gonna be really important, it’s still gonna be really important to have that. That’s my hobby horse these days is like I think that’s what we really need to focus on.

Henry Suryawirawan: So I haven’t heard those two. So System Initiative and ConfigHub, so definitely people can check them out. It sounds to me a little bit more like how the way Kubernetes works as well. I mean it’s kinda like declarative, right? And you have like, I dunno, like a control loop that keeps on checking whether the state that isn’t…

Kief Morris: Yeah, it’s quite related to that infrastructure as data thing where you’re using the controller, you know, in Kubernetes, defining your infrastructure code as like CRDs and those kind of things, so you can just refer to it. And I quite like that as a model conceptually of like, because, again, I can deploy my application and say, oh, my application by the way, it needs a database instance, it needs some message queues or whatever. And then the system kinda works out, okay, now I’ll go and provision that for you. And so I think, yeah, I think you’re right. I think it’s a natural kind of next step beyond that really.
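
Editor's sketch: the control-loop idea discussed here, where an application declares what it needs and a controller works out what to provision, might look roughly like this in Python. It is a simplified illustration and does not use the Kubernetes API; `reconcile`, `control_loop`, and the resource names are hypothetical.

```python
def reconcile(declared: set[str], provisioned: set[str]) -> tuple[set[str], set[str]]:
    """Return (to_create, to_delete) so that provisioned converges on declared."""
    return declared - provisioned, provisioned - declared


def control_loop(declared: set[str], provisioned: set[str], max_rounds: int = 3) -> set[str]:
    """Repeatedly compare declared needs with what exists and close the gap."""
    for _ in range(max_rounds):
        to_create, to_delete = reconcile(declared, provisioned)
        if not to_create and not to_delete:
            break  # Nothing left to do: reality matches the declaration.
        # A real controller would call a cloud API here; this toy just updates a set.
        provisioned = (provisioned | to_create) - to_delete
    return provisioned


# Hypothetical example: the application declares a database and a message queue.
needs = {"database", "message-queue"}
existing = {"legacy-cache"}
print(sorted(control_loop(needs, existing)))
```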

Henry Suryawirawan: So definitely, uh, the next thing to check out, right, for people who like to… Because I know that infra people, whenever they hear new tools, they will check it out and, you know, see how does it work, how cool that is, and things like that.

[00:24:11] Terraform vs OpenTofu

Henry Suryawirawan: So speaking about Terraform, right? So I think still it’s the biggest, you know, majority people still use kinda like Terraform and now we have OpenTofu as well, and they kind of like diverge, I think has been a while since the fork and the diverge of the version. So what is your take here, for people who are, I dunno, still using Terraform mainly, right? So should they stick with Terraform or should they change to OpenTofu? What is your view?

Kief Morris: Yeah, I think it’s an interesting one. I mean, I can certainly see a case for sticking to Terraform, particularly if you’re in that kind of ecosystem with the HashiCorp tools and so on. You know, the licensing change was a bit unsettling for a lot of folks and working out like, what does this mean for me? I think one of my concerns with the licensing change was that it would impact, so even if you as a user, so this was as a consultant like I was concerned with, okay, what does this mean for my clients? Well, most of my clients aren’t affected because they’re essentially end users. They’re not building like a service to provide kind of Terraform hosting. So those kind of TACOS servers and those kind of things are the ones really that, I think, the licensing changes were aimed at.

But I think it has an indirect impact of, or, you know, my concern is, well, what’s the ecosystem of tools look like as an end user still? It’s like am I gonna have the ecosystem of tools available to me or are those vendors gonna be maybe not providing as much support for Terraform because of the potential cost or liability or whatever they may have from that? So I think OpenTofu, I’m still, I’m not kind of up on exactly where OpenTofu is at in terms of adoption. Like how widely is it used? It’s supported by most of the vendors of these kind of third-party tools that were originally kinda the Terraform ecosystem and have kind of, as you say, they’ve kind of diverged, they’ve kind of forked, not just the core tool, but like the ecosystem is potentially, you know, forking and diverging there.

So I think there’s a lot of vendors and people and, you know, an ecosystem there that is behind it. And so I think it’s quite promising from that. I would kind of like say it’s something to experiment with and look at and see how well it suits you. It’ll be interesting to see how it, how it does diverge, right? Because I think one of the kind of, I think there was a little bit of mixed messaging from the OpenTofu folks originally around, well, is it gonna be kind of like a drop in replacement for Terraform.

But I think, inevitably, what’s happened is like, well there’s a desire for new features, for maybe features that Terraform wasn’t gonna support. Things around kind of encrypting secrets where it’s like they want you to use the hosted platform and so the OpenTofu folks are like, well, we’ll just build it into the client. So there’s a lot of things like that where it is gonna start to diverge more and more. And HashiCorp as well I think are building a lot of features into Terraform that integrate with, you know, their cloud platform. They really want you to be using the cloud platform, so things like Terraform Stacks and so on.

And so this is, I, you know, I, I think people are gonna have to pick a horse. Originally there was kinda this idea, hopefully, we can just kind of like use OpenTofu or switch back and forth or whatever we need to do. And it’ll all be compatible. But I don’t think that’s gonna, you know, it can’t do that in the long run. And really that’s gonna be the, I think, the compelling thing for the OpenTofu users is gonna be the things that they can get maybe more, um, inexpensively or what have you than they could get from Terraform.

Henry Suryawirawan: Yeah. So definitely something to watch out, right? So over the time, so I’m still using Terraform by the way as well.

Kief Morris: So am I, for most of my either small projects that use it mostly, and whatever my clients use. And most of my clients are still more Terraform, the CDK. I can’t think of having certain, I haven’t worked in a project where it’s OpenTofu. Not yet.

[00:27:38] Orchestrating Infrastructure Changes Using IAC

Henry Suryawirawan: Yeah. So one thing you mentioned just now as well is about the orchestration or kinda like deployment, right? So like applying the infrastructure changes, right? So definitely the trend that you picked is kind of like orchestration tools and things like that, maybe in a way that applying like CI/CD pipeline or kind of like GitOps model for applying infrastructure changes. So what trend do you see around this? So, because I think many people still in different maturity, right? Some kind of like just applying it from their local. Some apply it in a CI/CD pipeline or maybe some other things. So what do you think people should do if they wanna start IAC the right way?

Kief Morris: Yeah, I think it’s important. So I think one of the things that’s useful to kinda distinguish is the actual deployment. So I think a lot of the tools that are out there that, like, are gonna help, you know, deploy your infrastructure code and all that kind of thing, they tend to be very focused on just the deployment part and maybe directly running a plan and then a deploy or those kind of things, which I think is useful and important. So the GitOps thing or the infrastructure as data or what have you, where there’s like the code and then your tool is making a, you know, the control loop kind of makes sure that, you know, drift is corrected right away. So that’s one part of the equation.

And then I think the other part is, well, how do you get, how do you deliver changes to your infrastructure code across environments where you have, whether those are environments to test software, testing and QA environments, and staging and so on. Or the environment you use to test the infrastructure code itself before you put it out to developers. So I think it’s something I think that we’ve struggled with in the infrastructure industry and infrastructure as code kind of field, is how to make that work nicely so that we don’t end up with essentially separate copies of infrastructure code for each environment and kind of copying and pasting changes from one, say one repository to another. Or even merging branches. Often, you know, it becomes messy.
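
Editor's sketch: one common way to avoid separate copies of infrastructure code per environment is to keep a single definition and vary only a small set of parameters. The Python below illustrates that idea only; it is not a pattern the speaker prescribes, and `EnvironmentParams`, `render_stack`, and the environment names and sizes are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EnvironmentParams:
    """The only things allowed to differ between environments."""
    name: str
    instance_count: int
    instance_size: str


def render_stack(params: EnvironmentParams) -> dict:
    """Single definition of the stack, shared by every environment."""
    return {
        "stack": f"app-{params.name}",
        "servers": [
            {"name": f"app-{params.name}-{i}", "size": params.instance_size}
            for i in range(params.instance_count)
        ],
    }


# Hypothetical environments: the same code is promoted, only parameters change.
ENVIRONMENTS = [
    EnvironmentParams("test", 1, "small"),
    EnvironmentParams("staging", 2, "medium"),
    EnvironmentParams("production", 3, "large"),
]

for env in ENVIRONMENTS:
    stack = render_stack(env)
    print(stack["stack"], len(stack["servers"]), "server(s)")
```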

So I think that’s the part that’s still a little bit tricky in that, you know, again, some of those kind of tools out there that do the orchestration, you know, in promoting across environments, some of those are addressing that and providing different ways to do that. But I think it’s still very much in flux in kind of like this, there’s not like an established, okay, this is the one right way to do it, the pattern that most people agree on. And so I think that’s where things tend to get and, you know, most of the clients I’ve worked with, that’s where things are messy.

Henry Suryawirawan: Yeah, so I could imagine like these kind of things definitely is kind of like the details of implementation that differs across organization, right? So some people like to, I dunno, create modules that can be reusable, but you know, like modules itself has to evolve, right? Especially if you make some changes. And then how do you backport it to existing infrastructure and things like that. So that’s always kind of like difficult. And writing tests, right, for infrastructure as code. I’m not sure how many people do that diligently. Because like there will be resources that gets created or maybe if you use some kind of, I don’t know, like mock things like that, maybe that could happen. But ideally, it also never tested in the real world, right? So I think definitely these kind of things tends to be the challenge for people.

[00:30:35] Platform Engineering

Henry Suryawirawan: And you mentioned about platform engineering, right? So I think these days everyone is talking about platform engineering. What is platform engineering in your view? Because I think this term, kind of like DevOps and all that, right, being used interchangeably to describe certain things, especially from vendors, right? So maybe in your view, what is platform engineering?

Kief Morris: Yeah. I think for me, it’s about providing the things that software teams and developers, and also people who do support operations of software. What do they need in order to get their jobs done? And particularly… So it’s A, it’s about making their, you know, easier for them, empowering them. And this is where we go back to like we were talking earlier about abstraction. Because, you know, one of the things a platform can do when it goes wrong is disempowering and making it difficult for people to get their job done. So it really is about empowering as a number one thing.

And then there’s also that thing of kind of sharing implementations and resources. So like, not every team needs to build their own monitoring tool set. Not every team needs to kind of build their own databases from the kinda low level. What can we share? And all this has to be like in a way that is useful and helpful, you know. So it’s not like, well, you’re not allowed to use a different tool because we are providing this tool, even though it doesn’t do what you want and the way that you need it to work and so on like that. It is about what’s truly helpful.

And I think that’s the reason why platform engineering is such a big topic and there’s so much out there about it, ‘cause it’s hard. It’s hard to do that well. I mean, it’s so easy to fall into traps or I think one of the big traps is you try to do too much, right? So you wanna, I’m gonna provide this platform that will do all things for our developers, and it’s, you know, it’s too ambitious. And so you end up not, you know, building a small piece, small piece and not having enough, and then people turn elsewhere. Or you’re just, you know, it’s too restrictive and people can’t get what they need done. And so like there’s a lot of kind of traps, I think, from that.

So to me, the way to kind of succeed at it is to think about it in terms of how to enable other people to build, say, platform services. So rather than saying one team needs to build all the services, it’s like, well, if different teams or the database team that’s an expert on that, and another monitoring team that’s focused on that, how can they build stuff and how can we provide them with the things to kind of easily connect those things together nicely and to kind of manage the things that, again, those teams don’t necessarily need to build everything from scratch themselves. Like what do they need? And so I think that’s the interesting challenge.

Henry Suryawirawan: Yeah, and especially the underlying platform itself evolve, right? So let’s say if you use cloud as the basis, right? So the cloud itself also change. New features, new changes, new deprecation. So definitely supporting those things are kind of like difficult as well.

[00:33:06] Internal Developer Platform Key Success Factor

Henry Suryawirawan: And I think when you mention about building platform engineering, right? So some people kind of like also associate this with like internal developer platform or something like that. So the idea is kind of like build tools and, you know, services and platforms such that other people can use it. Maybe in the team topologies is like enabling team, platform team as well in a sense, right? So have you ever seen, you know, success stories of people building such team and capability? And if so, what do you think are the key recipes for them to succeed?

Kief Morris: Yeah, I think, I mean, I have, and I think, I think the key things are really focusing on the user. So this is where the kind of platform as product comes into play. So again, pitfall is, okay, I’m gonna build this platform or this component of a platform, whatever, and I’m gonna just go away and build it cause I know what it, what it should be, right? So I’m gonna go and build the Kubernetes cluster to rule all the Kubernetes clusters, and I’m gonna go away and build that. But I think it’s important to really have that understanding of how are the teams using it and what are they using it for, and what are their journeys and what’s common.

So you see oftentimes people will optimize for things that don’t happen all that often. So you’ll see like, okay, we’re gonna make this way so that it’s easy. We’re gonna have like a, just to create a new application, right? A new server application. And so we’re gonna have the, you know, the template project. You can go to your developer portal and click a button and you get a new repository created in a new pipeline and all that. And then, you know, you look at it and you go and talk to the teams and it’s like, yeah, we do that maybe twice a year. But we have these other things that are painful for us and like, you know, and so it’s, okay, let’s look at what’s painful for the teams and what they’re doing very often and try to solve that.

And so I think the teams that are most successful in doing the kind of platform, you know, developer platform or whatever you wanna call it, is just that really kind of close relationship with the software teams, the development teams. Often this stuff comes outta the development teams. So they might be embedded with the development teams and working on stuff that just that team is using. And then over time, okay, here’s something that maybe another team needs.

And so you kind of evolve it. It’s reusability rather than trying to start out by here’s a, you know, a feature that we’re gonna provide, a platform service that we’re gonna provide on demand and easy click, which is a, again, a very ambitious thing to build. And actually, their first iteration is just, well, here’s some documents and outlines of how to build it so that you know, like, okay, how do I use RDS to create my database, right? We don’t have to necessarily create a wrapper around the RDS service, you know, as a platform service, but we can just kinda say, here’s how to do it and here’s some example code maybe that you can use and modify. That’s kind of the first step. And so I think thinking about, you know, those progressions and evolution of maturity of a service and how kind of more polished you can make it as it gets adopted and used more and you’ve learned more about how teams are using it. I think that’s important.

Henry Suryawirawan: Yeah. So thanks for mentioning, you know, platform as a product definitely is kind of like the way, the go-to way to build proper platform, right? And yeah, and the other message is about, you know, don’t be that ambitious, right? So sometimes, we, you know, we engineers like to build, you know, gold-plated solutions. You know, one click, everything automated.

Kief Morris: It’s that classic incremental, iterative delivery. It’s hard to do with infrastructure and platform services because like, even if you take a simple application and you look at, okay, what’s the vertical slice of platform and infrastructure you need for that? It tends to be, you know, it’s like an iceberg, right? You need to build a lot of stuff just to get there.

So I’ll often do things like starting with… like monitoring is one of my favorite examples. It’s like, okay, you know, we’re starting out and like we wanna use Prometheus. But like building a Prometheus cluster and all that kind of stuff is gonna take, you know, a lot of work. So we might just start with CloudWatch or whatever we can get out of the box. And yes, it’s not as nice to use, but it’ll get us up and running quickly. And the software team, the development team can start working. And then over time, as we get to it, we put that on our backlog. And we’re gonna reach a point where it’s like, okay, now we really need to do the fuller featured, you know, Prometheus or whatever it may be at that time, might even change, because we’re starting to hit the limits of what we can do.

So yeah, but it’s hard. It’s really hard and really challenging to identify that thin slice. And then, yeah, as you say, to resist that temptation of like, man, you know, I really wanna mess with Prometheus. Wanna learn how to set it. Sure, I can do it. At least, in my weekend.

[00:37:15] Key Considerations of Building Teams with Infrastructure Skills

Henry Suryawirawan: Yeah. Resume-driven development, right?

So speaking about application, right? So I think this is also one key consideration, right? So when you build a team, right, do you have infrastructure engineer also together as part of the team building the kind of like infrastructure as code together with the software features and things like that? Or do you have a separate team? And do you actually run the CI/CD pipeline together, application and infra, or are they different pipelines? So maybe these are some opinions that, uh, we can learn from you as well.

Kief Morris: Yeah, but I think the answer is that it varies, it depends, maybe where the organization is, the size of the team and what they are doing and the kind of maturity. So I would say, in the early days, for smaller, um, organizations, it makes sense to have them combined, right? So having, if you have like one development team working on, you know, a handful two pizza team or whatever you wanna call it. So one team working on the software development. Having a separate team building infrastructure might not be the right approach. Because they’re not developing, they shouldn’t be developing a reusable platform that is gonna serve the needs of a lot, and, you just need, what’s the minimum infrastructure we need for this application right now? Build it in a way that you can evolve it, um, and replace it and swap things out. So, you know, use good design and good implementation. And so this is where, yeah, pipelines and automated tests I think have really helped to kind of make it easier later on as that starts to kind of like, okay, now we’ve got three teams, four teams, five teams developing things. And becomes, you know, you can’t really kind of replicate infrastructure specialists across all of those teams. And also think about how to, you know, how can we be sharing knowledge and expertise across those teams better and so on. So that’s where you say, okay, maybe let’s kinda split some of those into another team. And there, again, I think it’s really important to think about what’s the interaction model there. So even with the smaller organizations, we have a few teams. I think still, you know, high, you have opportunity to do high communication kind of stuff.

And so I think the best models are ones where when you look at like what is the development team trying to get done? How can they just kind of get on with it and not have to then wait for somebody else or put things in somebody else’s backlog or what have you? And so you look at, you know, you’re just looking for those things. Okay, what are we doing a lot of now? What are we having to wait a lot of now? How can we make that easier for people to manage themselves? And so I think that’s where you start getting more into the kind of more sophisticated things of, right, we’re gonna have something that’s a little bit more self-service for these tasks, or we’re gonna start making more tasks self-service again, as it gets bigger. It’s like, you know, more things get folded into that. And where that, you know, going back to what I was saying before on platform teams they should be doing, again, it isn’t necessarily that one team is building all these things. But it’s like, okay, so what is that motive? How does a developer specify my application needs this? And so you’ll see like, teams might be using a, you know, a platform or framework, something like Kratix or Humanitec or what have you, where it’s like, okay, these are kind of ways to let different teams build platform services and share them in interaction. Or they might be doing something in-house a little bit more simply…

Like the pipelines, I think it’s interesting. I think, as an application goes through its pipeline, it’s useful for it to be able to trigger the actions that it needs in terms of infrastructure. And so particularly where you wanna get to as, you know, as you have a lot going on, is it’s like you wouldn’t necessarily wanna be kind of sharing environments or having static environments. You might wanna be able to kinda spin up infrastructure, a minimum set of infrastructure to test what I need to test at this point. And so having a way of triggering that from the pipeline, saying my application needs this, can you, you know, and it gets provisioned, that I think becomes useful.

And so where I kind of, to my mind, a good kind of mature way of working in pipelines for this is the components, the infrastructure components have their pipeline and you treat it almost like a library or a tool or what have you, where it’s, okay, here’s the component that builds, you know, my S3 bucket that I keep talking about, right? So here’s the component for that. Maybe a couple of different components for different types of S3 buckets doing different things. And so you have a pipeline that has that infrastructure code, tests it, tries it, you know, in different scenarios and then kind of stamps it and puts it into a repository, and the application teams then can pull it, you know, into their applications or into their pipelines and so on. So they then have the ability to kind of provision their own infrastructure, not be held up.

But you also have another, one of the big concerns that often comes up around all this, these things is governance. You don’t want your, all of your development teams kind of building their own S3 buckets because they might do it wrong. They might forget to set the encryption option or what have you, or just not do it in the best way. And so saying, okay, we’re gonna have an expert team provide something that, you know, has the stamp, it has passed the kind of, you know. And then we have, we can validate and show to, um, compliance, you know, regulators and auditors, if we need to. That’s quite powerful.

[00:41:56] Infrastructure Compliance and Governance

Henry Suryawirawan: Yeah. I think it’s also a good segue. You brought up the topics about governance, right? So definitely, an extension of infrastructure as code. So many people talk about, you know, shifting left, you know, security. Right now, people talk about policy as code, compliance as code, everything becoming a code now. Governance is definitely very hard and challenging, right? There are so many things that we could change and that there’s so many things that are insecure and maybe less in terms of compliance, in terms of organization policies, right? So what is the approach here that you would advise people in terms of creating infrastructure but still kind of like govern and compliant to the standards that organizations have?

Kief Morris: Yeah, I think there’s a couple of places where that can fit in. And so like, one of it, as I say, if you have components that are reusable and you can have kind of tests on that and validation that like, yes, it passes these things. That’s a good point. Tests that can be run in, in the application pipelines as changes are going through, again, making sure, you know, nothing can happen that shouldn’t happen. Things are in place. Also recording changes who made changes and all those kind of things can be really useful there. I think you also have the things that you build into your kind of infrastructure at runtime or, you know, whether it’s monitoring or other kind of mechanisms that can detect something’s been deployed. Hey, you know, I’m able to access a port. I shouldn’t access what’s going on there, right, raise a flag. I think there’s those kind of different layers of protections and so on.

I think a key thing is like, for the people, so the, you know, the experts, say your security experts, your governance experts and so on. Even people in other kind of areas of expertise like performance and so on. I think a lot of, you know, still to this day for a lot of people, their way of exercising their governance, the control and all that, is still based on kind of manual inspection and gatekeeping of changes. And I think the kind of the trick for these people is to understand how to use the automation tools. Again, you know, the, should I say policy as code? How do you kind of write policy checks? How do you make sure that those are implemented and that those are acting as gates in pipelines and production deployments and wherever else they need to? So how do you make sure that the right things are in place and to make sure that they’re running so that you don’t need to inspect every change to make sure it’s compliant and you have the kind of confidence? And I think, you know, these things get you to the point where it’s like you actually have more control and more, you know, it’s much more rigorous than the old ways. And I still hear, I heard just the other week, an organization where the people were saying, well, we can’t automate this task because it’s too sensitive. It’s too compliance and stuff. It’s… We can’t trust the automation. Like, you, you trust humans instead, you know?

So yeah, I think it’s just that mindset thing of just like, you know, this stuff can make for much more. It’s actually, it is an interesting thing because it kind of flips that trade-off rather than saying, well, you know, we’re gonna go slow and carefully inspect everything and that’s how we’re gonna get good governance. Or else, you know, in order to go fast we have to get rid of governance, we have to loosen up on it. Actually, you can have both, because the automation can run very fast. It can run on every change, tell you much more quickly if something is wrong, tell you much more quickly where things have gone wrong. So it’s easier to kind of fix. And so it can be more rigorous and faster. And I think that’s the kind of secret sauce that people need to, you know, need to realize is an advantage rather than something to be scared of.
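
Editor's sketch: a policy check acting as a gate in a pipeline, using the S3 encryption example mentioned earlier in the conversation, could be as small as the Python below. It does not use any real policy engine or cloud API; `check_bucket_policy`, `gate`, and the bucket definitions are hypothetical, and a real setup would more likely use a dedicated policy tool.

```python
def check_bucket_policy(bucket: dict) -> list[str]:
    """Return a list of policy violations for one bucket definition."""
    violations = []
    if not bucket.get("encrypted", False):
        violations.append(f"{bucket['name']}: encryption is not enabled")
    if bucket.get("public", False):
        violations.append(f"{bucket['name']}: bucket is publicly accessible")
    return violations


def gate(buckets: list[dict]) -> bool:
    """Return True if the change may proceed, False if any policy is violated."""
    violations = [v for b in buckets for v in check_bucket_policy(b)]
    for v in violations:
        print("POLICY VIOLATION:", v)
    return not violations


# Hypothetical change set: one compliant bucket and one that forgot encryption.
change = [
    {"name": "reports", "encrypted": True, "public": False},
    {"name": "uploads", "encrypted": False, "public": False},
]
print("change allowed:", gate(change))
```

Because the check runs on every change, it gives the fast feedback described above without someone inspecting each change by hand.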

Henry Suryawirawan: Yeah, and I think this space also kind of like advances quite rapidly, right? So security, compliance and all that. And I know there are a lot of tools that can do even automated scanning, right? So sometimes when you wanna govern, right? It doesn’t necessarily mean you have to do it like a gate keeping check, you know, like check whether this can go through or not. Sometimes you can detect kind of like drift to your policy or maybe violations to some rules or compliant things, right? And you kind of like detect and raise that to the team. I think that can also work sometimes, right? Because all I find, you know, with engineers these days, there are so many things that people need to understand and know, and it is very, very challenging to understand everything, right? So I think having these kind of tools that gives you the signals, the clues will definitely be helpful.

[00:45:53] Using AI for Infrastructure as Code

Henry Suryawirawan: So I was kind of like laughing as well when you mentioned like, those people trust human instead. These days there’s an AI element, right? So what about AI in infrastructure as code space? So what do you see the trends coming?

Kief Morris: There’s interesting stuff. I mean there’s a whole category of how you can use it in operations side, in troubleshooting and monitoring and all that, which I think is very interesting. But for the actual kind of building of infrastructure and automating it, I think there’s a couple of angles.

So one is, so I think the code assistance, like as we have with regular programming, um, being able to have something where, as you are working in the code, it is kind of advising you, doing autocomplete and coming up with solutions to things, can be helpful with all the caveats that cover software development there. It’s like, you know, I don’t think you wanna vibe code your infrastructure. Um, I think you wanna, you know, you want to, you need to understand what’s going on there. Yeah, you can have something that generates some code, but you like, you really need to understand what it’s doing.

And so I think it comes back to what I was, you know, again, what I was saying earlier around, like there’s that knowledge thing. And so I think if you’re using AI to kind of create infrastructure for you or even create infrastructure code for you, you still, you know, having that knowledge, you know, it’s important to get the levels right. And so I think there probably still is. So there’s a couple of things. So one thing I’ve seen people experimenting with is using kind of chat bots to say like, okay, provision me that S3 bucket, and then it provisions it for you. And I think that the challenges there are the same things we have with, you know, goes back in infrastructure.

And so I was looking at this, and at tools that do these kinds of things, and I went back to the book, right? The third edition had only just come out, but I wanted to double check the things that I said in the early chapters of the book, where I talk about principles. I was like, oh, I have to rethink these in the light of how you might be using AI, right? Part of it actually holds up. The core principles are things like: you need to make sure that you can do things repeatedly, that you have consistency across environments, and that there’s transparency so you can understand what’s going on.

And so this is where I’m kinda less keen on something that is like, make me an S3 bucket, and it does it, and then the S3 bucket is there. Like, you know, I wanna see how it’s done, and I also wanna know that it could be repeated. So, okay, I have my S3 bucket in my dev environment. Now I’m gonna wanna put it in the QA environment, the staging environment and production. I don’t want it to come out differently every time, which it can, even with the same prompt. Even if you’re reusing the prompt and promoting your prompt, ‘cause you could do that, right? Put a prompt in the code, in a Markdown file or whatever, and promote that. It can still do different things in those different environments. And so my picture of how I see it being used as a part of this is kind of similar to what you might be doing with a developer portal or with these things that I’ve mentioned.

Let’s say you have a descriptor, as an application developer, that says: I need a secure S3 bucket that I’m gonna store personal data in and do analysis on. So it’s at that level of, okay, what do I care about? And then there’s something that says, okay, I’m gonna provide that. And that thing is a repeatable thing. So you built a component, and maybe you used an AI assistant to help you write the code or whatever it is, and that component will create the S3 bucket. But every time it is executed to create an instance of that bucket, it’s done in a deterministic way. So I know that in every environment it’ll build it the same way, with the same options. And then for the person using it, say a developer, it might give them the kind of coaching things. It might say, hey, I’m an assistant that knows your application code base, and I see that you’re storing something. Maybe you should use an S3 bucket. Or maybe there’s a different storage option that you should consider. And here’s the library of things that have been provided, to help you pick between those and understand how to use them. But you’re still having those different layers of abstraction, I dunno if that’s quite the right term, but that interface, right, and all that.

So I think it’s exciting, right? But I don’t think the stuff is there yet. Even for the coding assistants, most of the AI coding assistants that work with application code aren’t great.

Henry Suryawirawan: Yeah, so I have that experience as well. Even though I use Terraform, and I assume many open source projects have Terraform code, right? It still doesn’t perform as well as with typical programming languages like JavaScript or Java and all that. So I think there’s a gap in terms of using coding assistants. And I don’t know whether we can even vibe code infrastructure as code, you know, like just ask AI to keep on creating resources, spin up, spin down and all that, right? Because there’s also a risk, right? If it’s misconfigured and you have data loss, I think it’s a big challenge for the organization.
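
The descriptor-plus-deterministic-component idea Kief describes in this section can be sketched roughly as follows in Python. This is only an illustration under stated assumptions: `StorageRequest`, `bucket_spec`, the naming scheme, and the option names are all hypothetical, and the function returns a plain dictionary rather than calling any real cloud API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StorageRequest:
    """Hypothetical descriptor: what the application developer cares about."""
    app: str
    holds_personal_data: bool


def bucket_spec(request: StorageRequest, environment: str) -> dict:
    """Map a descriptor to concrete bucket options.

    This is a pure function: the same request and environment always give
    the same result, so dev, QA, staging and production cannot drift apart.
    """
    return {
        "name": f"{request.app}-{environment}-data",
        "encrypted": True,
        "public_access_blocked": True,
        # Assumed policy for illustration: personal data gets extra safeguards.
        "versioning": request.holds_personal_data,
        "access_logging": request.holds_personal_data,
    }


if __name__ == "__main__":
    request = StorageRequest(app="analytics", holds_personal_data=True)
    for env in ("dev", "qa", "staging", "prod"):
        print(bucket_spec(request, env))
```

Only the bucket name varies by environment; every other option is derived from the descriptor. An AI assistant might help write a component like this once, but it is not involved each time the component runs, which is what keeps the result repeatable.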

[00:50:31] Using AI for Troubleshooting and Root Cause Analysis

Henry Suryawirawan: So apart from coding assistants, probably people use AI for, I don’t know, like you mentioned, monitoring, support and all that. Are these tools advanced enough now that you can actually use them to troubleshoot, do root cause analysis and things like that?

Kief Morris: I think it could be helpful, for sure. And as with anything with this stuff, it’s about the data: making sure it has the data available, and then getting the prompts right and the queries right. And it’s a skill also, like, again, you have to learn how to use it well. But yeah, I definitely see people using it. And we’ve got people in Thoughtworks, in our DAMO group, which is about operations, where they’re working on some kind of agent-based system that pulls this stuff together, and they’re even having stuff where it might be able to generate a pull request to correct an issue and all of that. I’m not close enough to say how much of that’s being used in anger. But I definitely know it’s being used for things like, again, those kinds of logs and metrics and all that, to answer questions. It’s, you know, human in the loop, still.

Henry Suryawirawan: Yeah, so definitely always involve humans still. So Kief, we have talked a lot about the new stuff in infrastructure as code. Is there anything left out, maybe, that you think we should cover as well?

Kief Morris: Um, I think we’ve covered a lot. I’m sure there is, but I think we’ve covered all of the cool stuff.

[00:51:50] 3 Tech Lead Wisdom

Henry Suryawirawan: Right. So I have one last question for you. If you still remember, back then I asked you this same question, right? I call this the three technical leadership wisdom. So you can think of it just like advice that you wanna give to the listeners. What three pieces of tech lead wisdom do you wanna share with us today?

Kief Morris: Yeah. Okay. I don’t know if I have it as like three items, but I think it’s kind of a broad thing. So it’s one of the things I’ve found as I change and grow in my career and roles and how I relate to… So like, you know, first there’s becoming a team leader, and then there’s becoming maybe somebody who’s managing multiple teams, managing team leaders. A lot of what I’m doing these days is kind of advisory to teams. So I’m in teams in a very senior role, a kind of leadership role, but I’m not necessarily in that line of command. I think it’s really important to think about what your role is now, and, you know, what is it that you need to be doing? Because it is very different, right? That first step of going to being a team lead, a tech team lead, is very different, because now it’s not up to you to code all the things. You want other people to do that, and you wanna get outta their way. And that’s always the hard thing, right? How to get out of the way of the things that you’re comfortable doing, that you see people doing, and maybe they’re not doing right, or they’re not doing in the way that you would want or whatever. And how do you step back from that? That’s a thing.

And then as you get to where you’re managing people who are themselves team leads, again, that becomes different. The things that you’re doing as a team lead are like, you’re caring about the quality of the code and you’re coaching your developers on that. As you move up, it’s like, okay, I’m doing less of that now, right? I’m more focused on, maybe again, that kind of business side. What is it we’re looking for in terms of the business problems that we’re having to solve? How do we go about doing that? How do we coordinate across teams? That becomes a thing.

When you’re in kinda an advisory role like me, it’s a little bit like I’m stepping aside and I’m kind of noticing things and noticing, okay, this is the thing. And it varies by situation a lot, right? Okay, this team needs my help for these aspects. Oh, and then now another team I’m with, they’ve actually got those aspects down pretty good. Like so, you know, maybe it’s making the technology decisions or the architecture decisions and designing the overall solution. You know, maybe one team needs more guidance on that because they have less experience with that. And now I’m with a team where they’re, they’ve got that down, but it’s like, well, there’s a lot of challenges with stakeholders and so I need to kind of work on, okay, how do we communicate to the stakeholders what’s going on and how do we kind of bridge that?

So I think it’s kind of one answer that encompasses different things: letting go of what you used to be doing, being attuned to what’s needed now, and seeing where you can bring your strengths and experiences into the mix.

Henry Suryawirawan: Wow! Thank you for such a great elaboration of this, uh, you know, intermixed, I would say, kind of attributes that you need as a leader. So I think that’s really important.

So Kief, if people would like to contact you, ask you about the books, or maybe ask you follow up questions, is there a place where they can reach out to you?

Kief Morris: Yeah, so, um, I’ve got my website infrastructure-as-code.com with the dashes in between the words. And there’s like kind of contact stuff on there. Probably LinkedIn is the main social that I’m on these days. And so yeah, you can kind of ping me there. I’m occasionally on Bluesky and stuff, but not as active.

Um, I tend to be more in Slacks these days, more kind of technical community Slacks. That seems to be where I am, as some of the big social media sites that I used to use a lot, I’ve kind of pulled back from. So now it seems to be more the smaller communities, which is an interesting phenomenon, but a topic for another day maybe.

Henry Suryawirawan: Right. So always a pleasure learning from you about the new stuff in infrastructure as code. I think you are one of the people at the forefront in this area. So definitely thanks for this conversation and yeah, goodbye for now.

Kief Morris: Okay. Thanks a lot. Appreciate it.

– End –