#108 - Building the Future of Cloud Engineering With Pulumi - Joe Duffy

10-Oct-2022 50 mins Joe Duffy

included in DevOps Infrastructure Cloud Automation

“Companies that are successful in getting the most out of the cloud embrace the fact that distributed application architecture is a first class application architecture concern.”

Joe Duffy is the co-founder and CEO of Pulumi. In this episode, we discussed the concept of cloud engineering and how Pulumi is helping to shape its future. Joe started by sharing the story of founding Pulumi and the evolution of cloud adoption. He shared his view on why the cloud should be a first-class application architecture concern and the concept of the cloud as an operating system. Joe then explained in depth the concept of cloud engineering as the next evolution of DevOps and how it changes the way we build, deploy, and manage infrastructure and applications in the product development lifecycle. Towards the end, Joe shared his view on the future of cloud engineering and how Pulumi is helping organizations adopt cloud engineering at scale.

Listen out for:

Career Journey - [00:06:18]
Innovation at Scale - [00:07:55]
Jumping into Entrepreneurship - [00:09:23]
Founding Pulumi - [00:11:05]
Cloud as First Class Concern - [00:13:00]
Cloud as a Giant Super Computer - [00:14:59]
Cloud Engineering - [00:17:27]
Next Stage of DevOps - [00:19:34]
Build Phase - [00:23:02]
Programming Language for Infrastructure - [00:24:58]
Importance of Tools Ecosystem - [00:26:22]
Deploy Phase - [00:29:48]
Advanced Deployment Automation - [00:32:47]
Manage Phase - [00:35:43]
Compliance as Code - [00:37:00]
Infrastructure as Software - [00:38:32]
Kubernetes-Based Infra as Code - [00:40:22]
Future of Cloud Engineering - [00:43:11]
Pulumi Customer Story - [00:44:52]
3 Tech Lead Wisdom - [00:46:30]

_____

Joe Duffy’s Bio
Joe Duffy is co-founder and CEO of Pulumi. Prior to founding Pulumi, Joe was a longtime leader in Microsoft’s Developer Division, Operating Systems Group, and Microsoft Research. Most recently, he was Director of Engineering and Technical Strategy for developer tools, where part of his responsibilities included managing the groups building the C#, C++, Visual Basic, and F# languages. Joe created teams for several successful distributed programming platforms, initiated and executed efforts to take .NET open source and cross-platform, and was instrumental in Microsoft’s company-wide open-source transformation. Joe founded Pulumi in 2017 with Eric Rudder, the former Chief Technical Strategy Officer at Microsoft.

Follow Joe:

Twitter – @funcOfJoe
LinkedIn – linkedin.com/in/joejduffy/
Website – JoeDuffyBlog.com
Email – joe@pulumi.com

Mentions & Links:

Pulumi – https://www.pulumi.com/
Infrastructure as Code – https://en.wikipedia.org/wiki/Infrastructure_as_code
DevOps – https://en.wikipedia.org/wiki/DevOps
Team topologies – https://teamtopologies.com/
Virtual private cloud – https://en.wikipedia.org/wiki/Virtual_private_cloud
Async/await – https://javascript.info/async-await
Kubernetes – https://kubernetes.io/
kubectl – https://kubernetes.io/docs/reference/kubectl/
HashiCorp – https://www.hashicorp.com/
Spinnaker – https://spinnaker.io/
GitHub Actions – https://github.com/features/actions
GitLab – https://about.gitlab.com/
Jenkins – https://www.jenkins.io/
GitOps – https://www.gitops.tech/
Chef – https://www.chef.io/
Puppet – https://puppet.com/
AWS Cloud Development Kit – https://aws.amazon.com/cdk/
Terraform CDK – https://www.terraform.io/cdktf
Crossplane – https://crossplane.io/
Kubernetes Config Connector – https://cloud.google.com/config-connector/docs/overview
Bitbucket – https://bitbucket.org/product
FaunaDB – https://fauna.com/
Microsoft Azure – https://azure.microsoft.com/en-us/

Our Sponsor - Founders Wellbeing

Mental well-being is a silent pandemic. According to the WHO, depression and anxiety cost the global economy over USD 1 trillion every year. It’s time to make a difference!
Learn how to enhance your lives through a master class on mental wellness. Visit founderswellbeing.com/masterclass and enter TLJ20 for a 20% discount.

Our Sponsor - iSAQB SAG 2022

The iSAQB® Software Architecture Gathering is the international conference highlight for all those working on solution structures in IT projects: primarily software architects, developers and professionals in quality assurance, but also system analysts who want to communicate better with their developers. A selection of well-known international experts will share their practical knowledge on the most important topics in state-of-the-art software architecture. The conference takes place online from November 14 to 17, 2022, and we have a 15% discount code for you: TLJ_MP_15.

Our Sponsor - DevTernity 2022

DevTernity 2022 (devternity.com) is the top international software development conference with an emphasis on coding, architecture, and tech leadership skills. The lineup is truly stellar and features many legends of software development like Robert "Uncle Bob" Martin, Kent Beck, Scott Hanselman, Venkat Subramaniam, Kevlin Henney, Allen Holub, Sandro Mancuso, and many others!
The conference takes place online, and we have the 10% discount code for you: AWSM_TLJ.

Our Sponsor - Skills Matter

Today’s episode is proudly sponsored by Skills Matter, the global community and events platform for software professionals.
Skills Matter is an easier way for technologists to grow their careers by connecting you and your peers with the best-in-class tech industry experts and communities. You get on-demand access to their latest content, thought leadership insights as well as the exciting schedule of tech events running across all time zones.
Head on over to skillsmatter.com to become part of the tech community that matters most to you - it’s free to join and easy to keep up with the latest tech trends.

Our Sponsor - Tech Lead Journal Shop

Are you looking for some cool new swag?

Tech Lead Journal now offers swag that you can purchase online. Each item is printed on demand based on your preference and delivered to you anywhere in the world where shipping is available.

Check out all the cool swag available by visiting techleadjournal.dev/shop. And don't forget to show it off once you receive it.

Like this episode?

Follow @techleadjournal on LinkedIn, Twitter, Instagram.

Buy me a coffee or become a patron.


Quotes

Innovation at Scale

The amazing thing at the scale of Microsoft is they have many, many technologies and many products. Some of them make a lot of money. Some of them don’t make any money. Some of them are bets on the future. Some of them are more tactical solutions to address shortcomings in the market. You also have the Microsoft research arm where you’ve got a ton of really smart people trying to live out 10 years into the future.
I worked on engineering teams of 5 people, engineering teams of 500 people, and everywhere in between. And so you kind of learn a lot of lessons that the things that work for very small teams or very early stage technologies may not be the same tactics you use for later stage technologies. Like in the early days, you’re still trying to figure things out. You’re willing to take more risk. You’re willing to do things that don’t scale. As you become much more of a mature product, those dynamics change.

Jumping into Entrepreneurship

I don’t think you start companies just to start a company. You really need to have the right opportunity. You really need to be passionate, excited about going and making some sort of change in the world.

Founding Pulumi

Pulumi is an interesting opportunity to really reshape how people build cloud software in a more fundamental way.
What struck me is the cloud was still sort of treated like an afterthought. It was very similar to in the 2000s, or even earlier than that, where people racked and stacked servers, you wrote simple three tier applications, didn’t really think about the infrastructure. It was always the infrastructure team dealing with that. And meanwhile, the infrastructure teams I talked to, they were kind of like, “Hey, nobody’s given us the sort of developer love that people have spent for application development, you know, great IDEs, test frameworks, powerful languages.”
With Pulumi, the idea is the cloud’s not an afterthought anymore. And all developers are building cloud software because the cloud is basically powering all the modern software that we see today. Now seemed like a great time to say we’ll bring the cloud closer to developers, but also give infrastructure teams amazing technology, and really help them be more productive and bring more joy to their lives.

Cloud as First Class Concern

10 years before we founded Pulumi, and a lot of the same patterns recur, where if you take a step back, we’re really in the cloud, we’re building distributed applications. We’re finally at the age of distributed programming.
The companies that are most successful in getting the most out of the cloud that are building amazing things, they embrace the fact that this distributed application architecture is now a first class thing that we can incorporate into applications. You don’t get there by treating the cloud as a collection of virtual machines. You get there by really thinking about these application patterns. How do we connect these different systems? How do they speak to each other?

Cloud as a Giant Super Computer

Think about back in the day with C and C++. You’re programming directly to the operating system’s API. And then over time, we came up with NodeJS and Ruby and more productive ways of programming and they sort of abstract away a little bit of the operating system. But the operating system is all about controlling access to hardware and scheduling operations against that hardware. That’s the job of an operating system.
The role of the cloud is it’s a piece of software that manages a lot of different hardware and controls access to it and schedules access against it. And so if you use that analogy, we’re sort of missing that NodeJS of the cloud. We’re missing these higher-level abstractions.
A lot of folks talk about Kubernetes. I think of Kubernetes as really just the kernel scheduler or the thread scheduler for the cloud, but it doesn’t stop there. There’s so much more to be built on top of that foundation.
Although Pulumi lays this great programmable foundation, I think the next phase is these higher-level experiences that have yet to be built.

Cloud Engineering

One, bringing the cloud closer to application developers, so that the cloud is part of the software engineering process.
Two, bringing software engineering tools and techniques to infrastructure teams to help tame some of the complexity of the modern cloud, which is how do you program the supercomputer? Well, you do it with software engineering techniques.
- We don’t think of it as an afterthought or something that is lots of copy-and-paste. We think that bringing all the decades of improvements in software engineering applies just as much to the cloud as it does to regular application development on classic operating systems.
Third, helping break down the barrier between developers and infrastructure teams, so we can just collaborate on equal footing to build great software together.
- You can think of it as the evolution of what comes after DevOps. In many ways, DevOps was more about bringing some dev techniques to ops teams, but not as much the opposite, which is really having developers thinking about the cloud as part of the application architecture. And I think you need to do both in order for that cross-pollination to happen.

Next Stage of DevOps

We merged the teams, but we made several mistakes when we did that, which was we didn’t recognize the importance of preserving those unique skillsets for people that really love testing. They think very creatively about how to find bugs and put in place proactive automation.
We need to make sure the same thing doesn’t happen, which is infrastructure is a specialty. Not every developer is going to want to learn how to properly configure a multi-region Kubernetes cluster that has high availability and is cost efficient and all the secured network concepts there as well. So we need to preserve all of those domain experts, but we need to combine that with the ability to use software to tame complexity at scale.
What we often see is specialization is a good thing, and at scale you need to specialize.
A very common model we see within our customers, and this is mid-scale, large scale, is the platform team. So the platform engineering team is typically a team that’s in between the developers and the sysadmins, operators, and folks that are spending all their time on infrastructure. The platform team typically takes a software engineering mindset, but it’s a hybrid of engineering expertise and infrastructure expertise. And their goal is to build the automation for the surrounding organization to make everybody productive, so they can ship code faster, but also do it securely within compliance, within budget, and that sort of thing.

Build Phase

We sat down to solve the problem, and the problem to us was, honestly, building modern cloud applications was not so great at the time. I mean, you wanted to stand up just a simple containerized service that was five boxes on the whiteboard. Next thing you know, you’re like knee deep in 4,000 lines of YAML, and that didn’t feel great. Especially coming from a developer experience background.
What we did that’s different from most tools is we let you bring your favorite language to express your infrastructure as code. By doing that, we tap into the ecosystems around these languages. IDEs, test frameworks, package managers, the ability to share and reuse common patterns, and really give a great developer experience.

Programming Language for Infrastructure

We want the right tool for the right job. And sometimes YAML or JSON is a perfectly fine tool for the job at hand.
Pulumi is a multi-language platform. I think the insight we had was how do we get all the rigors of infrastructure as code? Being able to preview before a deployment, getting full audit history of who changed what and when. All these elements that we love about infrastructure as code, but take those, then also give you your choice of language.
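
To make the "preview before a deployment" idea concrete, here is a minimal sketch in plain Python. It does not use Pulumi's engine or API. The `preview` function and the resource names are hypothetical, and the sketch assumes that both the current and the desired state can be described as a mapping from resource name to properties.

```python
from typing import Dict, List, Tuple

Resources = Dict[str, dict]


def preview(current: Resources, desired: Resources) -> List[Tuple[str, str]]:
    """Return the (action, resource name) steps needed to reach `desired`."""
    steps = []
    # Anything desired but absent is created; anything that differs is updated.
    for name in sorted(desired):
        if name not in current:
            steps.append(("create", name))
        elif current[name] != desired[name]:
            steps.append(("update", name))
    # Anything that exists today but is no longer desired is deleted.
    for name in sorted(current):
        if name not in desired:
            steps.append(("delete", name))
    return steps


# Illustrative state only; the names and properties are made up.
current_state = {
    "web-bucket": {"versioning": False},
    "old-queue": {"fifo": True},
}
desired_state = {
    "web-bucket": {"versioning": True},
    "api-service": {"replicas": 3},
}

plan = preview(current_state, desired_state)
for action, name in plan:
    print(f"{action:>6}  {name}")
```

Nothing is changed while the plan is computed, which is what lets a person or a pipeline review it before anything is deployed.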

Importance of Tools Ecosystem

It starts simple, but it gets complex very quickly, and I think we invented programming languages to help with complexity as we scale.
And I think Pulumi really is infrastructure as code that scales. And the reason it scales is because of the languages. And languages give you the ability to encapsulate complexity where it’s not needed. Abstract away concepts into higher-level concepts, so we can build bigger things out of smaller things. We would’ve never gotten to where we did on the application development side if we didn’t have these facilities. And now the ability to apply this on the infrastructure side is very powerful.
The simple things get complicated. The reason why it’s so complicated is it doesn’t have for-loops, and so for every availability zone you have to copy and paste and then rename the thing. Not only that, but you can stick it in a package, and now give somebody a one-line way to spin up a properly configured VPC. And given that everybody in the world has to do that, just imagine worldwide how many human hours that’s going to save just being able to do that.
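
The point about for-loops and packaging can be illustrated with a small sketch. This is plain Python with hypothetical `Vpc`, `Subnet`, and `make_vpc` names, not Pulumi's actual resource classes. The zone names and CIDR ranges are assumptions chosen only for illustration.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Subnet:
    name: str
    availability_zone: str
    cidr_block: str


@dataclass
class Vpc:
    name: str
    cidr_block: str
    subnets: List[Subnet] = field(default_factory=list)


def make_vpc(name: str, zones: List[str]) -> Vpc:
    """Hypothetical helper: one subnet per availability zone, built in a loop."""
    vpc = Vpc(name=name, cidr_block="10.0.0.0/16")
    for index, zone in enumerate(zones):
        vpc.subnets.append(
            Subnet(
                name=f"{name}-subnet-{zone}",
                availability_zone=zone,
                cidr_block=f"10.0.{index}.0/24",
            )
        )
    return vpc


# The "one-line way" for whoever consumes the package:
vpc = make_vpc("prod", ["us-east-1a", "us-east-1b", "us-east-1c"])

for subnet in vpc.subnets:
    print(subnet.name, subnet.availability_zone, subnet.cidr_block)
```

Adding a fourth zone means adding one list element rather than copying and renaming another block of configuration.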

Deploy Phase

Moving to a code centric model for how you do deployments is definitely key to cloud engineering and something we see a lot of our customers wanting to do, but struggling to do. Most people are triggering deployments of their applications through Git workflows today. Some are still doing them manually, but regardless, all the code is in source control. People are struggling on how to trigger the deployments.
With the more modern cloud architectures, the line gets a little bit blurry. The line becomes blurry for a lot of the infrastructure and many of our customers want to enable developers to self-serve with guardrails. They want to make sure that developers don’t shoot themselves in the foot from a security standpoint.
The D in CI/CD is deployment. We think of that as not just application changes, but infrastructure changes as well. That’s a key thing that a lot of people are trying to get to, and a lot of people are struggling to get there. And then once you have that, you want to do verification with testing and security and everything just built into how you’re doing deployment. Same thing for application, same thing for infrastructure.
So if you’re detecting a theme, it’s that we don’t need to bifurcate how our entire team operates between apps and infrastructure. We can really share a lot of the same engineering practices.

Advanced Deployment Automation

We definitely see a lot of advanced use cases amongst pretty standard workflows in our customers. For example, one of our customers has a hundred production regions that they deploy into, or a hundred environments, let’s say. And they want to orchestrate that over the course of two weeks. Because they want to roll out to one region, make sure things are okay. Start with a canary. Ratchet that up to 100%. Monitor all the metrics, and only graduate between environments as things are healthy, and then automate the rollback if things start going south.
We have something called the automation API, which is effectively, what if infrastructure as code was just a library that you could program against instead of being a CLI that a human had to run or that you stuck into CI/CD.
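
Once deployment is something you can call from a program, the staged rollout described above becomes ordinary control flow. The sketch below is plain Python and does not call Pulumi's Automation API. The `roll_out` function, its callbacks, the environment names, and the canary percentages are all hypothetical stand-ins.

```python
from typing import Callable, List

CANARY_STEPS = [1, 10, 50, 100]  # percent of traffic; illustrative values


def roll_out(
    environments: List[str],
    deploy: Callable[[str, int], None],
    is_healthy: Callable[[str], bool],
    roll_back: Callable[[str], None],
) -> List[str]:
    """Deploy to each environment in order; stop and roll back on failure.

    Returns the environments that were fully rolled out.
    """
    completed = []
    for env in environments:
        for percent in CANARY_STEPS:
            deploy(env, percent)
            if not is_healthy(env):
                # Undo the failing environment and do not graduate further.
                roll_back(env)
                return completed
        completed.append(env)
    return completed


# Stand-in callbacks so the sketch runs without any cloud account.
log = []
unhealthy = {"eu-west"}

done = roll_out(
    ["us-east", "eu-west", "ap-south"],
    deploy=lambda env, pct: log.append(f"deploy {env} {pct}%"),
    is_healthy=lambda env: env not in unhealthy,
    roll_back=lambda env: log.append(f"rollback {env}"),
)
print(done)
print(log[-1])
```

A real rollout that spans days or weeks would also need waiting periods, persisted progress, and approval steps, which this sketch leaves out.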
CI/CD systems today are not great for infrastructure. They work, but like, as you say, there’s something missing there, the major missing pieces. Most CI/CD systems think of the world as in like 10-minute, 30-minute or, worst case, one-hour bite-size chunks. Many infrastructure activities can span days and weeks. Sometimes they have manual approval steps because somebody needs to sign off on budget or provision something via a ServiceNow ticket. So it is more complicated.
We see a lot of folks doing GitOps. Mainly because they want to be able to do a deployment based on Git branch merges and pushes, and it fits in nicely with a lot of the application life cycle. You can basically just use pull requests as approval and review, just like you would a code review for application code changes. So it does go nicely hand in hand with infrastructure as code.

Manage Phase

They want to find what’s running in production. Who did a deployment? Why did something change? Is it secure? How do we start to tame that chaos?
We really wanted it end-to-end. It starts from build. You deploy it, but you’re not done there. You want things like drift detection. You want things like the ability to go and search over your resources to find something when something goes wrong. We also have the same way GitHub tells you all the code changes that are happening and who made the changes and what the diffs were. You want to be able to do that for your infrastructure as well. Both for just staying on top of the rhythm of the team, but also in the event that something goes wrong.
For us, it doesn’t stop with, okay, the infrastructure’s up and running. Once you get it up and running, now you need to version it and manage it and evolve it. And so that end-to-end lifecycle is really important as well. That’s part of cloud engineering, and honestly, that’s where a lot of the infrastructure experts and a lot of the operations folks and sysadmins play a more significant role.

Compliance as Code

There’s a phrase we like to use, which is secure by construction. You want to find issues before they go into production. And so policy as code helps you enforce policies before you’ve actually done the deployment. Actually, it’s part of the deployment.
For us, it begins even before that. The static type checking, the fact that you can run linters–you know, if you want to use Pylint, you can use Pylint or ESLint or you can encode your own policies into that. That’s even before the policy as code, so you really get these layers of defense to catch issues before they get out into production.
You still want to find issues after the fact that have already gotten out into production. But, ideally, you want to catch them before it’s too late.
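
The policy-as-code idea in this section can be sketched as a set of small functions that run against the planned resources before anything is deployed. This is a plain Python illustration, not Pulumi's policy framework. The two rules, the resource fields, and the `check` function are hypothetical.

```python
from typing import Any, Callable, Dict, List, Optional

Resource = Dict[str, Any]
Policy = Callable[[Resource], Optional[str]]


def no_public_buckets(resource: Resource) -> Optional[str]:
    """Hypothetical rule: storage buckets may not be publicly readable."""
    if resource["type"] == "bucket" and resource.get("public", False):
        return f"{resource['name']}: buckets must not be public"
    return None


def require_owner_tag(resource: Resource) -> Optional[str]:
    """Hypothetical rule: every resource needs an 'owner' tag."""
    if "owner" not in resource.get("tags", {}):
        return f"{resource['name']}: missing required 'owner' tag"
    return None


def check(planned: List[Resource], policies: List[Policy]) -> List[str]:
    """Run every policy against every planned resource; collect violations."""
    violations = []
    for resource in planned:
        for policy in policies:
            message = policy(resource)
            if message is not None:
                violations.append(message)
    return violations


# Illustrative plan; in practice this would come from the deployment preview.
planned_resources = [
    {"type": "bucket", "name": "logs", "public": True, "tags": {"owner": "platform"}},
    {"type": "vm", "name": "worker", "tags": {}},
]

violations = check(planned_resources, [no_public_buckets, require_owner_tag])
for message in violations:
    print("BLOCKED:", message)

# The deployment proceeds only when no policy objects.
deploy_allowed = not violations
```

Because the rules are ordinary functions, they can be unit tested, versioned, and shared like any other code.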

Infrastructure as Software

We say infrastructure as code, but it’s really been infrastructure as text until just recently. Actually, early in the use of the phrase infrastructure as code, it actually was more code than it is generally these days.
At some point, we decided that infrastructure as code meant JSON, YAML, or domain specific languages that were very limiting. It’s not just the code. It’s about the whole software engineering practice you have around that code.
Infrastructure as software to us is really just applying software engineering practices and treating infrastructure like it is software. You think about modules. You think about sharing and reuse. You think about how we architect these things. It’s not about whipping out a piece of code to like script our way out of it. It’s really about sitting back and thinking about this as building software.

Kubernetes-Based Infra as Code

By the way, on the Ruby thing, sometimes people say, “Oh, well, with full-blown languages, you can create a messy code base”. And let me tell you some of the messy code bases I’ve seen for CloudFormation and Terraform. And by the way, that same argument applies to application development. We optimize for value.
When I think of Pulumi, it’s really three things. One, it’s a programming model for how you express cloud architectures. Two, it’s a deployment engine that can do infrastructure as code deployments. And the third is it’s a cloud engineering platform that helps you accomplish everything we talked about earlier. You know, the CI/CD, testing, policy as code, the whole end-to-end. It’s about how your organization operationalizes infrastructure as code.
I think of Crossplane as a very interesting alternative to that second thing, but it’s still all the YAML. And it’s not solving for any of the third thing, which is all the software engineering, cloud engineering topics that we talked about. So it’s really just a deployment technology that can sit inside Kubernetes clusters and deploy to other clouds.
What we see out in the world is almost all of our customers, if they’re using Kubernetes, it’s really what I mentioned earlier, it’s not the center of the universe. It’s really the scheduler for the cloud.

Future of Cloud Engineering

We’re still really just getting started. For us, we are really excited about leading the innovation agenda for infrastructure as code and really showing the world that there’s a new way of doing this.
We just laid the foundation. We gave this nice programmable surface area on top of the cloud. You think of everything we talked about cloud operating system. What is that missing application model for the cloud? How do we get to the world where the three or five boxes on the whiteboard are easy to express rather than decomposing into hundreds of building block services?
There’s a missing layer in between. And so I think, for us, we’re really focused on what is that next layer of abstraction on top of this foundation. And that will be much more for application developers who aren’t necessarily focused on infrastructure as code. It really should be 10 times easier to build a modern cloud application.
Adopting cloud engineering is not push a button and we have cloud engineering, and it really, really should be. There’s so many teams out there having to build their own custom platforms. And again, having the ability to extend the system, like with our automation API is great. But wouldn’t it be even better if you didn’t have to start from scratch every single time in every single company? So I think that’s the third major area is really helping these organizations adopt cloud engineering at large scale.

3 Tech Lead Wisdom

When in doubt, just solve important problems for your customers, and you really can’t go wrong.
- It’s easy to overthink things and build grandiose technologies, but ultimately, your customers will keep you honest. If you just solve their problems, that’s the easiest path to success.
- Throughout my career, I’ve often gotten interested in shiny objects and built things just for sheer pleasure, and that’s fine sometimes too.
- But if you’re starting a company, I think just solving customer problems is really the name of the game. Not just on day one, but also as you scale and grow as well. Like just listen to your customers and they’ll tell you what they need.
Good enough is never good enough.
- I think always relentlessly pursuing a higher bar, really compound interest, you know, 1% better every single day compounds into something huge. In fact, you know, 0.1% better every day compounds into something huge.
- It’s easy to accept the status quo. We’re not going to get there by slowly iterating and just accepting the status quo.
Really dream big, but be realistic about what you can attain in a finite period of time.
- If you’re iterating and getting 1% better every day, then you’ll probably get there, eventually.
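
The compounding claim above is easy to check with a little arithmetic. The sketch below assumes a simple model in which each day's improvement multiplies the previous day's level. The `compound` function is a hypothetical helper written for this illustration.

```python
def compound(daily_gain: float, days: int) -> float:
    """Growth factor after improving by `daily_gain` every day for `days` days."""
    return (1 + daily_gain) ** days


one_percent_year = compound(0.01, 365)
tenth_percent_year = compound(0.001, 365)
tenth_percent_decade = compound(0.001, 3650)

print(f"1% daily for a year:     {one_percent_year:.1f}x")
print(f"0.1% daily for a year:   {tenth_percent_year:.2f}x")
print(f"0.1% daily for a decade: {tenth_percent_decade:.1f}x")
```

Under this model, 1% a day works out to roughly 38 times better after a year. At 0.1% a day the gain is about 1.4 times after one year and takes about ten years to reach the same level.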

Transcript

[00:02:30] Episode Introduction

Henry Suryawirawan: Hello, my friends and my listeners. Welcome to the Tech Lead Journal podcast, the show where you can learn about technical leadership and excellence from my conversations with great thought leaders in the tech industry. If this is your first time listening to Tech Lead Journal, subscribe and follow the show on your podcast app and on LinkedIn, Twitter and Instagram. And if you’d like to support my journey creating this podcast, subscribe as a patron at techleadjournal.dev/patron.

My guest for today’s episode is Joe Duffy. Joe is the co-founder and CEO of Pulumi. If you haven’t heard about Pulumi, it is a universal infrastructure as code SDK and platform to deliver infrastructure with high velocity and scale through software engineering. In this episode, we discussed the concept of cloud engineering and how Pulumi is helping to shape its future. Joe started by sharing the story of founding Pulumi and the evolution of cloud adoption. He shared his view on why the cloud should be a first-class application architecture concern, and the concept of the cloud as an operating system. Joe then explained in depth the concept of cloud engineering as the next evolution of DevOps and how it changes the way we build, deploy, and manage infrastructure and applications in the product development life cycle. Towards the end, Joe shared his view on the future of cloud engineering and how Pulumi is helping organizations adopt cloud engineering at scale.

I hope you enjoy my conversation with Joe. If you do, please help share it with your friends and colleagues who can also benefit from listening to this episode. My ultimate mission is to spread this podcast to more listeners, and I really appreciate your support in any way towards fulfilling my mission. Before we continue the conversation, let’s hear some words from our sponsors.

[00:05:43] Introduction

Henry Suryawirawan: Hello, everyone. Welcome back to another new episode of the Tech Lead Journal. Today, I have with me someone named Joe Duffy. He’s the founder of Pulumi. So if you haven’t heard about Pulumi, it’s one of the Infrastructure as Code tools which tries to solve how to create infrastructure through code or software. We will discuss it later in this episode. So Pulumi is just one of many options available out there. Today we’ll be talking a lot about cloud engineering and Infrastructure as Code and things related to that. So Joe, really looking forward to this conversation. Thank you so much for your time today.

Joe Duffy: Yeah. Likewise. Thanks for having me Henry.

[00:06:18] Career Journey

Henry Suryawirawan: Joe, I always like to start my conversation with my guests by asking them to share about their career journey, any kind of highlights or turning points in your career.

Joe Duffy: Yeah, it’s a great question. I sort of break my career maybe into three phases. You know, very early on, I got the love of coding in my system when I was a teenager. That was right around the time the internet was starting. The first turning point was probably, you know, I decided to start my own consulting business to help companies with the transition to the internet. That actually planted the seed for eventually starting a larger company. In a consulting business, you have to be a jack of all trades. You have to do a little bit of sales, a little bit of marketing, a little bit of product engineering, a bit of support. That really taught me a lot of good lessons there.

Second phase, you know, I went to Microsoft, and that was a really formative time in my career. That was when I really saw innovation at scale, at the scale of Microsoft, and got to work on a lot of cool technologies, products, met some of the most brilliant engineers I’ve ever worked with in my career and probably ever will work with, and got to move into leadership positions and manage large teams at scale and really see what that looks like. Then at the end of that, I helped open source .NET and take it cross platform. And that really reconnected me with my roots, sort of in the open source community.

And then the third phase was, you know, I decided to leave Microsoft after 14 years or so, and started a company. Not just a consulting business, but actually a product company and build that and scale it. And that’s been a great five years. So I’m the CEO of the company. In the early days I was writing all the code in my basement and then we’re now around a hundred employees. So it’s been a great learning experience and a great journey.

[00:07:55] Innovation at Scale

Henry Suryawirawan: Thanks for sharing that. So I was interested when you say innovation at scale. So maybe for people who are interested as well, what do you mean by seeing firsthand how innovation is done at scale?

Joe Duffy: Very good question. Because that’s the thing that kept me at Microsoft for so long. I was always learning, always seeing new approaches to solving problems, getting new opportunities, working on different technologies. The amazing thing at the scale of Microsoft is they have many, many technologies and many products. Some of them make a lot of money. Some of them don’t make any money whatsoever. Some of them are bets on the future. Some of them are more tactical solutions to address shortcomings in the market. You also have the Microsoft research arm where you’ve got a ton of really smart people trying to live out 10 years into the future. And then a lot of folks trying to figure out, okay, let’s take some of those ideas and incorporate them into the products.

I worked on engineering teams of five people, engineering teams of 500 people, and everywhere in between. And so you kind of learn a lot of lessons that the things that work for very small teams or very early stage technologies may not be the same tactics you use for later stage technologies. Like in the early days, you’re still trying to figure things out. You’re willing to take more risk. You’re willing to do things that don’t scale. As you become much more of a mature product, those dynamics change. And so it is really fun at Microsoft. You get to see the entire landscape. Sometimes it’s a joke, you know, Microsoft is sort of like 250 startups at one company, and that always kept it exciting.

[00:09:23] Jumping Into Entrepreneurship

Henry Suryawirawan: Right. And you spent your time there for about 14 years, you mentioned, right? That’s pretty long. I think it’s just another interesting thing that I picked up. So what made you decide to start entrepreneurship? Because 14 years in one company, a good company like Microsoft as well. So what made you start entrepreneurship, going into like the open source roots that you mentioned? Maybe something interesting to share here.

Joe Duffy: It is a very good question. I think when I started that consulting company early on, I knew that there was more to business than just writing code or just building technologies. Although that’s sort of the thing I initially gravitated towards because, you know, writing code is great. You can just like sit down and make something out of nothing. But I always knew that I wanted to get back to more end-to-end business, and I think a combination of a few things.

At a large company like Microsoft, despite all the things that I said are great about it, it’s sometimes hard to really understand, am I having a business impact? It’s sometimes hard to be even in the room where critical business decisions get made. Especially, if you’re working on a product team. I got reasonably senior over that period of time. I always felt I was still one to two levels removed, like unless I was in Satya, you know, the CEO’s staff meetings, I didn’t feel like I was really going to have the level of impact that I was looking for. And literally every year that I worked there, I asked myself, is this the year to leave and start a company?

I don’t think you start companies just to start a company. You really need to have the right opportunity. You really need to be passionate, excited about going and making some sort of change in the world, and so finally, stars aligned. But the thing is, you know, the whole time I was there, I’m still reading business books, reading SEC filings, very nerdy things to do, but really staying on top of the entrepreneurship seed that was planted very early on.

[00:11:05] Founding Pulumi

Henry Suryawirawan: Very interesting. Which brought you into founding Pulumi, right? Why Pulumi? What kind of trends do you see back then? Maybe it’s also a good segue to our actual conversation later on about the cloud and evolution and things like that. And why Pulumi?

Joe Duffy: I think Pulumi is an interesting opportunity to really reshape how people build cloud software in a more fundamental way. Just before leaving, I was managing all the languages groups at Microsoft and helping with open source and .NET. I really lived and breathed developer productivity on a daily basis. And what struck me is the cloud was still sort of treated like an afterthought. It was very similar to the 2000s, or even earlier than that, where people racked and stacked servers, you wrote simple three-tier applications, didn’t really think about the infrastructure. It was always the infrastructure team dealing with that. And meanwhile, the infrastructure teams I talked to, they were kind of like, “Hey, nobody’s given us the sort of developer love that people have spent for application development, you know, great IDEs, test frameworks, powerful languages.”

And so, with Pulumi, the idea is the cloud’s not an afterthought anymore. And all developers are building cloud software because the cloud is basically powering all of the modern software that we see today. Now seemed like a great time to say we’ll bring the cloud closer to developers, but also give infrastructure teams amazing technology, and really help them be more productive and bring more joy to their lives. Similar to what happened with developer productivity over the last few decades.

Henry Suryawirawan: Yeah, you bring up a very interesting point that when we started with the cloud, right? Yeah. I mean maybe some people, especially some traditional companies will think of it, yeah, it’s just like another data center, right? So instead of us, maybe some people managing data center for us, we outsource it to the cloud and it’s just like another infrastructure. So this kind of evolution, I think, is something very interesting. I see it myself as well. Where previously it’s like when people call DevOps, right? So it’s like dev and ops. Different silos. These days, there are some movements to bring them together.

[00:13:00] Cloud as First Class Concern

Henry Suryawirawan: And yes, infrastructure tools have always been something very unique, I would say. There are some innovations there about introducing software, but cloud brings the whole game to a totally different level. You can program many things, right, on the cloud. You mentioned in one of your blog posts that cloud is actually now a first class application architecture concern. So tell us more about this term, first class application architecture concern.

Joe Duffy: Yeah. So another part of my background in the 2000s, I worked on multi core and parallel programming and bringing async into all these languages. So now you have async await. Back then we didn’t have that, so we created the task framework in .NET, and this honestly led to async await in .NET. Now you see that in every language, right? This concept.

But nobody had done a similar thing for cloud. I actually worked on a distributed operating system while I was at Microsoft, and this was before containers, but it was really about how do we build large scale distributed applications? How do we configure them? How do these different services talk to each other? And funny enough, that was like, 10 years before we founded Pulumi, and a lot of the same patterns recur, where if you take a step back, we’re really in the cloud, we’re building distributed applications. It’s now, finally, we’ve been talking about it in research for 50 plus years. We’re finally at the age of distributed programming.

And the companies that are most successful getting the most out of the cloud that are building amazing things. You think about Uber, Spotify, database themselves, they embrace the fact that this distributed application architecture is now a first class thing that we can incorporate into applications. You don’t get there by treating the cloud as a collection of virtual machines. You get there by really thinking about these application patterns. And that’s what I meant by that. How do we connect these different systems? How do they speak to each other? Back in the early days, we called that service discovery. We had enterprise service buses and like all this stuff with J2EE. It’s like very similar patterns, just sort of cast in a new light.

[00:14:59] Cloud as a Giant Super Computer

Henry Suryawirawan: Yeah. And the follow up thing that you mentioned, which I found very interesting in your blog. You mentioned with all these distributed apps coming, cloud with its own technologies, not just providing infrastructure, CPU, memory, and all that, but also more new products. Things like containers, serverless, and things like that. You mentioned that now cloud has become like a giant super computer. So maybe it’s interesting to dig deeper here. Why do you think cloud is a giant super computer? Where do you get the analogy from?

Joe Duffy: Yeah. The funny thing is you think about what is an operating system and you think about how do you interact with your operating system in typical programming models? Think about back in the day with C and C++. You’re programming directly to the operating system’s API. And then over time, we came up with NodeJS and Ruby and more productive ways of programming and they sort of abstract away a little bit of the operating system. But really the operating system is all about controlling access to hardware and scheduling operations against that hardware. That’s the job of an operating system. Well, what is the role of the cloud?

The role of the cloud is it’s a piece of software that manages a lot of different hardware and controls access to it and schedules access against it. And so if you use that analogy, we’re sort of missing that NodeJS of the cloud. We’re missing these higher-level abstractions. In fact, we’re sort of still in the days of C, maybe even assembly, but let’s give a little bit of benefit of the doubt. It’s more like C, but what are those higher-level programming model concepts that we’re sort of missing today?

And I think a lot of folks talk about Kubernetes. I think of Kubernetes as it’s really just the kernel scheduler or the thread scheduler for the cloud, but it doesn’t stop there. There’s so much more to be built on top of that foundation. The exciting thing is we’ve now at least agreed on some of the foundational pieces, like, hey, Kubernetes is probably here to stay and we know how to schedule containers. And so now we can build these higher-level services. Although Pulumi lays this great programmable foundation, I think the next phase is what are these higher-level experiences that have yet to be built?

Henry Suryawirawan: When you mentioned Kubernetes, I mean, it all started for scheduling like containers and things like that. While in the cloud, there are still so many other things like the VMs, the serverless, the databases, and things like that. So, yeah, there’s a little bit of a mismatch. Although people are trying to Kubernetes-ify everything, I guess? Cause that’s yeah. So we’ll talk about that later as well. But yeah, you mentioned about this giant operating system, right? It’s like cloud is an operating system where you have so many infrastructure, these hardwares that you can use, and you have to build some kind of layers to control all of them.

[00:17:27] Cloud Engineering

Henry Suryawirawan: Which brings us to the next topic, which is cloud engineering. So with all these needs, all these resources available, you mentioned that there are probably more things that we can do in order to help people to build on top of the cloud. And you mentioned this term called cloud engineering. Why is cloud another kind of discipline within engineering? What is cloud engineering? Maybe you can tell us more about that.

Joe Duffy: Yeah. I think we’re starting to see this term used a lot more broadly. But to us, cloud engineering is a few things. One, bringing the cloud closer to application developers, so that cloud is part of the software engineering process. Two, bringing software engineering tools and techniques to infrastructure teams to help tame some of the complexity of the modern cloud and, really, all this ties into what we were just talking about, which is how do you program the supercomputer? Well, you do it with software engineering techniques. We don’t think of it as an afterthought or something that is lots of copy and paste. You know, we think that really bringing to bear all the decades of improvements in software engineering applies just as much to the cloud as it does to regular application development on classic operating systems.

And then by doing the first two things, we do the third thing, which really helps break down the barrier between developers and infrastructure teams, so we can just collaborate on equal footing to just build great software together. And to us, that’s cloud engineering. You can think of it as the evolution of what comes after DevOps. In many ways, DevOps was more about bringing some dev techniques to ops teams, but not as much the opposite, which is really having developers thinking about the cloud as part of the application architecture. And I think you need to do both in order for that cross-pollination to happen. At least that’s what we’re seeing.

Henry Suryawirawan: So it’s very interesting that you mentioned this is like the next phase of DevOps, or maybe it’s just another flavor of DevOps, right? DevOps itself is like people say it’s mindset, culture where developers and operations people work together very closely. It could be borrowing each other’s techniques. But I think cloud engineering, maybe it’s like a special flavor of the DevOps or the next stage of DevOps where you use the power of the cloud to bring the devs and the ops closer together.

[00:19:34] Next Stage of DevOps

Henry Suryawirawan: And you mentioned that aspects like infrastructure, application development, and also compliance here as well. How you can actually bring them together with the same engineering practices and tools. Tell us more about this aspect of practices and tools. What do they need to borrow from each other, maybe?

Joe Duffy: Yeah, it’s a good question. It’s very important that we don’t lose some of the benefits of the prior model in this transition. I think I went through an interesting transition at Microsoft, which was, you know, we used to have software engineers and software test engineers, and they were completely separate organizations. What you had was developers write code. They throw it over the wall. The test organization picks it up, tests it, finds all the bugs, ships the bug reports back. And it was a very inefficient way of dealing with it. And we moved away from that over time, which, you know, most of the world has moved away from that model for good reason. Cause you want everybody to think about quality as a first class thing.

We merged the teams, but we made several mistakes when we did that, which was we didn’t recognize the importance of preserving those unique skillsets for people that really love testing. They think very creatively about how to find bugs and put in place proactive automation. I look at this sort of transition. We need to make sure the same thing doesn’t happen, which is infrastructure is a specialty. Not every developer is going to want to learn how to properly configure a multi-region Kubernetes cluster that has high availability and is cost efficient and all the secured network concepts there as well. So we need to preserve all of those domain experts. But we need to combine that with the ability to use software to tame complexity at scale. And so I think it’s sort of marrying the two, but not losing what makes each of them important.

Henry Suryawirawan: Yeah. So you brought up a very good point here because people are thinking, how should they form their teams? Some people call it the team topologies topic these days. You mentioned in Microsoft, you had the experience of merging software developer and software test engineer, right? Do you think also these days people should form a team where the so-called infra related people, you know, like the administrators or maybe the operations, or maybe SRE, some people call it, should be merged into the same team and become, like, an independent product team?

Joe Duffy: I think the answer depends a lot on the size of the team or the business. I think it also depends on the DNA of the current team. Some of these changes, you can’t just wave a wand and make it overnight, even if that’s where you eventually want to get to at the end of the day. What we often see is specialization is a good thing, and at scale you need to specialize. Like if you have one SRE, well, maybe it’s okay for that SRE to be embedded within the engineering team, and maybe you wanna spread a lot more of the responsibility. If you have 50 SREs like Google, I’m sure Google has 500 or 5,000 or something like that, but you’re probably going to have a dedicated team because it makes sense to centralize some of that expertise. And I think you’ll find models everywhere in between.

A very common model we see within our customers, and this is mid-scale, large scale, is the platform team. So the platform engineering team is typically a team that’s in between the developers and the sysadmins operators, and folks that are spending all their time on infrastructure. The platform team typically does take a software engineering mindset, but it’s a hybrid of engineering expertise and infrastructure expertise. And their goal is to build the automation for the organization around them to make everybody productive, so they can ship code faster, but also do it securely within compliance, within budget, and that sort of thing. So this has honestly been an emerging trend over the last five years that’s just really accelerated.

[00:23:02] Build Phase

Henry Suryawirawan: So let’s move back to the so-called life cycle of people building products. You mentioned in cloud engineering you have this kind of workflow, right? Very simple and common workflow, which is build, deploy, and manage. So with this cloud engineering discipline, how maybe can we do it differently in each phase of this lifecycle? Maybe let’s start from build, which is building the software itself, writing the code, maybe building the packages, and things like that. How does cloud engineering do things differently?

Joe Duffy: Yeah, I think there’re some micro steps in between too. So, you know, designing, building, securing, testing, deploying, monitoring, managing, but those are the three key ones in there. I think when we approached the space, you know, I’ll be honest, when we first started, we just wanted to solve problems. We didn’t say, ah, let’s go create an infrastructure as code tool. We sat down to solve the problem, and the problem to us was, honestly, building modern cloud applications was not so great at the time. I mean, you wanted to stand up just a simple containerized service that was five boxes on the whiteboard. Next thing you know, you’re like knee deep in 4,000 lines of YAML, and that didn’t feel great. Especially coming from a developer experience background.

And so what we did that’s different from most tools is we let you bring your favorite language to express your infrastructure as code. By doing that, we tap into the ecosystems around these languages. IDEs, test frameworks, package managers, the ability to share and reuse common patterns, and really give a great developer experience. You know, we went from a world of copying and pasting YAML all over the place where if you had a typo, you didn’t find out until 30 minutes into your deployment to suddenly you’re in your IDE typing, you’re getting red squiggles and the compiler’s helping you. It’s just easily a 10 times better experience than where most folks are coming from. In fact, since we’ve launched Pulumi, we’ve had a number of kind of projects, I think inspired by Pulumi, come out from AWS, from HashiCorp that apply similar techniques. That was the first sort of aha moment we had that told us we were onto something.

[00:24:58] Programming Language for Infrastructure

Henry Suryawirawan: So this might bring us to the debate. You know, like some people like YAML, some people don’t. Why do you think it’s important for us to be able to do this infrastructure related thing, not just in YAML or maybe other types of configs, like JSON, into something that is more general purpose programming language, things like JavaScript, NodeJS, or something like that.

Joe Duffy: Obviously, we have opinions, but we want the right tool for the right job. And sometimes YAML or JSON is a perfectly fine tool for the job at hand. So we actually support YAML now as a language. You know, Pulumi is a multi-language platform. I think the insight we had was how do we get all of the rigor of infrastructure as code? Being able to preview before a deployment, getting full audit history of who changed what and when. All of these elements that we love about infrastructure as code, but take those, but then also give you your choice of language.

And so, for the simple cases where literally 10 lines of YAML just do the job, or maybe you’re machine generating the code, in which case machine generating YAML is much easier, YAML is fine. But we looked out in the world and like, people are putting Go templates inside their YAML. They’re writing Python transpilers so they can generate their YAML. We have domain specific languages, you know, like HCL, for example, that’s adding sort of quasi for-loops and stuff in that, but is not a real for loop. We’re not trying to bend and twist templating languages into something that they were never designed to do, but for the things they’re designed to do, we absolutely want to support you.

[00:26:22] Importance of Tools Ecosystem

Henry Suryawirawan: Yeah, it’s interesting. So for people also, when they look at all this infrastructure as code tool in the beginning, when they see a few lines of YAMLs and some configs, and boom, you can just run some infrastructure. That looks really cool. I mean, back then, wow! I didn’t know that you can actually do that. People start using that, and obviously, over the time, there are some constraints that you hit, right? So things like, for example, this conditional or maybe for loops, and that’s why people start to customize their DSL to become further DSL. So I find it a struggle also during that time. Especially, you mentioned about the broader ecosystem. You mentioned things about IDE. I think last time in the beginning, it’s just text editor, probably. That’s the best that we could do. You don’t have like auto complete. You don’t have type safety and things like that. And also things like test, testing framework. I think it was clunky back then when I used all this infrastructure as code. So tell us more why all this ecosystem is actually very important? It should be a first class concern for doing infrastructure as code or maybe cloud engineering as well.

Joe Duffy: To your point, it starts simple, but it gets complex very quickly, and I think we invented programming languages to help with complexity as we scale. And I think scalability is something we talk about a lot. Our CTO, Luke, actually helped start the TypeScript project at Microsoft. And funny thing, I went back, and I was looking at their first homepage back when they launched, and it was JavaScript that scales. And I think Pulumi really is infrastructure as code that scales. And the reason it scales is because of the languages. And languages give you the ability to encapsulate complexity where it’s not needed. Abstract away concepts into higher-level concepts, so we can build bigger things out of smaller things. We would’ve never gotten to where we did on the application development side if we didn’t have these facilities. And now the ability to apply this on the infrastructure side is very powerful.

To your point, the simple things get complicated. You just look at Amazon has a Virtual Private Cloud. Pretty much every customer that’s going to go to production in Amazon is presumably going to want to spin up a VPC. They have a standard blueprint that captures best practices. Well, that standard blueprint is 4,000 lines of YAML and 6,000 lines of JSON, simply because you have all the curly braces on top of the YAML, and like nobody can understand that. The reason why it’s so complicated is it doesn’t have for loops, and so for every availability zone you have to copy and paste and then rename the thing from one to two, two to three. That same thing ported into Pulumi in Python is 200 lines of Python. Not only that, but you can stick it in a package, and now give somebody a one-line way to spin up a properly configured VPC. And given that everybody in the world has to do that, just imagine worldwide how many human hours that’s going to save just being able to do that. So that’s just one example. But that’s very, very common in infrastructure scope.
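
The point about for loops and packaging can be illustrated with a small sketch. This is not Pulumi code and not the AWS blueprint; plain dictionaries stand in for cloud resources, and the function name `build_vpc`, the subnet naming scheme, and the CIDR layout are all assumptions made for illustration. It only shows how one loop over availability zones replaces the copy-paste-and-rename pattern, and how wrapping it in a function gives callers a one-line entry point.

```python
# Illustrative only: plain dictionaries stand in for real cloud resources.
# build_vpc, the naming scheme, and the CIDR layout are assumptions,
# not Pulumi's API and not the AWS reference blueprint.
import ipaddress


def build_vpc(name, cidr, zones):
    """Describe a VPC with one public and one private subnet per zone."""
    network = ipaddress.ip_network(cidr)
    # Split the VPC range into 16 equal blocks; each zone uses two of them.
    blocks = list(network.subnets(prefixlen_diff=4))
    if 2 * len(zones) > len(blocks):
        raise ValueError("too many zones for this CIDR range")

    subnets = []
    for index, zone in enumerate(zones):
        # The loop body is written once, instead of being copied per zone.
        for offset, kind in enumerate(("public", "private")):
            subnets.append({
                "name": f"{name}-{kind}-{zone}",
                "zone": zone,
                "kind": kind,
                "cidr": str(blocks[2 * index + offset]),
            })
    return {"name": name, "cidr": cidr, "subnets": subnets}


# The "one-line way" for a consumer of the packaged function.
vpc = build_vpc("prod", "10.0.0.0/16", ["us-east-1a", "us-east-1b", "us-east-1c"])
for subnet in vpc["subnets"]:
    print(subnet["name"], subnet["cidr"])
```

Adding a fourth availability zone here means adding one string to the list, rather than duplicating and renaming a block of configuration.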

Henry Suryawirawan: Yeah. Talking about reusability, right? So in software, the concept is like, we want to be able to reuse as much as possible. And then yeah, when you start having all these so-called codified best practices, like you mentioned, you know, best practices of AWS to create VPC. I guess when you have these best practices, then you start to see more complexity, maybe more parameters as well being introduced. And that’s why all this maybe kind of like breaks the analogy. And lastly, it’s about the module itself. So I think with the programming languages these days, you have a good package manager. So I think like NPM, maybe pip in Python and things like that. So yeah, I think that really resonates as well with many people, I believe, with all the struggles of coming up with the best practices of building infrastructure.

[00:29:48] Deploy Phase

Henry Suryawirawan: Let’s move on to the next workflow, which is deploy. Tell us how cloud engineering does things differently?

Joe Duffy: I think really moving to a code centric model for how you do deployments is definitely key to cloud engineering and something we see a lot of our customers wanting to do, but struggling to do. What I mean by that is most people are triggering deployments of their applications through Git workflows today. Some are still doing them manually, but regardless, all the code is in source control. I can’t tell you the last time I met an engineering team who wasn’t using source control, thankfully. And so moving to that model for all your infrastructure changes as well, that’s pretty much table stakes. Most people are there today, but people are struggling with how to actually trigger the deployments.

With the more modern cloud architectures, the line gets a little bit blurry. Let’s take a serverless application. I’ve got an API gateway, 10 functions, and a Dynamo database or something. It’s deployed in AWS. Well, if I want to add a route to my application gateway or my API gateway, is that an infrastructure change or an application change? Usually, if I have an Express.js app, I just go change routes.js, and stick in a new route, and I’ve got a new route. Well, with API gateway, I have to go change the infrastructure. What about the functions? If my function starts running out of memory, for some reason, cause my application workload increased, do I have to go make an infrastructure change? Or is that application? So like the line becomes blurry for a lot of the infrastructure and many of our customers want to enable developers to self-serve with guardrails. They want to make sure that developers don’t shoot themselves in the foot from a security standpoint. But like some of these changes you want to enable, and building and publishing containers are another good example.

And so the D in CI/CD is deployment. We think of that as not just application changes, but infrastructure changes as well. That’s a key thing that a lot of people are trying to get to, and a lot of people are struggling to get there. And then once you have that, you want to do verification with testing and security and everything just built into how you’re doing deployment. Same thing for application, same thing for infrastructure. So if you’re detecting a theme, it’s that we don’t need to bifurcate how our entire team operates between apps and infrastructure. We can really share a lot of the same engineering practices.

Henry Suryawirawan: Yeah. So I was also looking back at my previous experience, right? Like building all the CI/CD tools and introducing infrastructure as code, and yeah, it seems like there’s a mismatch. So it’s like a tool lifecycle thing, right? You build, package your software. You deploy maybe using software way of doing things. While on the infrastructure is like another set of tools, and the glue in between most likely is like bash script or some kind of scripting in between, right? Especially if you deploy your application as a container, you also have Kubernetes manifest, for example, and you have another set of tools, most likely it’s like kubectl. So I can see why totally, sometimes this mismatch, this gap, is actually a struggle for teams. Because they have two different skill sets that they need to think about. And if we do it in a programming language way, maybe it’s kind of like similar. It’s just like same build tools, I guess. Same way of deploying, I guess. So I can see totally the advantage of using this cloud engineering technique.

[00:32:47] Advanced Deployment Automation

Henry Suryawirawan: But I also have interest when I look at your website about this deploy lifecycle is that we can now do more advanced deployment automation. So things like maybe canary rollout or A/B testing, and things like that. How does this new way actually help for this advanced automation?

Joe Duffy: Yeah. We definitely see a lot of advanced use cases amongst pretty standard workflows in our customers. For example, one of our customers has a hundred production regions that they deploy into, or a hundred environments, let’s say. And they want to orchestrate that over the course of two weeks. Because they want to roll out to one region, make sure things are okay. Start with a canary. Ratchet that up to 100%. Monitor all the metrics, and only graduate between environments as things are healthy, and then automate the rollback if things start to go south. Honestly, I will say what we’ve got is great because you can build anything you can dream of. We have something called the automation API, which is effectively, what if infrastructure as code was just a library that you could program against instead of being a CLI that a human had to run or that you stuck into CI/CD. Our customers build amazing things with that: self-serve portals, these advanced deployment models, custom CLIs. There’s a lot you can do with it.

I’ll say, about the current CI/CD systems: we took an approach of integrating with them. So Spinnaker, GitHub Actions, GitLab pipelines, Jenkins, like all the standard ones we integrate with. But CI/CD systems today are not great for infrastructure. They work, but like, as you say, there’s something missing there, some major missing pieces. Most CI/CD systems think of the world in like 10-minute, 30-minute or, worst case, one-hour bite-size chunks. Many infrastructure activities can span, like I said, days, weeks. Sometimes they have manual approval steps because somebody needs to sign off on budget or provision something via a ServiceNow ticket. So it is more complicated. So having that ability to just customize and extend the system with the automation API is a pretty killer scenario for a lot of our customers.
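
The staged rollout described in this answer (deploy to one environment, start with a canary, ratchet up to 100%, graduate only while healthy, roll back on trouble) can be sketched as an ordinary program. The sketch below does not use Pulumi’s automation API; `deploy`, `is_healthy`, and `rollback` are hypothetical callables supplied by the caller, and the traffic steps are arbitrary. It is only meant to show the control flow that becomes possible once deployments are driven from code rather than from a fixed pipeline.

```python
# Illustrative only: deploy, is_healthy, and rollback are hypothetical
# callables supplied by the caller. Nothing here is Pulumi's automation API.

def staged_rollout(environments, deploy, is_healthy, rollback, steps=(10, 50, 100)):
    """Promote environments one at a time, ramping traffic through `steps`.

    Stops at the first unhealthy step, rolls back that environment, and
    reports which environments were fully promoted before the failure.
    """
    promoted = []
    for env in environments:
        for percent in steps:
            deploy(env, percent)
            if not is_healthy(env, percent):
                rollback(env)
                return {"succeeded": False, "promoted": promoted, "failed": env}
        # Only graduate to the next environment once this one is at 100%.
        promoted.append(env)
    return {"succeeded": True, "promoted": promoted, "failed": None}


# Example run with stand-in callables that simply record what happened.
log = []
result = staged_rollout(
    ["eu-west", "us-east", "ap-south"],
    deploy=lambda env, pct: log.append(f"deploy {env} {pct}%"),
    is_healthy=lambda env, pct: not (env == "us-east" and pct == 50),
    rollback=lambda env: log.append(f"rollback {env}"),
)
print(result)
print(log)
```

In a real system the health check would wait on metrics and could pause for days or for a manual approval, which is the kind of long-running step that does not fit well into short CI/CD jobs.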

Henry Suryawirawan: And some people also, these days, adopt this thing called GitOps, right? Is that something a different way that you have to support as well? Tell us more about this GitOps. Is it the best practice also in your opinion for cloud engineering?

Joe Duffy: Yeah, we see a lot of folks doing GitOps. Mainly because they want to be able to do a deployment based on Git branch merges and pushes, and it fits in nicely with a lot of the application life cycle. You can basically just use pull requests as approval and review, just like you would a code review for application code changes. So it does go nicely hand in hand with infrastructure as code.

I will say some people use the term GitOps to really mean Kubernetes centric deployments. And to me, that’s a completely orthogonal dimension that I have many, many thoughts on. But to me, that’s separate from the notion that, “Hey, we’re just going to do all of our deployments triggered off Git events”. To me, that can be done regardless of whether Kubernetes is in the picture or not.
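
As a rough illustration of deployments triggered off Git events, independent of Kubernetes, the sketch below maps an incoming event to an action. The event shape, the branch-to-environment table, and the action names are assumptions made for this example, not any particular Git host’s webhook format: a pull request produces a preview for review, and a push to a mapped branch produces a deployment.

```python
# Illustrative only: the event dictionaries and the branch-to-environment
# map are assumptions, not a real Git host's webhook payload.
BRANCH_ENVIRONMENTS = {"main": "production", "staging": "staging"}


def plan_action(event):
    """Map a Git event to an (action, environment) pair."""
    if event["type"] == "pull_request":
        # A pull request is reviewed like code, so only preview the change.
        env = BRANCH_ENVIRONMENTS.get(event["base_branch"])
        return ("preview", env) if env else ("ignore", None)
    if event["type"] == "push":
        # A push (for example a merge) to a mapped branch triggers a deploy.
        env = BRANCH_ENVIRONMENTS.get(event["branch"])
        return ("deploy", env) if env else ("ignore", None)
    return ("ignore", None)


print(plan_action({"type": "pull_request", "base_branch": "main"}))
print(plan_action({"type": "push", "branch": "main"}))
print(plan_action({"type": "push", "branch": "feature/login"}))
```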

[00:35:43] Manage Phase

Henry Suryawirawan: Right. Let’s move on to the next lifecycle, which is Manage. So after you built your application and infrastructure code, for example, you deploy it, and the application starts running. The infra has been provisioned. So tell us the importance of this manage life cycle.

Joe Duffy: Yeah. So a lot of folks who we talk to, it’s sort of like finding a needle in the haystack when they want to find what’s running in production. Who did a deployment? Why did something change? Is it secure? How do we start to tame that chaos?

And so with Pulumi, we really wanted it end-to-end. It starts from build. You deploy it, but you’re not done there. You want things like drift detection. You want things like the ability to go and search over your resources to find something when something goes wrong. In the same way GitHub tells you all the code changes that are happening and who made the changes and what the diffs were, you want to be able to do that for your infrastructure as well. Both for just staying on top of the rhythm of the team, but also in the event that something goes wrong.

And so for us, it doesn’t stop with, okay, the infrastructure’s up and running. Once you get it up and running, now you need to version it and manage it and evolve it. And so that end-to-end lifecycle is really important as well. That’s part of cloud engineering, and honestly, that’s where a lot of the infrastructure experts and a lot of the operations folks and sysadmins play a more significant role.
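
Drift detection, mentioned above, comes down to comparing what the code says should exist with what is actually running. The sketch below is a minimal version of that comparison over plain dictionaries; the function name `detect_drift`, the resource names, and the property names are made up for illustration, and a real tool would read the actual state from the cloud provider rather than from a literal.

```python
# Illustrative only: resources are modelled as {name: {property: value}}.
# detect_drift and the sample resources are assumptions for this example.

def detect_drift(desired, actual):
    """Report resources that are missing, unmanaged, or changed."""
    drift = {"missing": [], "unmanaged": [], "changed": {}}
    for name, props in desired.items():
        if name not in actual:
            drift["missing"].append(name)
            continue
        # Record every property whose live value differs from the declared one.
        diffs = {
            key: {"desired": props.get(key), "actual": actual[name].get(key)}
            for key in sorted(set(props) | set(actual[name]))
            if props.get(key) != actual[name].get(key)
        }
        if diffs:
            drift["changed"][name] = diffs
    # Anything running that the code does not declare was created out of band.
    drift["unmanaged"] = sorted(set(actual) - set(desired))
    return drift


desired = {
    "bucket-logs": {"versioning": True, "public": False},
    "queue-jobs": {"retention_days": 4},
}
actual = {
    "bucket-logs": {"versioning": True, "public": True},
    "vm-debug": {"size": "large"},
}
report = detect_drift(desired, actual)
print(report)
```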

[00:37:00] Compliance as Code

Henry Suryawirawan: Yeah. Importantly also these days, people look more towards the governance part of it. So some people call it policy as code, compliance as code, everything now is in code. So yeah. Why is it important to have this governance and, you know, policies in code as well?

Joe Duffy: There’s a phrase we like to use, which is secure by construction. You want to find issues before they go into production, right? And so policy as code helps you enforce policies before you’ve actually done the deployment. Actually, it’s part of the deployment. And honestly, for us, it begins even before that. The static type checking, the fact that you can run linters, you know, if you want to use Pylint, you can use Pylint or ESLint or you can encode your own policies into that. That’s even before the policy as code, so you really get these layers of defense to catch issues before they get out into production. We have customers that use policy as code to enforce cost and budget concerns as well to make sure, hey, if you’re going to increase spend by more than 10% in a single deployment, you need to get approval for that. Things of that nature. And so obviously, speaking to the manage part, it’s end-to-end. You still want to find issues after the fact that have already gotten out into production. But, ideally, you want to catch them before it’s too late.
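
Editor's illustration: the budget rule Joe describes (block a deployment that raises spend by more than 10% unless it has been approved) can be sketched in a few lines of plain Python. This is only a sketch under stated assumptions. The `DeploymentPreview` type, its fields, and `check_spend_increase` are hypothetical names invented for this example, not Pulumi's policy API, and the cost figures are assumed to come from some external estimate.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class DeploymentPreview:
    """Hypothetical summary of a pending deployment (not a Pulumi type)."""
    current_monthly_cost: float   # estimated spend before the change
    proposed_monthly_cost: float  # estimated spend after the change
    approved: bool = False        # whether someone signed off on the increase


def check_spend_increase(preview: DeploymentPreview, max_increase: float = 0.10) -> List[str]:
    """Return policy violations; an empty list means the deployment may proceed."""
    if preview.current_monthly_cost <= 0:
        # No baseline to compare against: treat any new spend as needing approval.
        increase = float("inf") if preview.proposed_monthly_cost > 0 else 0.0
    else:
        increase = (
            preview.proposed_monthly_cost - preview.current_monthly_cost
        ) / preview.current_monthly_cost

    if increase > max_increase and not preview.approved:
        return [
            f"spend would rise by {increase:.0%}, above the "
            f"{max_increase:.0%} limit; approval required"
        ]
    return []


# A 20% increase without approval is rejected; the same change with approval passes.
blocked = check_spend_increase(DeploymentPreview(1000.0, 1200.0))
allowed = check_spend_increase(DeploymentPreview(1000.0, 1200.0, approved=True))
```

Running a check like this as a step of the deployment itself, rather than as a report afterwards, is what makes it a gate instead of an audit.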

Henry Suryawirawan: So you mentioned after the fact, it’s like the drifts, right? Because sometimes, yeah, we deploy through the CI/CD pipeline, but you know, some people who have access to the cloud, maybe they can just go in and change, or maybe you have access to the CLI, you also make some changes out of band. I think all these drifts tend to happen when you have multiple ways of changing the infrastructure. So I guess these kinds of things really help.
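
Editor's illustration: at its core, drift detection is a comparison between the state declared in code and the state actually observed in the cloud. The sketch below assumes both states are available as plain dictionaries keyed by resource name. `detect_drift` and the sample resources are hypothetical, and a real tool would obtain the actual state by querying the cloud provider.

```python
from typing import Any, Dict, List

State = Dict[str, Dict[str, Any]]  # resource name -> its properties


def detect_drift(desired: State, actual: State) -> List[str]:
    """Compare declared state with observed state and describe every difference."""
    findings: List[str] = []
    for name in sorted(desired.keys() | actual.keys()):
        if name not in actual:
            findings.append(f"{name}: missing from the cloud")
        elif name not in desired:
            # Typically something created by hand in the console or CLI.
            findings.append(f"{name}: exists in the cloud but not in code")
        else:
            want, have = desired[name], actual[name]
            for prop in sorted(want.keys() | have.keys()):
                if want.get(prop) != have.get(prop):
                    findings.append(
                        f"{name}.{prop}: expected {want.get(prop)!r}, "
                        f"found {have.get(prop)!r}"
                    )
    return findings


desired_state = {"bucket": {"versioning": True}, "queue": {"retention_days": 4}}
actual_state = {"bucket": {"versioning": False}, "vm": {"size": "small"}}

for finding in detect_drift(desired_state, actual_state):
    print(finding)
```

The three kinds of finding correspond to the out-of-band changes Henry mentions: a property edited by hand, a resource deleted by hand, and a resource created outside the pipeline.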

[00:38:32] Infrastructure as Software

Henry Suryawirawan: Speaking about infrastructure as code, you actually coined a new term called infrastructure as software. What is it about? What is the difference between code and software in this case?

Joe Duffy: We say infrastructure as code, but it’s really been infrastructure as text until just recently. Actually, early in the use of the phrase infrastructure as code, it actually was more code than it is generally these days. Like we used Chef and Chef had Ruby and Ruby gave you Cookbooks and you actually could run tests, and there were a lot of benefits you had. At some point, we decided that infrastructure as code meant JSON, YAML, or domain specific languages that were very limiting. And I think the idea was that we needed to find something that differentiates. What is different? Like it’s not just the code. It’s about the whole software engineering practice you have around that code.

And so infrastructure as software to us is really just applying software engineering practices and treating infrastructure like it’s software. You think about modules. You think about sharing and reuse. You think about how we architect these things. It’s not about whipping out a piece of code to like script our way out of it. It’s really about sitting back and thinking about this as building software. And I think that’s the key difference.

Henry Suryawirawan: You actually shared a very interesting evolution there, right? We all started with like Chef, Puppet, where you write real code. For Chef, it’s Ruby code, it’s a Cookbook. And yeah, over time we kind of moved away from that. In my opinion, maybe it was also because of the gap for the administrators. They had to learn a new programming language, which maybe they were not taught back then, and people struggled to actually learn programming languages. That’s why people invented all these DSLs or maybe used YAML and JSON. And then over time, we found we struggled with that approach as well. And now we are bringing it back to general purpose programming languages. So I think that’s a really interesting history of how all these tools came together.
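
Editor's illustration: the practices named here (modules, reuse, testing) can be shown with a small, library-free Python sketch. Everything in it is hypothetical. `Resource` is a stand-in record rather than a real cloud resource, and `web_service` is an invented reusable component. The point is only that a parameterized function with validation and a unit test is ordinary software engineering applied to infrastructure definitions.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Resource:
    """Stand-in for a declared cloud resource (hypothetical, for illustration)."""
    kind: str
    name: str
    props: Dict[str, object] = field(default_factory=dict)


def web_service(name: str, replicas: int = 2, public: bool = False) -> List[Resource]:
    """Reusable component: one call declares a service plus its load balancer."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return [
        Resource("container-service", f"{name}-svc", {"replicas": replicas}),
        Resource("load-balancer", f"{name}-lb", {"internet_facing": public}),
    ]


def test_web_service_is_private_by_default() -> None:
    """Unit test: a safe default is verified before anything is deployed."""
    lb = next(r for r in web_service("checkout") if r.kind == "load-balancer")
    assert lb.props["internet_facing"] is False


# Teams reuse the same component with different parameters.
checkout = web_service("checkout")
storefront = web_service("storefront", replicas=4, public=True)
test_web_service_is_private_by_default()
```

Because the component is a normal function, it can be shared as a package, reviewed in a pull request, and tested like any other code.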

[00:40:22] Kubernetes-Based Infra as Code

Henry Suryawirawan: There are now multiple flavors of infrastructure as code, as you mentioned, like more text configuration based, the JSON, the YAML, the DSLs. And also the code based. Things like Pulumi, CDK, Cloud Development Kit. Now HashiCorp also has it. There’s another flavor which I also found, which is the Kubernetes based. When I said Kubernetes-ify everything in the beginning, I’m referring to these tools, where you manage or maybe provision infrastructure through Kubernetes. Things like Crossplane, or maybe Kubernetes Config Connector from GCP. What do you think about this?

Joe Duffy: Yeah. And by the way, on the Ruby thing, sometimes people say, “Oh, well, with full-blown languages, you can create a messy code base”. And let me tell you some of the messy code bases I’ve seen for CloudFormation and Terraform. And by the way, that same argument applies to application development. We don’t go out there and tell people, “Oh, don’t write Java because you’re going to create a messy code base”. We optimize for value, and that relates to the Crossplane answer.

When I think of Pulumi, it’s really three things. One, it’s a programming model for how you express cloud architectures. That’s one thing. Two, it’s a deployment engine that can do infrastructure as code deployments. And the third is it’s a cloud engineering platform that helps you accomplish everything we talked about earlier. You know, the CI/CD, testing, policy as code, the whole end-to-end. It’s about how you operationalize infrastructure as code.

And I think of Crossplane as a very interesting alternative to that second thing, but it’s still all YAML. So you still have the whole mess of YAML. And it’s not solving for any of the third thing, which is all the software engineering, cloud engineering topics that we talked about. So it’s really just a deployment technology that can sit inside Kubernetes clusters and deploy to other clouds. And frankly, for folks that are all in on Kubernetes, that’s it. You know, from now until the end of time, they’re all in on Kubernetes. You know, I think Crossplane is a fine technology. What we see out in the world is almost all of our customers, if they’re using Kubernetes, it’s really what I mentioned earlier, it’s not the center of the universe. It’s really the scheduler for the cloud. As you pointed out, they’re evolving in a bunch of different directions, like security model, things like that. But honestly, if I’m going to run in AWS, I’m probably going to want to optimize for the easiest way to adopt AWS. And just being entirely frank, seldom does that mean go all in on Kubernetes. But for some folks, it is going to be the right answer.

Henry Suryawirawan: I find also, maybe it’s also a reaction to those system administrators or operations people who have Kubernetes background. All they know is like kubectl, and you just manage through manifest, which is kind of like standardized in a way, right? You have the kind and then you have parameters. It’s all standardized. Same kind of flavors. You don’t have like DSL, so to speak. Although it turns out some infra resources are complex, and then you start having more annotations, more types, and things like that. So it becomes quite complex as well.

[00:43:11] Future of Cloud Engineering

Henry Suryawirawan: These are all exciting times, actually, for infrastructure and cloud. All these tools also coming in, bringing innovations. People now have a lot of options. What is the future in your view with all this cloud engineering?

Joe Duffy: Yeah, I think we’re still really just getting started. For us, we are really excited about leading the innovation agenda for infrastructure as code and really showing the world that there’s a new way of doing this. And you see, as you mentioned, CDK, and now with HashiCorp, the Terraform CDK, like I think we’ve inspired kind of this new way of doing infrastructure as code. That’s really exciting, but we still have a lot of work to do to make sure that Pulumi is the default option for everybody in the industry. But that’s really just getting started. We just laid the foundation. We gave this nice programmable surface area on top of the cloud. You think of everything we talked about cloud operating system. What is that missing application model for the cloud? How do we get to the world where the three or five boxes on the whiteboard are easy to express rather than decomposing into hundreds of building block services? There’s a missing layer in between. And so I think, for us, we’re really focused on what is that next layer of abstraction on top of this foundation. And that will be much more for application developers who aren’t necessarily focused on infrastructure as code. It really should be 10 times easier to build a modern cloud application, and so we’re really excited about enabling that.

And then, honestly, adopting cloud engineering is not push a button and we have cloud engineering, and it really, really should be. There’s so many teams out there having to build their own custom platforms. And again, having the ability to extend the system, like with our automation API is great. But wouldn’t it be even better if you didn’t have to start from scratch every single time in every single company? So I think that’s the third major area is really helping these organizations adopt cloud engineering at large scale.

[00:44:52] Pulumi Customer Story

Henry Suryawirawan: So maybe you can share with us some of the recent cool things your customers are innovating with all these cloud engineering practices, or Pulumi specifically. Maybe there are some great innovations that you can share.

Joe Duffy: Yeah. I think one of the best innovations is just like the productivity wins that come out of the box. And so, the Bitbucket team at Atlassian, the team actually building Bitbucket, used Pulumi. They were able to introduce automation with test cases and unit testing in their favorite language. They tell us now they spend half as much time on all of the, just like maintenance tasks related to deploying infrastructure. And that means 50% more time they can spend just on business customer value. That’s a pretty big deal.

I’ll look at another customer, FaunaDB, who’s an earlier stage startup. They said, “Hey, it used to take us multiple weeks just to get a new feature shipped with all the infrastructure changes”. And they’d gotten it down to days and sometimes just hours. Again, they can just move a lot faster. I look at Mercedes-Benz. Similarly, they were able to just turn around and enable their developers to spin up microservice environments on their own. They used to have to file a ticket. This is all Kubernetes based, but now they can just use a Pulumi program, spin up their infrastructure. And because the central platform team gave them reusable components, they can say, give me a Kubernetes cluster on Azure. Give me some data services, all these things. And a developer can just get an environment and now they’re ready to go and they can do that in minutes. Whereas previously, they had to file tickets and wait for weeks to get those stood up. Those are just some examples, but the commonality is scalability, but also just productivity and speed.

[00:46:30] 3 Tech Lead Wisdom

Henry Suryawirawan: Very interesting use cases. So thanks for sharing that. So, Joe, it’s been a pleasant conversation. I learned a lot about how Pulumi works and cloud engineering specifically. As we move towards the end of our conversation, I have one last question that I normally ask all my guests on my show, which is to share this thing I call three technical leadership wisdom. So think of it like advice that you want to give to the listeners here, maybe based on your experience, your expertise, or maybe some hard lessons.

Joe Duffy: Yeah, I think it’s a great question. I love the question. I would say the first one is, when in doubt, just solve important problems for your customers, and you really can’t go wrong. I think it’s easy to overthink things and build grandiose technologies, but ultimately, your customers will keep you honest. If you just solve their problems, that’s the easiest path to success. I think with Pulumi, we actually had our first paying customer before we even open sourced the thing, because we wanted to make sure that we’re actually solving a problem that is important enough that somebody will pay for it. And I think throughout my career, I’ve often gotten interested in shiny objects and built things just for sheer pleasure, and that’s fine sometimes too. But if you’re starting a company, I think just solving customer problems is really the name of the game. Not just on day one, but also as you scale and grow as well. Like just listen to your customers and they’ll tell you kind of what they need. So that’s one.

I think the second is good enough is never good enough. I think always relentlessly pursuing a higher bar, it’s really compound interest, you know, 1% better every single day compounds into something huge. In fact, you know, 0.1% better every day compounds into something huge. And so, I think it’s easy to accept status quo. I think with Pulumi, when we started, we wanted to dream big, and think of this amazing future that could exist. We’re not going to get there by slowly iterating and just accepting the status quo.

That leads to the third one, which is really dream big, but be realistic with what you can attain in a finite period of time. Some might say shoot for the stars and land on the moon. It’s okay to make progress provided you’ve still got that big dream that you’re pursuing. If you’re iterating and getting 1% better every day, then you’ll probably get there, eventually.

Henry Suryawirawan: Wow. I find it really beautiful. Thanks for sharing that. I think it speaks a lot to those people who aspire big, dream big. Maybe they want to create companies as well.

So Joe, thanks for being on the show. For people who would love to connect with you, or maybe continue the conversation, asking about cloud engineering, Pulumi, and things like that, is there a place where they can find you online?

Joe Duffy: Absolutely. So I have a Twitter, funcOfJoe. I also have a blog, JoeDuffyBlog.com, which I’ve been remiss in updating, but I will get back to it one day soon. But always just feel free to email me. I’m joe@pulumi.com. We’ve got a great community. I’m in our community Slack, so I’m sure you can find me if you want to.

Henry Suryawirawan: Thanks again for being on the show. It’s been a pleasant conversation. Good luck with all your Pulumi and cloud engineering things that you are doing. Thanks for that.

Joe Duffy: Likewise. Thanks for having me. Thanks, Henry.

– End –