#18 - Succeeding in Tech & Cloud Latest - Kelsey Hightower

07-Dec-2020 1 hour 16 mins Kelsey Hightower

included in Culture & Practices Personal Growth DEI DevOps Cloud Architecture & Design Architecture Microservices Security


“What I’ve come to realize is that technology doesn’t move that fast. The fundamentals are roughly the same. It’s the fact that we don’t necessarily teach fundamentals. When you start to focus on the fundamentals, then you don’t mentally get attached to one particular implementation.”

Kelsey Hightower is one of the leading figures in open source, cloud computing, and Kubernetes. I’m extremely excited to have him with me sharing his insights on many things in tech. We started the conversation with what he has been doing recently: his involvement in serverless technologies and the security landscape. Kelsey then shared his interesting career journey, from working in fast food in high school to where he is at Google today. He also shared his advice on how one should learn and develop knowledge in today’s fast-changing technology landscape, and how he shifted his learning mindset to overcome impostor syndrome. Kelsey also discussed the latest updates on cloud, serverless technologies, and Kubernetes. He also shared how he has developed his fundamental understanding of certain technologies by learning them “the hard way” and in public. We also covered his latest observations and views on microservices vs. monoliths. Last but not least, we closed the session with Kelsey’s 3 Tech Lead Wisdom: his take on personal growth, learning, and his preferred way of leading by inspiring others.

Listen out for:

What Kelsey is up to - [00:06:39]
Kelsey’s career journey - [00:10:15]
Succeeding in tech from under-represented groups - [00:13:21]
Understanding technology fundamentals - [00:16:45]
Impostor syndrome - [00:21:19]
On cloud latest and cloud native - [00:27:51]
Twelve-Factor application - [00:34:00]
Serverless latest - [00:36:14]
Monolith vs microservices - [00:42:44]
Learning things The Hard Way - [00:54:20]
Kubernetes-ify everything - [01:02:15]
Kubernetes resources - [01:08:54]
Kelsey’s 3 Tech Lead Wisdom - [01:12:13]

_____

Kelsey Hightower’s Bio
Kelsey Hightower has worn every hat possible throughout his career in tech, but most enjoys leadership roles focused on making things happen and shipping software. Kelsey is a strong open source advocate focused on building simple tools that make people smile. When he is not slinging Go code, you can catch him giving technical workshops covering everything from programming and system administration to his favorite Linux distro of the month.

Follow Kelsey:

Twitter – https://twitter.com/kelseyhightower
GitHub – https://github.com/kelseyhightower

Mentions & Links:

Protocol article covering Kelsey’s story – https://www.protocol.com/kelsey-hightower-google-cloud
Kubernetes the Hard Way – https://github.com/kelseyhightower/kubernetes-the-hard-way
📚 Kubernetes Up and Running – https://amzn.to/3tcaIam
Serverless – https://en.wikipedia.org/wiki/Serverless_computing
IAM – https://www.csoonline.com/article/2120384/what-is-iam-identity-and-access-management-explained.html
RBAC – https://en.wikipedia.org/wiki/Role-based_access_control
SPIRE – https://spiffe.io/docs/latest/spire/understand/concepts/
SPIFFE – https://spiffe.io/
TLS certificates – https://protonmail.com/blog/tls-ssl-certificate/
JWT – https://jwt.io/
Istio – https://istio.io/
Service mesh – https://www.redhat.com/en/topics/microservices/what-is-a-service-mesh
Envoy – https://www.envoyproxy.io/
Containers – https://www.docker.com/resources/what-container
Dockerfiles – https://docs.docker.com/engine/reference/builder/
Kubernetes – https://kubernetes.io/
Kubernetes service accounts – https://kubernetes.io/docs/reference/access-authn-authz/service-accounts-admin/
Kubelet – https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet
CNI – https://kubernetes.io/docs/concepts/extend-kubernetes/compute-storage-net/network-plugins/#cni
Pod disruption budgets – https://kubernetes.io/docs/tasks/run-application/configure-pdb/
CI/CD – https://en.wikipedia.org/wiki/CI/CD
Argo – https://argoproj.github.io/argo-cd/
Tekton – https://cloud.google.com/tekton
Prometheus – https://prometheus.io/docs/prometheus/latest/configuration/configuration/
Zipkin – https://zipkin.io/
Datadog – https://www.datadoghq.com/
ReplicationController – https://kubernetes.io/docs/concepts/workloads/controllers/replicationcontroller/
API machinery – https://github.com/kubernetes/apimachinery
Configuration management tools – https://en.wikipedia.org/wiki/Configuration_management
Jenkins – https://www.jenkins.io/
Spinnaker – https://spinnaker.io/
Puppet – https://puppet.com/
Chef – https://www.chef.io/
Ansible – https://www.ansible.com/
Custom Resource Definitions (CRDs) – https://kubernetes.io/docs/concepts/extend-kubernetes/api-extension/custom-resources/#customresourcedefinitions
Cloud Foundry – https://www.cloudfoundry.org/
Amazon Web Services – https://aws.amazon.com/
Azure – https://azure.microsoft.com/en-us/
Oracle – https://www.oracle.com/index.html
AWS Lambda – https://aws.amazon.com/lambda/
Google Cloud – https://cloud.google.com/
Google Cloud service accounts – https://cloud.google.com/iam/docs/understanding-service-accounts
Cloud Run – https://cloud.google.com/run
Cloud Spanner – https://cloud.google.com/spanner
Cloud SQL – https://cloud.google.com/sql
Cloud Build – https://cloud.google.com/cloud-build
App Engine – https://cloud.google.com/appengine
Memorystore – https://cloud.google.com/memorystore
Traffic Director – https://cloud.google.com/traffic-director
Stackdriver Logging – https://cloud.google.com/logging/docs/
TensorFlow – https://www.tensorflow.org/
Infrastructure as a Service (IaaS) – https://en.wikipedia.org/wiki/Infrastructure_as_a_service
Platform as a Service (PaaS) – https://en.wikipedia.org/wiki/Platform_as_a_service
Function as a Service (FaaS) – https://en.wikipedia.org/wiki/Function_as_a_service
Heroku – https://www.heroku.com/
Lift and Shift – https://whatis.techtarget.com/definition/lift-and-shift
Site Reliability Engineering (SRE) – https://en.wikipedia.org/wiki/Site_reliability_engineering
Cloud Native Computing Foundation – https://www.cncf.io/
Linux Foundation – https://www.linuxfoundation.org/
Impostor syndrome – https://en.wikipedia.org/wiki/Impostor_syndrome
T-shaped – https://en.wikipedia.org/wiki/T-shaped_skills
System administrator – https://en.wikipedia.org/wiki/System_administrator
Production Identity Day – https://events.linuxfoundation.org/production-identity-day
KubeCon – https://events.linuxfoundation.org/kubecon-cloudnativecon-europe/
Control plane – https://en.wikipedia.org/wiki/Control_plane
IP addresses – https://en.wikipedia.org/wiki/IP_address
Subnets – https://en.wikipedia.org/wiki/Subnetwork
L3 versus L7 – https://securityboulevard.com/2018/10/know-your-firewall-layer-3-vs-layer-7
Go programming language – https://golang.org/
Goroutines – https://golangbot.com/goroutines/
Observability – https://en.wikipedia.org/wiki/Observability
Structured logs – https://stackify.com/what-is-structured-logging-and-why-developers-need-it/
OpenStack – https://www.openstack.org/
Twelve-Factor application – https://en.wikipedia.org/wiki/Twelve-Factor_App_methodology
Kafka – https://kafka.apache.org/
etcd – https://etcd.io/
Event-driven architectures – https://en.wikipedia.org/wiki/Event-driven_architecture
PostgreSQL – https://www.postgresql.org/
DB2 – https://www.ibm.com/products/db2-database
MySQL – https://www.mysql.com/
Ruby on Rails – https://rubyonrails.org/
Spring Boot – https://spring.io/projects/spring-boot
Virtual machine – https://en.wikipedia.org/wiki/Virtual_machine
NFS – https://en.wikipedia.org/wiki/Network_File_System
gRPC – https://grpc.io/
WebSocket – https://en.wikipedia.org/wiki/WebSocket
Microservices – https://en.wikipedia.org/wiki/Microservices
Monolith – https://en.wikipedia.org/wiki/Monolithic_application
Service Oriented Architecture (SOA) – https://en.wikipedia.org/wiki/Service-oriented_architecture
Mixer – https://istio-releases.github.io/v0.1/docs/concepts/policy-and-control/mixer.html
Pilot – https://istio.io/latest/docs/reference/commands/pilot-agent/
Galley – https://istio.io/v1.1/docs/reference/commands/galley/
GitHub – https://github.com/
JBoss – https://www.jboss.org/
DNS – https://www.cloudflare.com/learning/dns/what-is-dns/
CoreOS – https://coreos.com/
VMware Fusion – https://www.vmware.com/asean/products/fusion.html
Sidecar – https://istio.io/latest/docs/setup/additional-setup/sidecar-injection/
xDS protocol – https://www.envoyproxy.io/docs/envoy/latest/api-docs/xds_protocol
Open Policy Agent – https://www.openpolicyagent.org/

Like this episode?

Follow @techleadjournal on LinkedIn, Twitter, Instagram.

Buy me a coffee or become a patron.


Quotes

Succeeding in Tech from Under-represented Groups

The thing about tech is that it’s one industry where skills are typically enough for you to succeed. In many situations, a lot of this technology is so new that if people only tried to hire people with PhDs, it would be hard for them to stay relevant, because there is no great PhD program for things like Kubernetes or some of the software development practices we use in industry today. So the good news is that you don’t necessarily need credentials to be successful in tech.
When you think globally, every country has local social issues that make it harder for some groups to participate than others. Maybe it’s not appropriate to assign all of those problems to the tech field, but they’re exaggerated inside of tech because of the social conditions around them.
You have to take on more responsibility of finding those opportunities, holding onto those opportunities, and then being able to execute.
It’s always amazing to me that in technology, lots of companies just want to hire people who know everything on day one. Even for a beginner role. And we don’t do a good job in our field of teaching people continuously, whether they’re new or they’ve been around for a while. It’s always this friction of: we expect you to know everything. And we don’t really have any formalized training to continue to grow skills.
You overcome those things by having access to knowledge. That’s one thing I think tech has done well: making a lot of knowledge and tools accessible, whether that’s free and open source software or the knowledge we see in blog posts and books.

Understanding Technology Fundamentals

What I’ve come to realize is that technology doesn’t move that fast. The fundamentals are roughly the same.
It’s the fact that we don’t necessarily teach fundamentals.
When you start to focus on the fundamentals, then you don’t mentally get attached to one particular implementation.

Impostor Syndrome

At some point in my career, I realized that I didn’t need to be the best at every piece of technology I was going to be responsible for. I knew I needed to understand it to the degree that my job required. And if I thought it made sense, I could go a bit deeper than the job required, as an investment in myself. That realization became the foundation of my career. So what does that mean? It means that I can decide which technologies are important to me and to the skill set I want to put on the market or leverage for the things I want to do.
I’ve resigned myself to saying, “You know what? It’s okay to learn in public.” Since I don’t know everything, I’m comfortable asking questions.
So for me, I decided I’m not going to pretend I know everything. Therefore, I’m not worried about being an impostor. And therefore I don’t have to worry about trying to be the best at everything. So I stay focused in the areas that I am interested in, and take the responsibility just to get slightly better, day over day, year over year.
The point of the elective is to expose kids to more things than they would probably naturally pick on their own. So it’s about exposure.
When people are starting their career or making a career transition, I think it’s healthy to try a little of everything, because there’s no way for you to know.
What you’re hearing from other people are things that may have worked for them. But they may not necessarily work for you.
Don’t think one role is inferior to another. I think that’s another mistake people make.
Once you try a little bit of everything, and something resonates with you, that might be a period of your life.
It’s okay to switch.
If you don’t know what you’re good at, or you don’t know what to pursue, try a little of everything. And don’t think of one role as inferior to another. This applies not just to roles, but maybe also to technologies. When you find something that resonates with you, you can go deep. And the last is that it’s okay to switch. You don’t have to identify yourself with a particular technology or a particular role. If at some point in your life you feel comfortable switching to another role, I think it’s okay to do so as well.

On Cloud Latest and Cloud Native

When we think about the cloud, I think we’ve got a good lock on infrastructure components, like databases, compute, networking, and some security layers. The friction that’s left is that we’re asking people to understand all of those things before they can go out and build something. And that’s the opportunity for the cloud. This is why I’m excited about things like serverless, where we try to abstract away as much of the underlying infrastructure as possible to get people closer to: “Here’s my idea. Here’s the code that powers it. Run it for me.”
What you want is: where’s the button that says “best practice” for what I’m doing? That’s what most people want. So I think the cloud should be somewhere you can bring your ideas. And when you have an advanced use case, you can always drop down to the lower-level infrastructure and build the platform you need. But as cloud providers, we should try to make the 80% use case as secure and easy to adopt as possible.
When you say cloud native, I think of patterns. It’s a collection of patterns that were born in the cloud.
If you use all of a cloud provider’s products, I don’t think that automatically qualifies you as cloud native.
I think when we say cloud native, we’re basically talking about a world that allows you to effectively leverage those cloud services in terms of resiliency, reliability, and observability. To get all of those things, this is where we start to distinguish a cloud native set of patterns that goes along with that. And in many ways, when I really think about it, a lot of these cloud native ideas and concepts are really at the application tier. For the first time, we’re focusing on the relationship between the application and the infrastructure. And that’s where a lot of those patterns are found.

Twelve-Factor Application

So Twelve-Factor is a good place to start. But here’s the thing. You can have cloud native applications that are not Twelve-Factor applications.

Serverless Latest

When I hear serverless, I think about the operational model that comes with it, which is how do I reduce as much friction as possible.
I still want to give you the same operational model, meaning I want to be able to scale down to zero, so when you’re not using the application, you don’t pay for it. If you want to run across multiple regions, you don’t have to learn all of the multi-region networking, which gets very complex at those layers. And I can also promise you a bit of security, because I can keep the servers patched underneath you.
When I say serverless now, I’m thinking about the operational model. Serverless can be applied to more of our data stores and databases. Typically, they’ll have a free tier. They’ll scale to zero. Pay-for-use.
The goal going forward is: imagine taking as many of our managed services as we can and giving people the option of a pay-per-use cost model.
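A back-of-the-envelope comparison shows why scale-to-zero and pay-per-use matter for low or spiky traffic. All prices below are made-up placeholders, not any provider’s actual rates, and real billing has more dimensions (memory, request counts, free tiers). The sketch only contrasts paying for an always-on instance with paying for busy seconds.

```go
package main

import "fmt"

// Hypothetical prices for illustration only; they are not any provider's rates.
const (
	pricePerInstanceHour = 0.05    // keeping one instance running for an hour
	pricePerBusySecond   = 0.00002 // one second spent actually handling requests
)

// alwaysOnMonthly is the bill for an instance that runs for the given hours
// whether or not any traffic arrives.
func alwaysOnMonthly(hours float64) float64 {
	return hours * pricePerInstanceHour
}

// payPerUseMonthly bills only busy time: with no requests the service has
// scaled to zero and the bill is zero.
func payPerUseMonthly(requests int, secondsPerRequest float64) float64 {
	return float64(requests) * secondsPerRequest * pricePerBusySecond
}

func main() {
	const hoursPerMonth = 730
	fmt.Printf("always-on:   $%.2f\n", alwaysOnMonthly(hoursPerMonth))
	fmt.Printf("pay-per-use: $%.2f\n", payPerUseMonthly(100000, 0.2))
	fmt.Printf("idle month:  $%.2f\n", payPerUseMonthly(0, 0.2))
}
```

With these hypothetical rates, an instance kept up for a 730-hour month costs about $36.50, 100,000 requests of 0.2 seconds each cost about $0.40, and an idle month costs nothing.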

Monolith vs Microservices

It’s not that microservices are bad. It’s just that when some people leveraged this pattern unnecessarily, or took it too far, it became unmaintainable and might even have made performance worse. It’s a tradeoff you should make when you’re doing it for organizational purposes: when it helps you align your organization around areas of specialty.
If you’re just a solo developer, you’re probably better off just writing the monolith to start. If you’re in a small team, and maybe have some good engineering practices, like writing modules that then get compiled together as a deployable, monoliths may help you go very far.
I would probably start with a monolith whenever I can. But remember, we’re not talking about a monolith with no engineering discipline. I think when people say monolith, they’re in some cases, not all, really describing a lack of engineering discipline and standards. A lot of times you can have a modular code base, meaning all the services are in their own repositories, with their own workflows, release schedules, and integration tests against the other services at the package level. We’re not talking about deployables yet. We’re still just talking about modular packages, just like the standard library and the different libraries you use to create your own services. When those things are modular, and there’s clear ownership and good testing, then you can make a decision. Do you deploy all of those things individually? Or do you import them all into the same main binary and then wire them up with some routes? That can actually be as maintainable and scalable, from a team perspective, as having independent deployables.
If you develop this way from the beginning, when the time comes to split one of those components out, you’ll be able to do it very naturally, because the boundaries will be very clear about how you do it.
When I think about microservices, I tend to think of: there is a pattern for deciding when to keep all the logic together and when to split it apart. And you can do both at the same time, at the same company, on the same team.
I think the idea with microservices would be: let’s split things up in a way that makes our intentions very clear. The user service only does user service stuff. When you want to do something else, you have to go to a whole different repository and a whole different way of thinking about it. And we can then catch it when things are getting too big.
So I decided early on, when I first joined Google right before, that instead of trying to build tools and make everything easy, how about we make it easy to understand. Not necessarily easy to provision. And we ended up doing both.
Sometimes the easiest way to make something easy is not necessarily to make it easy to use, but to make it easy to understand.

Kelsey’s 3 Tech Lead Wisdom

Decouple your identity from the technology.
- If you decouple yourself from the low-level technologies, and you step back, and maybe you try to align yourself more with the fundamentals, I think that’s going to help you be easier to work with, because you won’t be so dogmatic about things, but it also helps you open the door to new perspectives versus getting bogged down in your job title.
Take your time.
- Everyone talks about high productivity and going fast. But what I’ve learned over time is to slow down and enjoy the path to get there.
- I’m okay if it takes a little bit of time, because it is an investment for the long haul. So to me, it’s about that level of patience. So what does that mean? I don’t look for how to learn everything I need to know in five minutes, or ask to be shown the easy way. I’m okay saying it’s going to take a while, and then slowly build up my skills and continue to make progress.
I found it easier to inspire people than to boss them around. And that’s by bringing my whole self to the job.
- What I found most effective for me was to inspire people into action. Because when that happens, they fill in the blanks in a way that I would have never asked them to. And then they approach it typically with way more passion and energy, and they will bring their whole selves to the project.
- Leading by inspiration or persuasion instead of authority, even when you have authority over folks, can change the outcome in dramatic ways and allow that other person to grow.

Transcript

Patron [00:01:35]

Alyssa Siow: [00:01:35] Hi everyone! My name is Alyssa, from Singapore, and I’m currently an undergraduate studying Information Systems at Singapore Management University. I had the great honor of meeting Henry through a mentoring program at school, and he graciously shared with me the podcast he’s working on. And I’m so glad he did!

I’ve definitely gained a lot of insights by listening to all the experts here. Tech Lead Journal is an eye-opener and a great initiative. There’s never been anything quite like it before. The content and guests brought in by Henry are specially curated to provide a wide diversity in understanding what tech leadership and good tech practices are.

What I appreciate most about this podcast is the amount of effort and attention to detail in the podcast notes, which can be accessed on Tech Lead Journal’s website. The transcript is openly shared, and the timestamps are nicely structured. There’s even a breakdown of the conversation highlights for easily digestible information. Not to mention that whatever’s brought up during the podcast is covered, be it technical jargon, terminology, or references. I especially enjoyed the episodes that featured Neha, Stephanie, Crystal, as well as Richard and Gergely. I find the ones with women in tech especially inspiring, because they motivate me to aspire to be just like them, work hard, and consider taking part in meaningful projects and initiatives to make the tech space a better place. Their insights into their personal journeys, the challenges they faced and how they overcame those obstacles to get to where they are, and their 3 Tech Lead Wisdom definitely add value to the conversation.

Tech Lead Journal prides itself on building and growing its audience base organically, so any form of support would be great! If you like or benefit from the work being done, please consider subscribing and becoming a patron to support the podcast, like I did! I cannot wait to see the podcast grow and flourish! Henry definitely deserves it all for the amount of work he has put in.

So yeah, that’s all from me. And thank you so much for having me.

Episode Introduction [00:03:22]

Henry Suryawirawan: [00:03:23] Hello, my friend. Thanks for tuning in. It’s really great to be back here with another new episode of the Tech Lead Journal with me, your host, Henry Suryawirawan. We just heard from Alyssa, one of the early patrons of this podcast. Thank you so much for sharing your story with us, Alyssa. I really, really appreciate your support. And I’m also very happy to hear that you have benefited so much from the episodes and the show notes. If Alyssa’s story resonates with you, and you would also like to contribute to the show by becoming a patron, please check out techleadjournal.dev/patron for more information. Your support will tremendously help me achieve the goal I’m currently running on that page. If you haven’t joined any of the Tech Lead Journal social media channels, I would like to invite you to join us on LinkedIn, Twitter, or Instagram; you can find the links to those channels in the show notes. Make sure to also subscribe and follow the show on your favorite podcast app.

In today’s episode, I am so excited to share with you my conversation with Kelsey Hightower. Kelsey needs no introduction, as he is one of the leading figures in open source, cloud computing, and Kubernetes, and is someone I look up to for his thought leadership and contributions to the community. We started the conversation with Kelsey sharing his inspiring career journey, from where he started to where he is today at Google Cloud. He then shared his invaluable advice on how one should learn and develop knowledge in the current fast-changing technology landscape, overcome impostor syndrome, and thus succeed in tech. We then moved on to various latest technology updates in cloud, serverless, and Kubernetes, including his latest observations and views on microservices vs. monoliths. If you have followed Kelsey for some time, you might have heard about his guide “Kubernetes The Hard Way”, which you can find in his GitHub repository. Kelsey shared with me why he creates “The Hard Way” guides to learn about a particular technology, and does so publicly. This is definitely an episode you don’t want to miss! It’s jam-packed with knowledge and wisdom, and I personally learned so much from this conversation with Kelsey.

I hope that you will enjoy this great episode. Please consider helping the show in the smallest possible way, by leaving me a rating and review on Apple Podcasts and other podcast apps that allow you to do so. Those ratings and reviews are one of the best ways to get this podcast to reach more listeners, and hopefully the show gets featured on the podcast platforms. I’m also looking forward to hearing any comments and feedback on social media, or you can send them directly to me at techleadjournal.dev/feedback. So without further ado, let’s get started.

Introduction [00:06:21]

Henry Suryawirawan: [00:06:21] Welcome, everyone. Today I have a very special guest, someone I have admired for a long time, especially for his technical leadership and his experience with the cloud and Kubernetes. He’s none other than Kelsey Hightower. Kelsey, very pleased to have you on the show today. Welcome to the Tech Lead Journal.

Kelsey Hightower: [00:06:37] Awesome. Thanks for having me.

What Kelsey Is Up To [00:06:39]

Henry Suryawirawan: [00:06:39] Kelsey, maybe for a start, what are you up to these days?

Kelsey Hightower: [00:06:43] These days, I spend a lot of my time around serverless. I work at Google Cloud, as people may know, and the area I’m most interested in is all of these serverless technologies, primarily Cloud Run. Most people are familiar with Kubernetes and running their containers, but there’s still a lot of friction in managing clusters and all the complexity that comes with that. So I’m very interested in whether we can close the gap on the serverless side. Can we remove the infrastructure and still keep the majority of the flexibility for running people’s applications at scale? That’s the primary thing I’m doing internally at Google.

In the outside world, I’m very interested in the security landscape. I’ve been spending a lot of time with things like SPIRE, which is one of the open source projects behind SPIFFE, giving identity to our applications in a way that’s portable. For people who are interested in cloud, you know that there’s IAM. You have an identity for Amazon. You have an identity for Google Cloud. You have an identity for Azure. And those things are not necessarily super easy to make portable. So this whole concept of SPIFFE is basically giving people an identity. You can think of a domain, like example.org, and then having your app live at calculator or Foo. Those identities can then be passed around in TLS certificates or JWT tokens. So it’s really interesting to have federated identity across VMs, containers, serverless, and even other cloud providers.
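A SPIFFE ID is a URI whose authority is the trust domain and whose path names the workload, so the “example.org” plus “calculator” example above becomes spiffe://example.org/calculator. The Go sketch below parses such an ID with the standard library’s net/url. It performs only a few basic checks, not the specification’s full validation rules; production code would typically use the go-spiffe library instead.

```go
package main

import (
	"errors"
	"fmt"
	"net/url"
)

// SPIFFEID is a simplified view of an identity such as
// spiffe://example.org/calculator: a trust domain plus a workload path.
type SPIFFEID struct {
	TrustDomain string
	Path        string
}

// String turns the parts back into the URI form.
func (id SPIFFEID) String() string {
	return "spiffe://" + id.TrustDomain + id.Path
}

// ParseSPIFFEID performs a few basic checks only; the SPIFFE ID spec
// defines stricter rules (allowed characters, and so on).
func ParseSPIFFEID(s string) (SPIFFEID, error) {
	u, err := url.Parse(s)
	if err != nil {
		return SPIFFEID{}, err
	}
	if u.Scheme != "spiffe" {
		return SPIFFEID{}, errors.New("scheme must be spiffe")
	}
	if u.Host == "" {
		return SPIFFEID{}, errors.New("missing trust domain")
	}
	if u.User != nil || u.Port() != "" || u.RawQuery != "" || u.Fragment != "" {
		return SPIFFEID{}, errors.New("userinfo, port, query and fragment are not allowed")
	}
	return SPIFFEID{TrustDomain: u.Host, Path: u.Path}, nil
}

func main() {
	id, err := ParseSPIFFEID("spiffe://example.org/calculator")
	if err != nil {
		fmt.Println("invalid:", err)
		return
	}
	fmt.Println("trust domain:", id.TrustDomain, "workload:", id.Path)
}
```

Because the ID is just a string, it can travel inside whatever credential a platform issues, which is what makes it portable across clouds.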

Henry Suryawirawan: [00:08:04] So it’s very interesting. I haven’t heard about that. So is it like portable service accounts in that sense?

Kelsey Hightower: [00:08:09] In some ways, IAM is a complete package. When you think about IAM, not only do you get an identity, you also get how it can be used, and you also get a way to enforce those things. Let’s take Istio, for example. Most people may be familiar with service mesh, and when you say service mesh, Istio is one of the projects that comes to mind. What Istio does is give you a control plane. With that control plane, you can say things like “App A can talk to App B,” and then you need a bit of enforcement at the lower level. Envoy is a modern proxy that all your network traffic flows in and out of. So where does SPIFFE fit in? We have to tell the system who App A is. Imagine you’re running a container inside of Kubernetes, and Kubernetes gives your container a service account. We could think about that as the root of trust. If you’re on a VM, you’ll have a metadata service where you can go and get a token, again another way of ID-ing yourself. And these are two different things. What tends to happen inside of a service mesh is that you take those trusted identities to something like SPIRE, an open source component that can trust Kubernetes or Google Cloud, and you trade those accounts for another artifact. These are called security documents. It could be a TLS certificate, so you can do mutual TLS auth between App A and App B. Or it could be a JWT token that you pass around in a typical HTTP header. Now that you have this kind of universal identity, it can work inside of Istio. It can work inside another cloud provider or any tool that understands SPIFFE IDs. And then you can start to do things like policy. Now that I know who you are, maybe you’re using an SSL certificate to authenticate to me, I can look in that SSL certificate, and there’s going to be a subject alternative name that has your SPIFFE ID. It’s just a string. It’s nothing super fancy. That string tells me what domain you’re in, example.com, and what your service ID is, Foo. I can take that and go look up whether you can even call these endpoints. So it’s more of a way of formalizing identity in a federated way that’s independent of things like Kubernetes service accounts or Google Cloud service accounts.

Career Journey [00:10:15]

Henry Suryawirawan: [00:10:15] Very interesting indeed. Thanks for sharing that. So Kelsey, before we go deep into the technical stuff, as usual, I’d like to ask my guests to share their career journey, maybe highlighting major turning points that are interesting for the listeners to learn about. So Kelsey, can you share your career journey?

Kelsey Hightower: [00:10:32] I’ve been very fortunate. A couple of weeks ago, someone did a nice profile piece, a guy named Tom at Protocol. If you go to the Protocol website, or check the link pinned to my Twitter profile, it walks you through my career trajectory, from working in fast food in high school, at McDonald’s, at maybe the age of 15, all the way to where I am now at Google. For those who haven’t read the article, the biggest turning points for me: number one, like many people listening, I’m self-taught. I remember buying books on Unix and learning a little bit of Python on my own. Even though we say self-taught, or on our own, we were actually taught by the authors of those books, and did a lot of self-study. So no college background. I really just got off the ground doing certifications, and I tell most people that was my start in tech: just getting certifications. There were a few turning points. There was a time when I had my own small computer store in a small city right outside of Atlanta, Georgia. From there, I met a lot of customers, did a lot of Windows support, built PCs inside the computer store, and made service calls. If we fast forward, a lot of my early career was around system administration: change windows, deploying software, watching monitors, and trying to resolve issues. Zoom a little further out, and a lot of my experience starts to become development experience: writing code, automation tools, and some apps that would even go on to run in production. And then there’s this other turning point around open source and public speaking. Around 2012, I ended up working at Puppet Labs, which made one of the first open source configuration management tools for managing servers. This was right around the era of DevOps, this idea that developers and operations folks would work a little bit closer together, and Infrastructure as Code was born. That was also right around the time I started doing a lot more public speaking at meetups and larger conferences. That helped me identify my second set of skills, which is teaching people: instead of just building technology and shipping things, teaching other people how to do so. I think those are the major turning points in my career that land me where I am today.

Henry Suryawirawan: [00:12:36] Yeah. I read that story as well. It’s really inspirational, I would say. Looking back, reading your article, how do you feel about your career so far?

Kelsey Hightower: [00:12:44] It can be good to hear other people tell your story through their eyes. One thing that was nice about that particular article was that they took the time to interview people I’ve crossed paths with over my life, and let them tell their part of the story. I know how I viewed that story. But watching other people recount those stories, some of whom I know well and some I’ve only met once, and seeing it all put in a timeline with some context around it, it was also like reading about someone I’d never met. It was flattering nonetheless, but it was also nice to see someone recap my life in a way that I can digest as an outsider.

Succeeding in Tech from Underrepresented Groups [00:13:21]

Henry Suryawirawan: [00:13:21] Yeah. Interesting indeed. Kelsey, one of the major phrases in the article says, “There’s just no way for a guy like you to be able to succeed in tech.” That represents a lot of effort, and probably a lot of things that went into your career to get you where you are at this moment. And one of the things I see, for example, is that you are from an underrepresented group in technology. These days, there are a lot of people from those kinds of groups also trying to break through in the technology landscape, in the technology communities, and in technology careers. From your point of view, what advice or tips do you have on breaking those kinds of barriers, especially for these underrepresented groups? Because we can probably learn from your journey.

Kelsey Hightower: [00:14:02] Yeah. So the thing about tech is that, on the positive note, I think it’s one of the industries where skills are typically enough to help you succeed, or they should be. In many situations, a lot of this technology is so new that if people only tried to hire people with PhDs, it would be hard for them to stay relevant, because there is no great PhD program for things like Kubernetes or some of the software development practices we use in industry today. So the good news is that you don’t necessarily need credentials to be successful in tech. Now, some of the barriers can be challenging, and every country is different, right? When you think globally, every country has local social issues that make it harder for some groups to participate than others. Maybe it’s not appropriate to assign all of those problems to the tech field, but they’re exaggerated inside of tech because of the social issues around them. So for me, coming from an underrepresented group, I’m considered Black or African-American, and there are lots of biases in the country I live in, right? Some people may look at me and say, there’s no way you know what you’re doing. There’s no way you can be technical. Maybe you should be playing sports or something. And that presents a challenge. If people look at you that way, it may be harder to get a job. Or even if you get the job, it may be hard to get to work on the complex projects you need to skill up and level up. So what happens then is that you have to take on a little bit more responsibility for finding those opportunities, holding onto those opportunities, and then being able to execute.

And then outside of that, in general, it’s always amazing to me that in technology, lots of companies just want to hire people who know everything on day one. Even for a beginner role, you want someone who has five years of technology X and five years of technology Y. And we just don’t do a good job in our field of teaching people continuously, whether they’re new or they’ve been around for a while. There’s always this friction of: we expect you to know everything, and we don’t really have any formalized training to continue to grow skills. There are other industries, like being an electrician, that have a whole progression. You go from someone that’s new to, like, a journeyman. You pair up with someone who does know what they’re doing, or has been doing it longer, and it allows you to continue to improve over time. There’s an expectation that you can pair, and that it’s something more official. So those are some of the challenges in tech that I think a lot of people have to deal with. But I also feel we’re able to overcome those things by having access to the knowledge. That’s one thing I think tech has done well: making a lot of this knowledge and tooling accessible, whether that’s free and open source software or the knowledge we see in blog posts and in books. But the other thing we haven’t done well, I think, is making the roles, the opportunities to leverage those skills, accessible. We haven’t made those as accessible as we have the information.

Understanding Technology Fundamentals [00:16:45]

Henry Suryawirawan: [00:16:45] It’s very interesting that you pointed out this continuous training. I didn’t think of it that way initially, but now that you say it, I think it’s true. Especially these days, people want to hire people who know a lot of stuff. Could it also be because technology moves so fast? Companies probably can’t spend enough time investing in people to upskill them, so they want someone who has everything from the day they join and can start running straight away. Could it be that the pace of the technology itself doesn’t give them the time to invest in training?

Kelsey Hightower: [00:17:16] You know what? I used to think that. But what I come to realize is that technology doesn’t move that fast. The fundamentals are roughly the same. When people say Kubernetes is new, all of the fundamentals I can probably recall from 10 years ago. The workflows may be different. Instead of copying a binary onto a Linux server and starting it by hand, there’s a much bigger system around Kubernetes that does that for us. But the concept of scheduling isn’t new technology. This idea of scheduling has been around for 40-50+ years. It’s just how we’re using the fundamentals. So anyone who has the fundamentals will look at these new systems and say, “Okay, they’re more like workflow engines.” They’re more about composing the fundamentals into a system with a designed purpose. So I think technology shifts seem bigger in the minds of people who don’t understand the fundamentals. And there’s nothing wrong with that. What I’m saying is, it’s the fact that we don’t necessarily teach fundamentals. When you go to a job, you hear people say, “I’m a Linux system administrator,” or “I’m an Oracle DBA.” Not really that you’re a database engineer. If you were focused on databases, then you would focus more on how data structures are held in memory, how the SQL execution engine works, how the storage is laid out on disk. Because those fundamentals apply to Postgres. They apply to DB2. They also apply to things like Cloud Spanner, where you have a multi-region replicated SQL database.

When you start to focus on the fundamentals, then you don’t mentally get attached to one particular implementation. That attachment is what makes things seem impossible. It’s like, “Oh, I have to learn this implementation, that implementation and that implementation.” And then people feel like they need to start from scratch. The way I look at new technologies is just, “Okay, what fundamentals is this thing building on top of?” And once I can see those, “Okay, now I know 80% of what’s going on here. Now let me go fill in the 20%, which is their configuration, their workflows.”
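To make the scheduling point concrete, here is a minimal Python sketch of the fundamental idea, not of Kubernetes itself: filter out nodes that can’t fit a workload, score the ones that can, and reserve resources on the winner. The `Node` and `Workload` types, the resource numbers, and the “most CPU left over wins” scoring rule are illustrative assumptions; the real Kubernetes scheduler runs configurable filter and score plugins over many more signals.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    name: str
    free_cpu_millis: int  # unallocated CPU, in millicores
    free_mem_mib: int     # unallocated memory, in MiB


@dataclass
class Workload:
    name: str
    cpu_millis: int
    mem_mib: int


def schedule(workload: Workload, nodes: List[Node]) -> Optional[Node]:
    """Place a workload on a node, or return None if no node fits."""
    # Filter: keep only nodes that can satisfy the resource request.
    feasible = [
        n for n in nodes
        if n.free_cpu_millis >= workload.cpu_millis
        and n.free_mem_mib >= workload.mem_mib
    ]
    if not feasible:
        return None
    # Score: prefer the node with the most CPU left after placement
    # (a "spread the load" heuristic); max() keeps the first node on ties.
    best = max(feasible, key=lambda n: n.free_cpu_millis - workload.cpu_millis)
    # Bind: reserve the resources on the chosen node.
    best.free_cpu_millis -= workload.cpu_millis
    best.free_mem_mib -= workload.mem_mib
    return best
```

For example, with `node-a` (500 millicores free) and `node-b` (2000 free), a 250-millicore workload lands on `node-b`. Swapping the scoring rule for “least CPU left over” would turn this into bin packing: the same fundamental, pointed at a different goal.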

Henry Suryawirawan: [00:19:09] In your opinion, what are some of the fundamental building blocks that people in tech should know about?

Kelsey Hightower: [00:19:16] Let’s talk about networking. You’ll see people in tech who say, “Hey, we’re going to be doing service mesh.” And I say, “Why?” or “What is the service mesh doing for you?” And then you might see a long pause, because it’s just “Istio is going to make it easy for me to secure my microservices.” And that’s straight off the website. Now the fundamentals in Istio, at the very bottom, are still networking: IP addresses, subnets, routes, L3 versus L7. L3 is going to be, to simplify this somewhat: how does one packet get to another destination, maybe outside of the core network? And then at L7, we start to get into the protocols we’re all used to, like HTTP, where we think about posting a request to some endpoint. That’s great. So when we think about securing those layers, you’re going to do different things for each layer. At L7, we’re going to be talking about things like TLS certificates, SSL. We want security to help us decide: if you’re trying to make a request at this path, how do I identify you? And then how do I reject you in a way that makes sense? Do I send you back a 500, or do I send you back a 403? Those are things you just know at the networking tier, based on the protocols you’re dealing with. But guess what, it’s still IP addresses. They’re still routes. TLS is not brand new. So doing TLS mutual auth to encrypt traffic between your two applications, and then giving them an identity that you can actually apply policies on: these are fundamentals that were true 20-30 years ago. In that case, make sure you learn about authentication and authorization, and why those are different. If you understand those fundamentals, then when you look at something like Istio or a service mesh, you can look at how they try to represent those concepts in their own systems.
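As an illustration of authentication versus authorization at L7, here is a minimal WSGI sketch using only the Python standard library. It answers 401 when it can’t establish who the caller is, and 403 when it knows the caller but policy denies the path; by HTTP convention, a 500 signals a server-side failure rather than a rejected caller. The bearer-token table, the policy map, and the paths are hypothetical; in a service mesh such as Istio, workload identity would typically come from mutual TLS certificates rather than a shared token.

```python
from wsgiref.util import setup_testing_defaults

# Hypothetical credential store: bearer token -> caller identity.
TOKENS = {"token-alice": "alice", "token-bob": "bob"}

# Hypothetical policy: path prefix -> identities allowed to call it.
POLICY = {"/admin": {"alice"}, "/orders": {"alice", "bob"}}


def identify(environ):
    """Authentication: who is calling? Returns None if we can't tell."""
    scheme, _, token = environ.get("HTTP_AUTHORIZATION", "").partition(" ")
    if scheme != "Bearer":
        return None
    return TOKENS.get(token)


def allowed(identity, path):
    """Authorization: may this identity call this path? Deny by default."""
    for prefix, identities in POLICY.items():
        if path.startswith(prefix):
            return identity in identities
    return False


def app(environ, start_response):
    identity = identify(environ)
    if identity is None:
        # 401: the caller could not be identified.
        start_response("401 Unauthorized", [("WWW-Authenticate", "Bearer")])
        return [b"unauthenticated\n"]
    if not allowed(identity, environ.get("PATH_INFO", "/")):
        # 403: the caller is known, but policy says no.
        start_response("403 Forbidden", [("Content-Type", "text/plain")])
        return [b"forbidden\n"]
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [f"hello, {identity}\n".encode()]


def call(path, token=None):
    """Invoke the app in-process (no server) and return (status, body)."""
    environ = {}
    setup_testing_defaults(environ)
    environ["PATH_INFO"] = path
    if token is not None:
        environ["HTTP_AUTHORIZATION"] = f"Bearer {token}"
    result = {}

    def start_response(status, headers, exc_info=None):
        result["status"] = status

    body = b"".join(app(environ, start_response))
    return result["status"], body
```

Keeping `identify` and `allowed` as separate functions mirrors the distinction Kelsey draws: one answers “who are you?”, the other “what may you do?”, and each can change without touching the other.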

Henry Suryawirawan: [00:20:56] Yeah, your message definitely resonates with me as well: focusing on the fundamentals. They tend not to change, even over many years. And that goes not just for infrastructure, but also for application design, like design patterns and integration patterns. They tend not to change so much. But the tools and implementations seem plentiful these days, with all the frameworks and languages popping up here and there.

Impostor Syndrome [00:21:19]

Henry Suryawirawan: [00:21:19] Another aspect of breaking through in tech, I feel, is this thing called impostor syndrome. Especially as a junior person coming into the tech industry, there are so many technologies available these days, whether open source, cloud, or proprietary. There seems to be an insurmountable amount of technology that someone needs to understand: first the general principles, and then the implementation details. These things tend to put someone down in the first place, in the sense of, “Oh, there are so many things. I’m probably not good enough to learn all these things.” So what is your suggestion or advice for dealing with this impostor syndrome?

Kelsey Hightower: [00:22:00] I think there are probably way smarter people who’ve written about this in general, so I’ll try to just speak from my own experience. At some point in my career, I realized that I didn’t need to be the best at every piece of technology I was going to be responsible for. I knew I needed to understand it to the degree that my job required. And if I thought it made sense, I could go a bit deeper than the job required, as an investment in myself. That was the foundation of my career that I at some point arrived at. So what does that mean? It means that I can decide which technologies are important to me and to the skillset I want to put on the market, or leverage for the things I want to do. ML is hot right now. I have very little interest in learning TensorFlow. Someone could say, “You’re missing out. That’s the hottest job prospect in the world,” etc. But that’s not for me. Maybe I’ll touch it every once in a while just to see what’s going on there. Maybe do a “hello world” type of tutorial. And then I’m okay saying that is not my area of expertise. There’s only so much time in the day. Now, in the areas where I am interested, I take on the responsibility of going as deep as I can. For example, I like the Go programming language, and I’m familiar with how it parses its syntax. I’m familiar with the surrounding ecosystem. I understand how goroutines work. I understand some of the trade-offs you make in the Go programming language when you’re making system calls to the kernel, because I also deal with things like containers, and containers definitely get low level. So for me, I’m very patient with saying, “I’m investing in myself, and I’ll pick and choose the right things to make that investment in.”

And in terms of impostor syndrome, I think there are two kinds of people who feel this. There are people who are truly impostors, meaning you’re trying to be something you’re not. Whenever you do that, it can make you feel very uncomfortable, because you’re pretending to be something you’re not around people who might actually be that. And that’s not necessarily a person who knows the most. But if you try to force yourself to be one of the people who knows everything, then you’re going to trap yourself in this feeling. I’ve resigned myself to saying, “You know what? It’s okay to learn in public.” Since I don’t know everything, I’m comfortable asking questions. And I get it. Sometimes you’re going to get penalized for not knowing everything. There are some companies, or some organizations or teams, with a very unhealthy practice of: anyone who asks a question, we’re just going to believe they’re dumb and no longer ask them to do anything. That is just unhealthy. Like I was talking about earlier, we don’t have a formalized set of training that says we are going to invest in people continuously. So for me, I decided I’m not going to pretend I know everything. Therefore, I’m not worried about being an impostor, and I don’t have to worry about trying to be the best at everything. So I stay focused on the areas that I am interested in, and take the responsibility to just get slightly better, day over day, year over year.

Henry Suryawirawan: [00:24:44] Knowing what you want in the first place needs a certain kind of conviction. For some people, this tends to be easy: they know what they’re good at. For example, I’m good at application development, so I know what I need to focus on: maybe a programming language, maybe some frameworks. But some people, especially those who come from a non-tech background, might not have that conviction. They might just look at the news or the job market to see what the hot things are. So how would you advise people to build that kind of conviction?

Kelsey Hightower: [00:25:15] If I think back to high school, I remember we had elective classes. Outside of math, science, maybe English, you had to take electives. These could be things like home-ec, where you learn how to cook and maybe sew clothes. I took home-ec, by the way. There’s a technology class. There’s sports. There are all kinds of things you can do as an elective. And the point of the elective is to expose kids to more things than they would probably pick on their own. So it’s about exposure. When people are starting their career or making a career transition, I think it’s healthy to try a little of everything, because there’s no way for you to know. What you’re hearing from other people is what may have worked for them, but it may not necessarily work for you. So one thing is, “Hey, maybe try tech support.” Maybe you like helping people the way support people do. And don’t think one role is inferior to another. I think that’s another mistake people make. I started my career in tech support, and even within tech support, you can go super deep. I think tech support is a great path to Site Reliability Engineering, SRE, or even software development, especially if you understand how to troubleshoot and debug these systems. The next thing people have to look at is: once you try a little bit of everything and something resonates with you, that might be a period of your life. We always talk about the T-shaped engineer. So you might say, “Hey, I’m really liking networking. For some reason, it motivates me to want to go deeper.” You know what? For the next one to five years, maybe you want to become a network engineer, go get all your certifications, and work in an industry where you can actually improve your networking skills. And guess what? It’s okay to switch. Maybe you go from network engineer to someone who writes code for networking systems.
And now you’re shifting to software engineering. And maybe you like Python. That’s another opportunity to go super deep, on Python. Then maybe after 10-15 years, you step back. You’ve gone deep in several areas, and now you have the top of the T, the horizontal set of skills. And maybe at that time you’re super deep, and you go back to network engineering, where you combine all your skills and go deeper than you were before.

Henry Suryawirawan: [00:27:17] Thanks for sharing that, Kelsey. I think there are a few things I could pick up here. First of all, if you don’t know what you’re good at, or you don’t know what to pursue, try a little of everything. Also, don’t think of one role as inferior to another. This applies not just to roles, but maybe also to technologies. When you find something that resonates with you, you can go deep. And the last is that it’s okay to switch. You don’t have to identify yourself with a particular technology or a particular role. If at some point in your life you feel comfortable switching to another role, I think it’s okay to do so as well.

On Cloud Latest and Cloud Native [00:27:51]

Henry Suryawirawan: [00:27:51] So Kelsey, you have been working in the cloud landscape for the past… I don’t know how many years. It’s a long time now. What are some of the possibilities in the future when you talk about cloud?

Kelsey Hightower: [00:28:01] Yeah, cloud is interesting. Cloud is infrastructure for most people. And if you think about infrastructure, what does it make possible? If you think about airports, they make it possible to travel to other parts of the world in a more accessible way, and, for a lot of people, a more affordable one. A lot of times, infrastructure is going to enable us to do things we can’t do by ourselves, or with a limited set of resources. So when I think about cloud, and we talk about public cloud, there are a lot of computing tasks; maybe you want to start an online e-commerce site. These days, people expect that site to be available in all the countries, highly available, up all the time, and they want to be able to use any form of payment they want. And the goal of the cloud is to enable that as seamlessly as possible. When we think about the cloud, I think we’ve got a good lock on infrastructure components, like databases, compute, networking, and some security layers. The friction that’s left is that we’re asking people to understand all of those things before they can go out and build something. And that’s the opportunity for the cloud. This is why I’m excited about things like serverless, where we try to abstract away as much of the underlying infrastructure as possible to get people closer to: “Here’s my idea. Here’s the code that powers it. Run it for me.”

We have a lot of work to do in terms of user experience there, right? We still expose too many configuration options, when most people are just after the best practice. If you’re going to create a database, you may not understand all of the hundred options that are available on Cloud SQL. What you want is: where’s the button that says “best practice for what I’m doing”? That’s what most people want. So I think cloud should be something where you can bring your ideas. And when you have an advanced use case, of course, you can always drop down to the lower-level infrastructure and build the platform that you need. But as cloud providers, we should be trying to make the 80% use case as secure and easy to adopt as possible.

Henry Suryawirawan: [00:29:57] When adopting cloud, these days people also tend to talk about being cloud native. Can you explain to us what cloud native is, basically?

Kelsey Hightower: [00:30:05] Yeah, that’s a fun one. Some people would say cloud native is being 100% in the cloud, and leveraging the patterns that were born in the cloud. I like to simplify this a little bit. Think about the patterns around observability, like structured logs: instead of just logging that there’s an error, you log it with some context. This particular package or service is having this error. Here’s the client that called it. Here’s what I was doing. And to give you more insight to go along with that log message, we may even put in a request ID. That request ID could have been generated by the client, or by the HTTP load balancer sitting in front of me. I can then take that request context and put it into HTTP traces, so you can see how much latency there is between the various services. Because you’re dealing with distributed systems, you’re going to need a lot more observability than you had before, right? Because these things are a lot more complex. Again, this is a pattern born in the cloud, where we’ve made it really easy to get machines across the globe. In order to make those things easy to troubleshoot, we need things like observability, not just logging plain text to some file; we need a different kind of pattern. So when you say cloud native, I think of patterns. Think about health checks. Running some bash script to check whether an application is healthy won’t necessarily scale to the cloud model. I’m not talking about just a bunch of machines. I’m talking about having things across multiple zones or regions, things that could go away at any time. Because in early cloud, there was no guarantee your virtual machine would run forever. So you needed better patterns around health checking. You had some other tool that could come by and check the health of all your endpoints.
And if one were to go away, it would then be able to automatically provision another.
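The structured-logging and health-check patterns can be sketched in a few lines of standard-library Python. The middleware below reuses an incoming request ID or generates one, writes each completed request as a JSON log line carrying that ID, and echoes the ID back in the response; the app also exposes a health endpoint for a platform to poll. The `X-Request-ID` header name (a common convention, not a universal one; load balancers and tracing systems often use their own headers), the `/healthz` path, and the `checkout` service name are assumptions for illustration.

```python
import json
import logging
import uuid

# Hypothetical service name; any logger name works.
logger = logging.getLogger("checkout")


def log_event(message, **context):
    """Emit one structured (JSON) log line instead of bare text."""
    logger.info(json.dumps({"message": message, **context}, sort_keys=True))


def with_request_id(app, service):
    """WSGI middleware: reuse the caller's X-Request-ID or mint a new one,
    log it with each response, and echo it back to the caller."""
    def wrapped(environ, start_response):
        request_id = environ.get("HTTP_X_REQUEST_ID") or str(uuid.uuid4())

        def start_with_id(status, headers, exc_info=None):
            log_event(
                "request completed",
                service=service,
                path=environ.get("PATH_INFO", "/"),
                status=int(status.split()[0]),
                request_id=request_id,
            )
            headers = list(headers) + [("X-Request-ID", request_id)]
            return start_response(status, headers, exc_info)

        return app(environ, start_with_id)

    return wrapped


def app(environ, start_response):
    """Toy application with a health-check endpoint."""
    if environ.get("PATH_INFO") == "/healthz":
        # Cheap, dependency-free answer a platform can poll to decide
        # whether to keep routing traffic here or replace this instance.
        body = b"ok\n"
    else:
        body = b"order placed\n"
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [body]


application = with_request_id(app, service="checkout")
```

With `logging.basicConfig(level=logging.INFO)` configured and `application` served by any WSGI server, a request to `/healthz` carrying `X-Request-ID: abc-123` produces a log record whose message is `{"message": "request completed", "path": "/healthz", "request_id": "abc-123", "service": "checkout", "status": 200}`.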

So to me, it’s a collection of patterns that were born in the cloud. I think most people would do well to decide which of those patterns benefit them the most, because you can actually apply some of those patterns even outside of the cloud. Maybe you’re running a couple of servers in your own data center; you might still be able to benefit from things like standardized health checks and metrics. And the last thing I’ll say here is that a lot of these cloud native patterns are no longer ideas you find in a white paper. They’re now open source projects that you can find in the CNCF, the Cloud Native Computing Foundation, which is home to a lot of these open source projects, where the maintainers and communities can come together, collaborate, and push a lot of these standards forward.

Henry Suryawirawan: [00:32:33] I do hear some people say, for example, that if I use all of a particular cloud provider’s products, then I’m cloud native. Is that a misconception, or is it true in some sense? Maybe you have a take on that?

Kelsey Hightower: [00:32:48] If you use all the cloud provider’s products, I don’t think that automatically qualifies you as cloud native. I’ll give you an example. Let’s say you have an app that just runs on a set of virtual machines behind a Cloud Load Balancer. Well, you could have done that on-prem. You could do that with VMware. You could do that with OpenStack. You don’t need to adopt any of the cloud native patterns to make that work. Typically that’s referred to as lift and shift: take what you were doing 10-20 years ago, and just do it in the cloud. And the cloud supports that via Infrastructure as a Service, IaaS. That, to me, isn’t necessarily cloud native. Now, there’s a lot of value in leveraging cloud services, and I think nothing’s wrong with that. But when we say cloud native, we’re basically talking about a world that allows you to effectively leverage those cloud services in terms of resiliency, reliability, observability. And to get all of those things, this is where we start to distinguish the cloud native set of patterns that go along with that. In many ways, when I really think about it, a lot of these cloud native ideas or concepts are really at the application tier. For the first time, we’re focusing on the relationship between the application and the infrastructure. And that’s where a lot of those patterns are to be found.

Twelve-Factor Application [00:34:00]

Henry Suryawirawan: [00:34:00] You mentioned patterns. A lot of people also associate cloud native with the Twelve-Factor application design principles. Do you think that Twelve-Factor is a good representation of all the patterns?

Kelsey Hightower: [00:34:12] Oh, this might be an unpopular opinion, but my answer is no. I think Twelve-Factor was a great precursor that allowed you to practice some of the patterns you find in cloud native. A lot of people had come from the world of PaaS, like Heroku, and Heroku was this kind of infrastructure, or PaaS, that would say, “Look, give me your source code, but here are some restrictions. In order to be able to run your application on any of our servers in a way that’s portable for us, we’re not going to allow you to do things like have a local volume for storage. We’re not going to be dealing with configuration files and copying files all over the place, so you’re going to have to take things like configuration in as environment variables.” A lot of times, these are really nice ways to make an app a little bit more portable, because you’re no longer requiring a file system. You’re being very explicit about state. You make sure that if you do have persistent data, you put it in a persistent database explicitly. And you also build your apps in ways that make them easier to start up and rebuild.
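Here is a small sketch of the Twelve-Factor configuration habit Kelsey describes: settings come from environment variables rather than from files copied onto the machine, and persistent state is reached through an explicit backing service. `PORT` and `DATABASE_URL` follow Heroku-style naming; `LOG_LEVEL`, the defaults, and the `Config` type are assumptions for illustration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass


@dataclass(frozen=True)
class Config:
    port: int
    database_url: str
    log_level: str


def load_config(env: Mapping = os.environ) -> Config:
    """Build the app's settings from environment variables, not files.

    DATABASE_URL is required: persistent data lives in an explicit
    backing service, not on the local disk. PORT and LOG_LEVEL have
    defaults so the app can start with minimal configuration.
    """
    try:
        database_url = env["DATABASE_URL"]
    except KeyError:
        raise RuntimeError("DATABASE_URL must be set") from None
    return Config(
        port=int(env.get("PORT", "8080")),
        database_url=database_url,
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Passing the environment in as a plain mapping keeps the loader easy to test, and failing fast on a missing `DATABASE_URL` surfaces a misconfigured deploy at startup rather than on the first query.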

I think a lot of those patterns are just really good software engineering patterns that allowed us to actually run applications at scale on platforms such as Heroku, App Engine, or Cloud Foundry. So I think it was great for those. Remember, those patterns came out, what, 10-15 years ago? Nothing wrong with those. But they only get you a slice of the cloud native patterns. And the reason I think they apply just a little is that we’re talking about documenting and formalizing practices over the years. So Twelve-Factor is a good place to start. But here’s the thing: you can have cloud native applications that are not Twelve-Factor applications. Cloud native could be things like Kafka, or something like etcd. etcd definitely uses storage. It definitely doesn’t use environment variables. So it probably violates maybe four or five of the Twelve-Factor principles. But that doesn’t mean it’s not cloud native. Cloud native is not about stateless only. I think it’s a little bit easier for stateless applications to adopt some of the cloud native practices. But I don’t think we should exclude stateful applications because they’re not Twelve-Factor.

Serverless Latest [00:36:14]

Henry Suryawirawan: [00:36:14] That totally makes sense. Kelsey, you mentioned that you have been dealing a lot with serverless technologies these days. What trends do you see coming in the serverless area? Are there some technologies or things that everyone should know about?

Kelsey Hightower: [00:36:26] When I first learned about serverless, I learned it through AWS Lambda: function-as-a-service, chaining events together, and the workflow engines that we layer on top. That was a really nice way to think about this problem. It wasn’t necessarily a brand new way of thinking about it, because a lot of people have tried, and succeeded with, event-driven architectures in the past. But when I hear serverless, I think about the operational model that comes with it, which is: how do I reduce as much friction as possible? When you think about the word serverless, we’re talking about a couple of things. One, we’re talking about the concept of a server at the application level. If you’re going to write a web app, you typically embed a web server that you get from your framework. That web server has to bind to an IP address. It has to think about logging. If you get to the lower levels, you have to think about some of the connection details and timeouts. There’s a lot that goes into running a server, even outside of your application logic. And then you add in the underlying server that needs to be patched, needs to be updated, needs a network. There’s a lot of friction there. If you look at Lambda, it removes all of those things, or attempts to. And what you’re left with is a function. Your business logic is invoked whenever there’s a data packet, a request, or an event. And what I think we’re trying to do now is say, “Hey, can we give the same operational model to normal containers?” where you decide that you want to package your own web server, or some other protocol that makes sense for you. So one thing we’re trying to do now on the cloud provider side is ask ourselves tougher questions and try to meet a higher bar of usability. What I don’t want to do is tell you to rewrite your application.
The perfect state would be: keep your application as it is. Now, if you do certain things, like enable health checks or support metrics, it will definitely help you, because there will be no server for you to log into to troubleshoot. So you may want to add those things. But if you don’t, I still want to give you the same operational model, meaning I want to be able to scale down to zero. So when you’re not using the application, maybe in dev or QA, you don’t pay for it. If you want to run across multiple regions, you don’t have to learn all about multi-region networking, which gets very complex at those layers. And I can also promise you a bit of security, because I can keep the servers patched underneath you. That contract becomes a little bit clearer. So I think what people should be paying attention to is this: think about all of your favorite tools and services, whether they’re open source databases like Postgres and MySQL, or containers you’re writing, regardless of framework, Ruby on Rails or Spring Boot in the Java world. Imagine just being able to tell the cloud provider, “You run those systems for me, and I’ll just give you the code and configuration that I want to use with them.”
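A rough sketch of what “keep your application as it is” asks of a container, using only the Python standard library: listen on the port the platform provides, and stop cleanly when told to. It assumes a Cloud Run-style contract, where the port arrives in the `PORT` environment variable (8080 by default) and an instance receives `SIGTERM` before it is shut down, for example when scaling toward zero. The handler, the response text, and the `/healthz` path are illustrative.

```python
import os
import signal
import threading
from http.server import BaseHTTPRequestHandler, ThreadingHTTPServer


class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = b"ok\n" if self.path == "/healthz" else b"hello from a container\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/plain")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(env=os.environ):
    """Listen on all interfaces, on the port the platform passes in $PORT."""
    port = int(env.get("PORT", "8080"))
    return ThreadingHTTPServer(("0.0.0.0", port), Handler)


def request_graceful_stop(server):
    """Ask serve_forever() to return. shutdown() blocks until the serving
    loop exits, so it runs on a helper thread instead of the caller's."""
    threading.Thread(target=server.shutdown, daemon=True).start()


def serve(server):
    """Serve until SIGTERM arrives, then stop accepting requests and close."""
    signal.signal(signal.SIGTERM, lambda signum, frame: request_graceful_stop(server))
    try:
        server.serve_forever()
    finally:
        server.server_close()
```

An entry point would call `serve(make_server())`. Handing `shutdown()` to a helper thread matters because it blocks until `serve_forever()` returns, and the signal handler runs on the same thread that is executing `serve_forever()`, so calling `shutdown()` there directly would deadlock.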

Henry Suryawirawan: [00:39:06] What do you think of the current state of serverless? Is it good enough for people to use, and are the technologies there? Or does it still need a lot of improvement?

Kelsey Hightower: [00:39:16] I think it’s super mature for certain workloads, depending on what you want to do. So if you’re doing event-driven architecture, Lambda from AWS and Cloud Functions from Google Cloud are super mature, and tons of people are already using them in production at pretty decent scales. That means you’ve written code that executes fast and understands how to deal with the events, and the ecosystem around that is also fairly mature. A lot of the data brokers are doing a good job of retries and giving you visibility into how the execution flow is going. And then there are configuration tools that help you deploy those kinds of systems. So I think event-driven architecture is fairly mature, especially when you’re being very explicit about state.
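Because brokers retry deliveries, an event-driven function usually has to be safe to run twice on the same event, which is one reason being explicit about state matters. A minimal sketch, assuming each event carries a unique `event_id` that survives redelivery; `Event` and `LedgerHandler` are hypothetical names, and the in-memory dict and set stand in for an external store, since function instances come and go.

```python
from dataclasses import dataclass, field


@dataclass
class Event:
    event_id: str  # assumed unique per event; a retried delivery keeps the same ID
    account: str
    amount: int


@dataclass
class LedgerHandler:
    """Hypothetical handler that applies deposit events exactly once."""
    # In a real function these would live in an external store, not in memory.
    balances: dict = field(default_factory=dict)
    processed_ids: set = field(default_factory=set)

    def handle(self, event: Event) -> bool:
        # A redelivered event that was already applied becomes a no-op.
        if event.event_id in self.processed_ids:
            return False
        self.balances[event.account] = self.balances.get(event.account, 0) + event.amount
        self.processed_ids.add(event.event_id)
        return True
```

The design choice is to make the state the function depends on explicit and checkable, so the broker can retry freely without double-counting.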

On the container ecosystem side, I think our goal should be: can we provide the equivalent experience that you get with a virtual machine? For example, with a virtual machine, you can mount a data volume. You can do NFS. You can do machine learning with a GPU. You can do all of these things when someone gives you a virtual machine. In the serverless world today, if I think about Google Cloud Run, we can do about 70% of the things I just talked about. We can do NFS. We can run any Docker container. You can use any library that you want. But there are some challenges, for example if you want to run Docker inside of that thing. Today, that won’t necessarily be easy to do. Bi-directional gRPC is also something that’s coming down the pipe. So you have to think about the protocol that you use. We support WebSocket and gRPC, but we may not support every single protocol that you’re used to when you’re dealing with a virtual machine. We’ve got to close that gap. And once that gap is closed, it becomes a trade-off about how much control you want over the underlying infrastructure, not about what you can and can’t run. So we’ve got work to do.

Henry Suryawirawan: [00:41:02] Are there some other technologies apart from Cloud Run upcoming in the serverless area?

Kelsey Hightower: [00:41:07] So when I say serverless now, I’m thinking about the operational model. Serverless can be applied to more of our data stores and databases. Typically they’ll have a free tier. They’ll scale to zero. Pay for use. We also have things around workflows. And for building things, there’s Cloud Build. A lot of people like… I don’t run Docker on my machine anymore. As of maybe 2.5 years ago, I just don’t run Docker locally. I just use Cloud Build. So I still keep Dockerfiles, but whenever I want to build them, I just have a build script that says “gcloud builds submit” to the Cloud Build system. And then there’s a container living in some registry in the cloud, and I can complete the rest of my workflow. That to me is another concept of serverless: just more managed services, where I don’t really have to do all of this work. And we’re trying to do that with control planes. So if you’re thinking about service mesh, today a lot of people will install Istio and all of its control plane components inside their clusters. But one thing we’re trying to do with tools like Traffic Director is say, “Hey, what if we were able to centralize this open source control plane? Same open protocol. Support standard Istio and Envoy proxies. Even bring your own. But then it just runs in a more managed environment.” To me, that can also be seen as a form of serverless.

So I think for us, the goal going forward is: imagine taking as many of our managed services as we can, and giving people the option to leverage a cost model that’s pay per use. Today with a VM, you create it and you pay by the hour. But we want a world where you only pay when that thing is doing something. So that’s a goal that we have. And that could be Cloud SQL. That could be Postgres. That could be Cloud Spanner. That could be Memorystore. I just want to see it everywhere.
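The difference between paying by the hour and paying per use is simple arithmetic. The sketch below uses made-up rates purely for illustration, not any provider’s real prices, and assumes the pay-per-use service is billed only for request duration.

```python
HOURS_PER_MONTH = 730  # common approximation of one month of always-on billing


def vm_monthly_cost(hourly_rate: float) -> float:
    """An always-on VM is billed for every hour, busy or idle."""
    return hourly_rate * HOURS_PER_MONTH


def pay_per_use_monthly_cost(requests: int, seconds_per_request: float,
                             rate_per_second: float) -> float:
    """A scale-to-zero service (assumed model) is billed only while handling requests."""
    return requests * seconds_per_request * rate_per_second
```

With a hypothetical VM at $0.05/hour (about $36.50 for a 730-hour month) versus 100,000 requests of 0.2 s each at a hypothetical $0.0001 per second (about $2.00), an idle dev or QA service is far cheaper on the pay-per-use model; a service that is busy around the clock can flip that comparison.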

Monolith vs Microservices [00:42:44]

Henry Suryawirawan: [00:42:44] Interesting take indeed. So Kelsey, let’s switch to another topic you’re also passionate about, which is monolith versus microservices. I want to go back to a tweet that you posted around late 2017. You wrote a prediction that in 2020, monolithic applications will be back in style. What’s your take on this prediction?

Kelsey Hightower: [00:43:04] Yeah, that was more of an observation, because people had already been going back. Or some people had never left. I have an example. I wouldn’t say it’s a good example, but it’s an example. Istio is this service mesh that we’ve talked about a few times here. And it had a few components. It had what we call the Mixer. This is where you would send all your metrics and logs. And the Mixer would decide: does it go to Datadog or Stackdriver? It was pluggable and configurable. We had a thing called the Pilot. It was an independent service that would integrate with things like Kubernetes. We had things like the Galley. There were all of these small little services. So when you started Istio, it was itself leveraging microservices to help you manage your microservices.

But guess what? The biggest complaint we had was that it’s really hard to deploy all of these components and configure all of these components. And most people wanted something a little easier to manage. So guess what the solution was. The solution was to take all of these components and make them into a single service. So they went the other way. They went from microservices to a monolith. And this is not a knock against them. It was a good idea, because you can still deploy the individual components if you ever need to scale. But they did that to simplify things, and I’ve seen this come up a lot. If you look at some of the engineering blog posts from, I won’t mention all the companies’ names, but they’ve done these blog posts where they say, “Hey, we used to have 20,000 services. And we noticed that we were spending a lot of our CPU serializing JSON, or tracking metrics, or applying security policies instead of doing actual work. So instead of doing that, we’re going to go from 20,000 back to 2,000.” And things are faster. Things are easier to manage. Things are easier to understand. And I think in those cases, it’s not that microservices are bad. It’s just that when some people leveraged this pattern unnecessarily, or took it too far, it became unmaintainable and might even have made performance worse. It’s a trade-off you should make when you’re doing it for organizational purposes. So if you’re at Google, you might have a Gmail service. And even within the Gmail team, there might be independent services. That’s where I see it being super effective: when it helps you align your organization around areas of specialty. But even then, you can achieve a lot of these things with a monolith. If you’re just a solo developer, you’re probably better off just writing the monolith to start.
If you’re in a small team, and maybe have some good engineering practices, like writing modules that then get compiled together as one deployable, again, monoliths may help you go very far. And the last thing I’ll say on this is, for all intents and purposes, a large majority of Facebook could be considered a big monolith. The same is probably true of GitHub. I’m not saying they don’t have any microservices. I’m saying that you can get really far without them.

Henry Suryawirawan: [00:45:46] Interesting take, actually. I also heard about the Istio project: how it started as microservices with all the components you just mentioned, and with the recent version release, it went back to a monolithic design. So certainly there’s some wisdom in why they’re doing it that way. Still, some people are trying to adopt microservices for their architectures. It seems almost like the de facto standard in the industry, like, “Oh, I want to build something. Let’s start with microservices.” So what’s your take on that? Is there any rule of thumb for deciding when to use microservices or a monolith?

Kelsey Hightower: [00:46:19] Yeah, I think I would probably start with a monolith whenever I can. But remember, we’re not talking about a monolith with no engineering discipline. I think when people say monolith, in some cases, not all, they’re really describing a lack of engineering discipline and standards. Because a lot of times you can have a modular code base, meaning all the services are in their own repositories, and they have their own workflow and release schedules and integration tests with the rest of the other services at a package level. We’re not talking about deployables yet. We’re still just talking about modular packages, just like the standard library, and maybe different libraries you use to create your own services. When those things are modular, and there’s clear ownership and good testing, then you can make a decision. Do you deploy all of those things individually? Or do you import them all into the same main binary, and then wire them up with some routes? And that can actually be as maintainable or scalable, from a team perspective, as having independent deployables. So when you separate those two, I think you can say, “Hey, even if we start with a monolith, we’re still going to adopt some of the practices that you find in a microservices architecture. We’re going to split our services into different repositories. We’re going to have clean interfaces for invoking them.” I’ve seen people go as far as making a monolith and then having all the services call each other through localhost. You can see that in some of the old JBoss world or some of those big application frameworks. Everything is a web request, even when you’re deployed together in a single binary. So you don’t have to get that kind of discipline and isolation only from microservices.
And then if you develop this way from the beginning, when the time comes to split one of those components out, you’ll be able to do it very naturally, because the boundaries will be very clear about how you do it.
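A minimal sketch of the “import them all into the same main binary, and then wire them up with some routes” option. `UserService`, `CatalogService`, and the routing table are hypothetical; in a real code base each module would live in its own package or repository with its own tests, and they share one file here only for brevity.

```python
from typing import Callable, Dict

Handler = Callable[[dict], dict]


# --- hypothetical "users" module: owned by one team, small clean interface ---
class UserService:
    def __init__(self):
        self._users = {"1": "ada"}

    def get_user(self, user_id: str) -> dict:
        name = self._users.get(user_id)
        return {"id": user_id, "name": name} if name else {"error": "not found"}


# --- hypothetical "catalog" module: owned by another team ---
class CatalogService:
    def __init__(self):
        self._items = {"42": "keyboard"}

    def get_item(self, item_id: str) -> dict:
        name = self._items.get(item_id)
        return {"id": item_id, "name": name} if name else {"error": "not found"}


# --- the single deployable: import the modules and wire up routes ---
def build_app() -> Dict[str, Handler]:
    users, catalog = UserService(), CatalogService()
    return {
        "/users": lambda req: users.get_user(req["id"]),
        "/items": lambda req: catalog.get_item(req["id"]),
    }


def dispatch(app: Dict[str, Handler], path: str, req: dict) -> dict:
    handler = app.get(path)
    return handler(req) if handler else {"error": "no route"}
```

Because callers depend only on each module’s small interface, splitting `UserService` out later would mean deploying it separately and replacing its route’s lambda with a client that calls it over HTTP, leaving the other modules untouched.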

The last thing I’ll say here is for people who are struggling, like, “Argh, I really want to do microservices for my resume. Or I want to make my LinkedIn profile look really good.” If you have a monolith today, here’s one thing I always highlight to teams thinking about making this transition. I say, “I guarantee you, you’re already doing microservices today.” And they say, “No. Nope. Kelsey, we’re doing a monolith. Trust me. I wrote the code. I know my company better than you do.” I said, “Okay. How do you do DNS?” They say, “My library calls the DNS server based on this configuration. Why are you asking?” “That’s your first service, right? Domain Name Service. It runs over here. It’s usually a separate binary. And it does one particular thing. And all your apps call on that service to do name lookups.” And they look around and say, “So you mean we’ve already been doing microservices?” I was like, “Yeah, there you go. You have a microservice architecture already.”

Henry Suryawirawan: [00:48:52] Interesting story, indeed. The definition is also confusing for people, right? What actually is a microservice to Kelsey Hightower?

Kelsey Hightower: [00:49:00] If you don’t compare the two, then I think it’s easier to understand. I worked at a company that had a lot of stuff written in Python. Everything was written in Python. That was the standard language. And then we had to do some work where there were only Java libraries available. The thing we needed to integrate with only had a Java SDK. So we had to make a decision. Do we write our own integration and maintain it in Python? Or do we leverage the mature Java SDK? We decided to do that integration work using the Java SDK. And naturally, that forced us into a separate deployable and a separate service. So now we have this big monolith over here, and this one new service over here that’s written in Java. But it exposes an HTTP API, so we can integrate with it from our Python monolith. That makes a lot of sense. And then someone says, “Hey, Kelsey, we need to process some batch files.” Again, I revisit this decision. Should I really add batch processing into the monolith, and then have to deploy all of that monolith just to do batch processing? My answer would be no. So in that case, even if I choose to stick with Python, I might start a new binary, a new project, or a new service written in Python that can be deployed independently for dealing with the batch process. Now, there might be code in the monolith that I need to reuse. So then I have to make another decision. Do I call the monolith? Or do I import the same libraries that the monolith is using inside of the thing doing batch processing? So when I think about microservices, I think of them as a pattern for deciding when to keep all the logic together and when to split it apart. And you can do both at the same time, at the same company, on the same team.

Henry Suryawirawan: [00:50:41] Yeah. I think that makes sense. But I also tend to think of it as service-oriented architecture anyway. Is there any particular reason why there’s this “micro” prefix in the word?

Kelsey Hightower: [00:50:52] So when I was first introduced to SOA, Service Oriented Architecture, it was actually via the Java ecosystem, when I was dealing with JBoss quite a bit. The Java ecosystem, to me, did a decent job back then of saying, “Even if all of this stuff is going to end up in the same WAR or EAR file, there’s a way to make sure that we draw clean boundaries around each service.” So even though we may have deployed it all together, it was a clean way to say this service is for the user database. It’ll have this particular context. This service is for dealing with the data domain. You had all these ways of saying, “Hey, here are the various areas, so that different people on the team can work on a different service.” Maybe that’s in the form of a JAR or whatever. And then at build time, we would take all these services, push them together, and create a release. And then that release we could drop into a web application server. And then that thing would deploy all the services, and hook them up in a way that they could either call each other directly or call each other over HTTP, depending on the framework you were using. So that kind of services architecture made a lot of sense.

Now, the drawbacks of that, in some cases, came when people started reaching inside of a service inappropriately. You’re not using HTTP as an interface. You may be going behind the scenes, calling private methods, or you make the private method public so you can call it. And now you’ve got spaghetti code. And it’s, argh! Things are so bad that you just say, “You know what? Let’s never allow this to happen again.” So people say, “Look, the monolith was too big. If we regulate ourselves to make things small, that might be one way of preventing those past mistakes.” So it’s like going from a 10,000 square foot mansion. You live in a huge house with 20 bedrooms. And you say, “You know what? I have too much furniture. The whole place is dirty. I’m just going to move into a one bedroom apartment to make sure that never happens.” But guess what: if you still have the same practice of not cleaning up after yourself, there’s a huge chance that your one bedroom apartment will also be messy over time. So I think the idea with microservices would be: let’s try to split things up in a way that we can be very clear about our intention. The user service only does user service stuff. And when you want to do something else, you have to go to a whole different repository, a whole different way of thinking about it. And we can then catch if things are getting too big. If I start seeing the user service handling the data model for the web catalog, then I can say, “Hey, whoa, that doesn’t belong over here. This thing is getting too big. It has too many responsibilities. That’s how we got in trouble last time.”

Now, all this is a good idea when you start thinking about different departments, where there’s a team that only deals with user management. It might be nice for them to have a service that’s scoped well for their domain, aka micro. But yes, to your point, the word micro is debatable. Should it be a maximum of 1,000 lines of code? Should it only have five HTTP routes? That’s debatable, and I don’t know if it makes sense to try to regulate yourself to a certain size. But I think it has to be around a certain domain. And I think that’s what we’re after.

Henry Suryawirawan: [00:53:47] I like the story where you use the analogy of the mansion and the one bedroom apartment. I think habits never change. And yeah, the term micro is very confusing these days. Sometimes I speak with a few technologists, or even friends, about microservices. They’ll say, “Yeah, I’m doing microservices.” But all I can see is, “Okay, you’re just splitting your monolith into three different services and calling them microservices.” So to me, that’s confusing. And I like the idea you explained of responsibility for a domain. Maybe we should even call it a responsible service. But that’s probably for another time.

Learning Things The Hard Way [00:54:20]

Henry Suryawirawan: [00:54:20] So Kelsey, you also do a lot of things around Kubernetes, right? Including from the very beginning, when people hadn’t even heard about Kubernetes, and sometimes not even about containers. You also wrote a book, “Kubernetes Up and Running”. You also did “Kubernetes the Hard Way”, your GitHub repository about learning Kubernetes the hard way. And I also heard that you are doing “Service Mesh the Hard Way”. So first of all, what are some of the lessons learned from doing all these things the hard way?

Kelsey Hightower: [00:54:50] Yeah. So remember we talked about going deep. I remember when I first got involved with the Kubernetes project, I was working at CoreOS. And we knew this project was going to be coming out. Red Hat and Google were big partners, and maybe a few other companies. But CoreOS wasn’t really contributing at the time. There was going to be a press release coming out the next day. And I remember getting early access to the repository. And I’m looking at this thing like, “How do I even try it?” I didn’t even see documentation for running it on the system I had at the time, which was my MacBook with VMware Fusion running on it. So I had to spend that night really learning what the kubelet is. What does it do? What do these flags even mean? What is the scheduler doing? How does it work? Where should it run? Going through that process, I really learned all the individual components of Kubernetes. How to compile them. How to deploy them. And I remember just experimenting with the configuration till I got something that could actually run a container. I remember publishing that blog post with screenshots: here’s how you run Kubernetes on VMware Fusion. And I remember when I was going through the code base, I was like, “Wow, a lot of this Go code looks like Java.” We used to call it Gava, Java with a G. I contributed early improvements to try to clean up the code base a little bit, and also maybe add a few missing pieces. So some early work that I was doing was thinking about how to automate node membership. Because at that time, you would have to bring a node in, generate the config, connect manually to the API server, and then register the node yourself. So auto node registration was one of the first things that I worked on, and I added support for CoreOS to automatically provision nodes and join them to the API server. Little contributions like that. Did a little bit of work on CNI.
But what I learned from documenting all this really came when people would say Kubernetes is hard. I would go and do a lot of these workshops. I was super excited in those early days. I’m like, “Who wants to learn Kubernetes?” I could be walking down the street: “Hey, what are you doing? You want to learn some Kubernetes?” And then I would just share all the knowledge I had with people, because I was still learning. And this is what I call learning in public. As I was getting excited, I had all of these avenues to share my excitement, whether that was a keynote stage, a YouTube video, or my text editor adding some new functionality. And I remember when we were asking ourselves, “Why are people saying Kubernetes is hard?” My conclusion was that they don’t understand it. It’s not about having the easiest bash script that gives you a Kubernetes cluster in five minutes. It’s that even if you ran the script, most people didn’t know how to maintain the cluster. They didn’t even know what components were running there. So I decided early on, maybe a little later, around when I first joined Google, that instead of trying to build tools and make everything easy, how about we make it easy to understand? Not necessarily easy to provision. And we ended up doing both. So kubeadm came out around the same time. But I also wrote “Kubernetes the Hard Way”, which stepped people through all the underlying tasks: getting the binaries, provisioning the VMs, creating the SSL certificates, copying them to the servers, and wiring everything up. And then what we found was lots of people around the world were like, “Wow, I finally understand all the moving pieces,” after going through it maybe once or twice. It’s still complex, but it doesn’t necessarily need to be hard once you understand the components. And remember, this was written for anyone that has an operations mindset or wants to learn how it works at the infrastructure level. That was for them.
It wasn’t meant for developers. Even if you’re a developer and you’re curious, it wasn’t trying to make your developer workflow easy. It was for people responsible for managing a cluster. So what I learned from this is that sometimes the easiest way to make something easy is not necessarily to make it easy to use, but to make it easy to understand.

Henry Suryawirawan: [00:58:17] And is that also the motivation for “Service Mesh the Hard Way”? That you want to learn more about service mesh?

Kelsey Hightower: [00:58:25] Yeah, you’re exactly right. Again, learning in public. I started learning about Istio maybe two years ago, maybe two and a half. I even gave a few keynotes about Istio. At first, I treated Istio like a black box, like a lot of people. I installed it. It has a config interface. I can configure things like encryption between my services, rate limiting, all kinds of nice high-level networking policies. And this is great. When I started to really look at the code base, one of the first things I contributed was a prototype for the sidecar injector. For people that use a service mesh, typically you will deploy Envoy next to your application. And then Envoy does the heavy lifting in terms of making sure that the policy you put in Istio is enforced by the sidecar: all traffic goes into Envoy, and all traffic comes out of Envoy. But you need to configure your deployments to do that. So the injector would basically rewrite your deployments automatically before they got scheduled to a node.
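The injector idea, rewriting a Deployment before it is scheduled, can be sketched as a pure function over the manifest. This is a simplified illustration, not Istio’s injector: the container name and image are hypothetical placeholders, and a real injector also sets up traffic redirection and typically runs as a Kubernetes mutating admission webhook.

```python
import copy

# Hypothetical values for illustration; a real injector reads these from its own config.
SIDECAR_NAME = "envoy-sidecar"
SIDECAR_IMAGE = "registry.example.com/envoy:placeholder"


def inject_sidecar(deployment: dict) -> dict:
    """Return a copy of a Deployment manifest with a proxy container added to its pod template."""
    patched = copy.deepcopy(deployment)  # never mutate the caller's manifest
    pod_spec = patched["spec"]["template"]["spec"]
    containers = pod_spec.setdefault("containers", [])
    # Idempotent: if the sidecar is already there, return the manifest unchanged.
    if any(c.get("name") == SIDECAR_NAME for c in containers):
        return patched
    containers.append({"name": SIDECAR_NAME, "image": SIDECAR_IMAGE})
    return patched
```

Keeping the rewrite idempotent matters because the same object can pass through the injector more than once.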

And then when I started to really understand Istio, I said, “Wait a minute. If I look at the Istio control plane and its high-level config language, what is it actually doing?” And looking underneath the covers, I was like, “Wow, this thing is basically just generating config for Envoy, including the SSL certificates to do encryption between the services, and just pushing everything down. How’s it pushing it down? Oh, there’s the xDS protocol.” Envoy has a way of streaming configuration, so as things change in the infrastructure, you can keep the policies and the backends up to date. And I was like, “Wow, this is intriguing. But what about everything else? Like, how do you do authz?” Once a service connects to you, how does Envoy decide whether it’s allowed to do this thing or not? And again, you need something like Open Policy Agent, where you can say, “Craft a set of policies that says this service ID or this SPIFFE ID can call these paths.” And then how do you log all these things? Where does Prometheus fit in? So I looked at all that, and that’s a lot to understand coming from the top down. So I decided to start working on this from the bottom up. To set the stage for Service Mesh the Hard Way, here’s what I’ve done: okay, let’s start with just a set of apps. Actually, I start with a monolith, where I do everything inside of a single binary. It’s a very simple calculator app with an API. So you call the API, and it will call the add service, subtract service, or multiplication service. And it has a little bit of authentication before you can call it. And then I split that up, with no visibility, no logging, no tracing, and no security, and just have a bunch of little services. And then I pose the question, “How would you lock this thing down?” And that’s when I start introducing things like Envoy, Open Policy Agent, Prometheus, and Zipkin for tracing.
And I start adding these things one by one, so that people can understand the role of a service mesh, and how it all works under the covers as they build back up to their own service mesh by hand.
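The rule Kelsey describes, “this SPIFFE ID can call these paths,” has a simple shape. Open Policy Agent would express it in its own policy language, Rego. The sketch below uses plain Python instead, with a hypothetical trust domain (`example.org`) and hypothetical service accounts and paths modeled on the calculator app. It denies by default and allows only listed identity and path pairs.

```python
from fnmatch import fnmatchcase

# Hypothetical policy: caller identity -> path patterns it may call.
POLICY = {
    "spiffe://example.org/ns/default/sa/calculator": ["/add", "/subtract", "/multiply"],
    "spiffe://example.org/ns/default/sa/reporting": ["/metrics", "/metrics/*"],
}


def is_allowed(spiffe_id: str, path: str, policy=POLICY) -> bool:
    """Default deny: allow only if the caller's ID lists a pattern matching the path."""
    return any(fnmatchcase(path, pattern) for pattern in policy.get(spiffe_id, []))
```

In a mesh, Envoy would consult a policy engine like this on each request, for example through its external authorization filter, with the caller’s SPIFFE ID taken from its mTLS certificate.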

Henry Suryawirawan: [01:01:13] Wow. Really interesting. So how long before we can see it in public? Or is it already available?

Kelsey Hightower: [01:01:19] It’s available to me in a private GitHub repository. All the code is written. All my configs are written. There are thousands of Envoy configs in there. So now what I’ve got to do is get the narrative written. If anyone’s seen any of my latest talks, you’ll see me tease it a little bit. I’ve been doing little sections of it as little live demos. I think a couple of days ago, I did one for Production Identity Day for the Linux Foundation and the CNCF, right before KubeCon. And there, I showed a little bit about how to create your own SPIFFE IDs and JWT tokens from scratch. All of that stuff will be covered. So for me, it’s probably going to take about three or four more months. Given that, I’ll probably release it in 2021. Maybe I’ll do it around my birthday, which is in February, as a present to myself. And this time, I want to add a lot of diagrams. One thing that “Kubernetes the Hard Way” doesn’t have is any diagrams. And I think a lot of people would like me to go and say, “Hey, add a few visualizations and diagrams so we know what’s going on at each step of the way.” And I think I’m going to do that this time.
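Creating a JWT “from scratch” mostly means base64url-encoding a header and a set of claims and signing the result. A minimal standard-library sketch: it puts a SPIFFE ID in the `sub` claim alongside `aud` and `exp`, which mirrors the shape of a SPIFFE JWT-SVID, but it signs with HMAC-SHA256 (HS256) only to avoid third-party crypto libraries. Real JWT-SVIDs are signed with asymmetric keys, so treat this as an illustration of the token format, not a production identity system. `make_jwt` and `verify_jwt` are hypothetical helper names.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use unpadded base64url encoding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def make_jwt(spiffe_id: str, audience: str, key: bytes, ttl: int = 300, now=None) -> str:
    now = int(time.time()) if now is None else now
    header = {"alg": "HS256", "typ": "JWT"}
    claims = {"sub": spiffe_id, "aud": audience, "exp": now + ttl}
    signing_input = (_b64url(json.dumps(header).encode()) + "."
                     + _b64url(json.dumps(claims).encode()))
    sig = hmac.new(key, signing_input.encode(), hashlib.sha256).digest()
    return signing_input + "." + _b64url(sig)


def verify_jwt(token: str, audience: str, key: bytes, now=None) -> dict:
    """Return the claims if algorithm, signature, audience, and expiry check out; else raise ValueError."""
    now = int(time.time()) if now is None else now
    header_b64, claims_b64, sig_b64 = token.split(".")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected algorithm")
    expected = hmac.new(key, f"{header_b64}.{claims_b64}".encode(), hashlib.sha256).digest()
    # Constant-time comparison avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(claims_b64))
    if claims.get("aud") != audience:
        raise ValueError("wrong audience")
    if claims.get("exp", 0) <= now:
        raise ValueError("token expired")
    return claims
```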

Kubernetes-ify Everything [01:02:15]

Henry Suryawirawan: [01:02:15] I’m looking forward to that, for sure. So Kelsey, another trend that I see in the Kubernetes world, among people thinking about adopting Kubernetes, is to use it for almost everything. These days, we see people create their own CRDs, Custom Resource Definitions. Or you see a lot of Kubernetes-related tools and frameworks, even for solving workflows. And more and more of these tools keep popping up. So what is your take on doing everything the Kubernetes way?

Kelsey Hightower: [01:02:41] So one thing that I’m really happy about is that the Kubernetes project really decided to draw the line around what’s core functionality and what’s not. So core functionality in Kubernetes is going to be some of the components, like the scheduler, and things like RBAC, so you can control who can write what configs and who can read what configs. And configs would be things like your deployment objects, your secrets, your config maps. And then we have some objects that are also considered core, like a replication controller. So when you say, “Hey, create me a deployment,” those replication controllers are the thing that makes sure you have three or five pods, depending on how you configure it. But then we said, “Okay, outside of that core stuff, what else do people want to do?” And most people want to create their own object types. So instead of just the standard collection of deployments and config maps, etc., they want to create their own things and deploy their own control loops. So the nice thing about Kubernetes is, if you step back, and you don’t install the agent for running containers, the kubelet, and you don’t do any of that, what are you left with? You’re left with, basically, a control plane framework. It has roles and permissions. It can generate routes for you. It can generate client libraries for you. It can do all of these things. It’s what we call the API machinery. And what the API machinery does is allow you to create new API endpoints without writing all the code to do all the other things. And so the contract there is: given this universal control plane, you can create your own API through a CRD, Custom Resource Definition, and maybe that control plane does… I don’t know. Let’s say it flies an airplane. For some reason you might want to do that. So you might create a resource definition that says, “Tell me the plane. Tell me a starting point.
And tell me its destination.” So with those three things, you may design a CRD that captures that. And in your CRD, you also describe how it should be presented. So when I say ‘kubectl get airplane controller’, then you’ll see them listed out. Great. So you have all the client-side and server-side stuff done for you. So then what’s left? Then you create your own control loop. And here’s the thing that a lot of people get confused about. You don’t have to deploy the thing that flies airplanes, that control loop, inside of Kubernetes. You can deploy that on a VM. You can deploy that on a serverless platform. Because Kubernetes doesn’t need everything to be in a container. The only thing the control loop has to be able to do is communicate with the API server. So you don’t even need any machines in your, quote unquote, “Kubernetes cluster”. You just need the API machinery. And then given that machinery, you can run that control loop somewhere else. So at that point, Kubernetes can actually be helpful in almost any context where you want a declarative control plane, as long as you’re willing to create the CRD, the API, and the thing that does the work.
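For the airplane example, the CRD might look roughly like this. `Airplane`, the `example.com` group, and the field names `plane`, `origin`, and `destination` are hypothetical; the structure follows the `apiextensions.k8s.io/v1` CustomResourceDefinition format, where `additionalPrinterColumns` controls how `kubectl get` lists the objects. It is built as a Python dict and printed as JSON, which `kubectl apply -f -` also accepts.

```python
import json


def airplane_crd(group: str = "example.com") -> dict:
    """Build a CustomResourceDefinition for a hypothetical Airplane resource."""
    return {
        "apiVersion": "apiextensions.k8s.io/v1",
        "kind": "CustomResourceDefinition",
        "metadata": {"name": f"airplanes.{group}"},  # must be <plural>.<group>
        "spec": {
            "group": group,
            "scope": "Namespaced",
            "names": {"plural": "airplanes", "singular": "airplane",
                      "kind": "Airplane", "listKind": "AirplaneList"},
            "versions": [{
                "name": "v1",
                "served": True,
                "storage": True,
                "schema": {"openAPIV3Schema": {
                    "type": "object",
                    "properties": {"spec": {
                        "type": "object",
                        # The three inputs from the example: plane, start, destination.
                        "required": ["plane", "origin", "destination"],
                        "properties": {
                            "plane": {"type": "string"},
                            "origin": {"type": "string"},
                            "destination": {"type": "string"},
                        },
                    }},
                }},
                # Columns shown when listing objects with `kubectl get airplanes`.
                "additionalPrinterColumns": [
                    {"name": "Plane", "type": "string", "jsonPath": ".spec.plane"},
                    {"name": "Destination", "type": "string", "jsonPath": ".spec.destination"},
                ],
            }],
        },
    }


if __name__ == "__main__":
    print(json.dumps(airplane_crd(), indent=2))
```

The API server stores and validates `Airplane` objects once this is applied; actually “flying” them is left entirely to a control loop running wherever you like.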

So when you start to say, “Hey, let’s use Kubernetes for everything,” the truth is, on one hand, it can be helpful in any context where you want a control plane to drive some outcome. Whether that’s CI/CD: you see it used for tools like Argo or Tekton, where you can say, “Hey, here’s a list of build steps.” And then Kubernetes is like, “Great, give me that definition. I’m not going to do anything. I’ll just store it, secure it, and let things collaborate with this object. But other than that, there’s nothing for me to do.” And then you can build your own control loop that actually runs the CI/CD jobs. That’s the beauty of Kubernetes. So this is why you’re seeing a lot of people use Kubernetes, or more accurately, the Kubernetes API machinery, to back these other projects.
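A minimal sketch of the reconcile pattern behind a CI/CD control loop: each pass compares the desired state (the list of build steps) with the observed state (steps completed so far) and takes one action to close the gap. `Pipeline`, `reconcile`, and `run_step` are hypothetical names, not Tekton’s or Argo’s API, and a plain list stands in for objects that a real controller would watch through the Kubernetes API server.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Pipeline:
    """Stand-in for a stored custom object: spec (desired) plus status (observed)."""
    name: str
    steps: List[str]                                     # spec: build steps, in order
    completed: List[str] = field(default_factory=list)   # status: steps already run
    phase: str = "Pending"


def reconcile(p: Pipeline, run_step: Callable[[str], bool]) -> None:
    """One pass: move the observed status one step closer to the spec."""
    if p.phase in ("Succeeded", "Failed"):
        return  # terminal state, nothing left to do
    remaining = p.steps[len(p.completed):]
    if not remaining:
        p.phase = "Succeeded"
        return
    step = remaining[0]
    if run_step(step):
        p.completed.append(step)
        p.phase = "Succeeded" if len(p.completed) == len(p.steps) else "Running"
    else:
        p.phase = "Failed"


def control_loop(pipelines: List[Pipeline], run_step: Callable[[str], bool],
                 max_passes: int = 100) -> None:
    # A real controller would react to watch events; here we simply poll a list.
    for _ in range(max_passes):
        active = [p for p in pipelines if p.phase not in ("Succeeded", "Failed")]
        if not active:
            return
        for p in active:
            reconcile(p, run_step)
```

Because each pass looks only at current state, the loop can crash and restart without losing track of progress, provided the status is written back to the stored object.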

Henry Suryawirawan: [01:06:00] So in that sense, looking at these trends, and especially with containers becoming more and more popular, do you see the VM approach becoming obsolete? Do you see containers becoming more like platforms in the future, the infrastructure of choice for people to build their applications on?

Kelsey Hightower: [01:06:18] So the funny thing about Kubernetes is that Kubernetes is actually very node-centric. Even though we talk about containers, we’ve done a good job in the container community, for the most part, of separating the machines from our applications. That means we typically package all of our dependencies, but we still need the kernel. What Kubernetes does, in my opinion, is act as a very thin layer on top of, for most people, virtual machines. So Kubernetes, to me, makes virtual machines easier to use. It doesn’t necessarily make them go away. Now, of course you can use Kubernetes with bare metal. But most people are using Kubernetes with virtual machines. So does that make virtual machines go away? I wouldn’t say that. I would say Kubernetes makes it easier to manage a group of resources, which, since we’re talking about nodes, typically come in the form of virtual machines. And this is not very different from Puppet, Chef, or Ansible, right? These are configuration management tools, where you can take all of your machines, put them into some form of an inventory, and assign roles to them. Then the configuration management tool does the work of making sure that those machines behave the way you’ve described. Kubernetes does this slightly differently, with higher-level concepts like a container that gets placed and run. So what does it do for this step of our journey? It makes the virtual machine do less work. It basically says, “Hey Linux, we don’t need a package manager down there anymore, because the user’s going to be bringing all the code and dependencies that they need. We no longer need people to be logging into the machine. We no longer need to be putting a bunch of agents on this server, because now we can include some of those agents with the deployment object.” So it reduces the role of a virtual machine a little bit.

Now serverless is slightly different, depending on the contract of your serverless platform. For example, if you look at Lambda, since Lambda is code-based, Lambda can do interesting things. Lambda can say, “Give me your code. Here are the languages we support. But it’s up to me if I want to compile your deployment for Intel or ARM.” You may not even get to pick the CPU architecture, because they can compile it for whatever they want for their own efficiency. Whereas with a container, if you’ve pre-compiled the binary, then you’re going to be a little bit closer to a particular machine architecture.

So this is the big battle we have. The serverless world, at the extreme end, is typically source-code based, because then we can control what the underlying architecture is. If you give me a container, then I may have to expose that choice and make you decide: Is this Windows? Is this Linux? Is this Intel? Or is this ARM? Those are the things that keep us gravitated to a particular set of machines.
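The reason a container pins you to a platform can be sketched in a few lines. Multi-architecture images are published as an image index that lists one manifest per OS/CPU pair, and the runtime picks the entry matching the host; if none matches, the image can't run there. The sketch below imitates that selection. The field names loosely follow the OCI image index (`platform.os`, `platform.architecture`, `digest`), the digests are placeholders, and the `ARCH_ALIASES` table is a partial assumption rather than a complete mapping.

```python
import platform
from typing import Optional

# Map the names Python reports for the host CPU onto the architecture names
# used in image indexes. Partial table, assumed for illustration.
ARCH_ALIASES = {
    "x86_64": "amd64",
    "amd64": "amd64",
    "aarch64": "arm64",
    "arm64": "arm64",
}


def host_platform() -> tuple[str, str]:
    """Return (os, architecture) for the current machine in image-index terms."""
    os_name = platform.system().lower()   # e.g. "linux", "windows", "darwin"
    machine = platform.machine().lower()  # e.g. "x86_64", "aarch64"
    return os_name, ARCH_ALIASES.get(machine, machine)


def select_manifest(index: list[dict], os_name: str, arch: str) -> Optional[str]:
    """Pick the digest built for this OS/arch, or None if the image can't run here."""
    for entry in index:
        p = entry["platform"]
        if p["os"] == os_name and p["architecture"] == arch:
            return entry["digest"]
    return None


if __name__ == "__main__":
    # Placeholder index; real digests are long sha256 hashes.
    index = [
        {"platform": {"os": "linux", "architecture": "amd64"}, "digest": "sha256:placeholder-amd64"},
        {"platform": {"os": "linux", "architecture": "arm64"}, "digest": "sha256:placeholder-arm64"},
    ]
    os_name, arch = host_platform()
    digest = select_manifest(index, os_name, arch)
    print(digest or f"no image built for {os_name}/{arch}")
```

A source-code-based platform never needs this lookup, because it produces the binary itself for whatever hardware it chooses.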

Kubernetes Resources [01:08:54]

Henry Suryawirawan: [01:08:54] So for those people who haven’t been exposed to Kubernetes, what would be your advice on where to start? Where should they learn Kubernetes?

Kelsey Hightower: [01:09:02] Ask yourself what your real goal is. It’s okay if your goal is just to learn Kubernetes, just to be educated and in the know. In those cases, there are lots of great Kubernetes books out there. “Kubernetes Up and Running” is probably going to be a much gentler introduction. My coauthors are Brendan Burns and Joe Beda. Those two are two of the founders, not the only founders, but two of the founders of the Kubernetes project. So all three of our voices are in this book. There’s a second edition out. And we try to take you from “Hey, here’s where you probably are now. Here’s how you build a container using things like Docker,” and then progress you up the stack to leverage more and more Kubernetes over time. I’d also encourage you to look at the other books out there that take a different angle. And then there are some people who say, “Hey, maybe I’m past the basics. I’m done trying things out. I want to build Kubernetes from the ground up.” And that’s where “Kubernetes the Hard Way” comes in to try to help you with all of those things.

Now, if you want to be a professional in that, you may want to look at the Kubernetes certification program that the CNCF runs. I think it’s the Certified Kubernetes Administrator. And in that world, you might say, “Look, I want to be able to prove my skills, so I can start managing this stuff at companies or take a job with it.” That’s more of an infrastructure-focused path. Now, if you’re a developer, the way I like to think about it is, you can start in a different direction. You can start with your own application. Is my application portable, decoupled from the machine in a way that Kubernetes likes? For example, am I building my containers in a way that they can run on a wide range of kernels? That’s a good place to start. Do I have health checks? Am I leveraging some of those cloud native patterns that we talked about earlier? If you do all of those things, then if your infrastructure team, or maybe even you, decide to use something like Kubernetes, you’re going to have a great head start in using all of the features of Kubernetes, because your application has a lot of those standard interfaces that Kubernetes can use. For example, if you expose metrics using something like Prometheus and its standard libraries, then those metrics can be used to auto-scale your application inside of Kubernetes. If you log to standard out, and maybe even do some structured logging, again, Kubernetes will automatically collect those logs for you and send them to something like Stackdriver Logging. You have so many choices. But if I were the developer, I would probably start by making sure that my application can take advantage of all those things, and then learn more about how to do a deployment inside of Kubernetes. So again, if I were purely focused on development, maybe I wouldn’t take a lot of time to learn how to build a cluster from scratch.
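For the logging point, here is a minimal sketch of structured logging with only the Python standard library: one JSON object per line, written to standard out, where the container runtime captures it and a node-level logging agent can ship it to a backend. The `log_event` helper and the field names (`timestamp`, `severity`, `message`, plus any extra keys) are assumptions for illustration; check which fields your logging backend actually parses.

```python
import json
import sys
from datetime import datetime, timezone


def log_event(severity: str, message: str, **fields) -> str:
    """Write one JSON object per line to stdout and return the line (hypothetical helper)."""
    entry = {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "severity": severity,
        "message": message,
        **fields,  # arbitrary structured context, e.g. request IDs or durations
    }
    line = json.dumps(entry, sort_keys=True)
    print(line, file=sys.stdout, flush=True)  # stdout, not a log file inside the container
    return line


if __name__ == "__main__":
    log_event("INFO", "order processed", order_id="A-123", duration_ms=42)
```

Keeping each record on a single line matters because collectors typically split container output on newlines.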
But I would try to learn how to do a deployment object and how to expose my service. We also have things like pod disruption budgets, which are a bit more of an advanced concept. The idea there is, if you say I want three copies of my app, you can use a pod disruption budget to say, “If someone is doing maybe automatic upgrades for the cluster, I don’t ever want to have fewer than two of my apps running, for whatever reason.” That way, when someone tries to scale you down to one, that pod disruption configuration is your way of saying, “Hey, you can go no lower than two,” and it will prevent the administrator from doing that. So those are a lot of application-centric things that you can learn inside of Kubernetes in order to articulate how your workload should run.
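The pod disruption budget arithmetic can be sketched as a small simulation. In a real cluster this is a `PodDisruptionBudget` object (API group `policy/v1` in current Kubernetes versions) with `spec.minAvailable` and a label selector, and the API server refuses evictions, such as those issued by `kubectl drain` during an upgrade, that would take healthy pods below the minimum. It guards these voluntary evictions; it doesn't stop someone from editing a Deployment's replica count directly. The `DisruptionBudget` class and `drain` function below are hypothetical, simplified stand-ins.

```python
from dataclasses import dataclass


@dataclass
class DisruptionBudget:
    """Simplified model of a PodDisruptionBudget that uses minAvailable."""
    min_available: int


def disruptions_allowed(healthy_pods: int, budget: DisruptionBudget) -> int:
    """How many more pods may be voluntarily evicted without breaking the budget."""
    return max(0, healthy_pods - budget.min_available)


def drain(healthy_pods: int, requested_evictions: int, budget: DisruptionBudget) -> tuple[int, int]:
    """Try to evict pods one at a time; return (evicted, refused)."""
    evicted = refused = 0
    for _ in range(requested_evictions):
        if disruptions_allowed(healthy_pods, budget) > 0:
            healthy_pods -= 1
            evicted += 1
        else:
            refused += 1  # eviction rejected; the caller has to wait and retry later
    return evicted, refused


if __name__ == "__main__":
    # Three replicas with minAvailable=2: one eviction succeeds, the next is refused.
    print(drain(3, 2, DisruptionBudget(min_available=2)))  # → (1, 1)
```

In practice a refused eviction just makes the upgrade wait until a replacement pod is healthy elsewhere, which is how the budget keeps two copies running throughout.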

3 Tech Lead Wisdom [01:12:13]

Henry Suryawirawan: [01:12:13] Thanks for sharing that. I’ll make sure to put all these resources in the show notes. So Kelsey, it’s been a pleasure talking to you. I learned a lot about Kubernetes, cloud native, serverless, and all the things that we have discussed so far. As my last question, I normally ask every guest to share three pieces of technical leadership wisdom. Kelsey, do you have any that you want to share with all of us here?

Kelsey Hightower: [01:12:33] Yeah. So one thing I’ve learned over time is to decouple your identity from the technology. That helps you make decisions in a better way, I think. So instead of identifying as a Linux system administrator, remember that there’s going to come a time when Windows is going to be the best platform for what you’re doing. If your identity is so close to Linux and Linux only, you may not even consider anything else, because you’re so locked into that. The same is true for people’s CI/CD systems. There are some people who will only use Jenkins, even when something like Spinnaker might be a better fit, or when Spinnaker and Jenkins together are a better fit. So if you decouple yourself from the low-level technologies, step back, and maybe try to align yourself more with the fundamentals, I think that’s going to make you a little bit easier to work with, because you won’t be so dogmatic about things. It also opens the door to new perspectives, versus getting bogged down in your job title.

The second one would be: take your time. I know everyone talks about high productivity, going fast, and all of these things. But what I’ve learned over time is to slow down and enjoy the path to get there. So if I’m learning a new programming language, I’m okay slowing down. Do some hello worlds. Try to rebuild things that I’ve done before. And then stop. And then next week, pick it up some more. And I know that maybe within a year, I’m going to be really comfortable building the things that I want to build. That doesn’t need to happen in two weeks. It doesn’t need to happen in three months. I’m okay if it takes a little bit of time, because it is an investment for the long haul. So to me, it’s about that level of patience. What does that mean? I don’t look for how to learn everything I need to know in five minutes, or ask to be shown the easy way. I’m okay saying it’s going to take a while, and then slowly building up my skills and continuing to make progress.

I guess the third one: I’ve found it easier to inspire people than to boss them around, and that’s by bringing my whole self to the job. So if you’re in a situation where you’re someone’s boss or manager, of course you can always try to tell them exactly what to do, and then maybe punish them if they don’t do it correctly or praise them if they do. That’s one way of doing things, and it may work for a lot of people. But what I found most effective for me was to inspire people into action. Because when that happens, they fill in the blanks in a way that I would never have asked them to. They typically approach it with way more passion and energy, and they bring their whole selves to the project. So I think leading by inspiration or persuasion instead of authority, even when you have authority over folks, can change the outcome in dramatic ways and allow that other person to grow. So I would encourage a lot of people to explore what it would take to inspire people versus forcing them to do something. You might be happy with the results.

Henry Suryawirawan: [01:15:17] Very insightful, indeed. So thank you so much, Kelsey, for participating in the show. I really, really enjoyed this conversation, and I’m looking forward to having you in future episodes. So thanks again, Kelsey. And good luck with your “Mesh the Hard Way”.

Kelsey Hightower: [01:15:30] Awesome. Thanks for having me.

– End –