#44 - Team Topologies - Manuel Pais

28-Jun-2021 51 mins Manuel Pais

Included in: Culture & Practices, Leadership, Team Collaboration, Agile, Architecture & Design, Microservices

“Practices and principles are necessary and useful, but they should be informed by what the constraints are in the first place. We need to acknowledge the constraints, and then build and decide on practices and principles based on that.”

Manuel Pais is the co-author of “Team Topologies” and a DevOps thought leader, focusing on team interactions, delivery practices, and accelerating flow. In this episode, Manuel shared great insights from his book “Team Topologies”, starting by highlighting some constraints that organizations typically face, such as Conway’s Law and cognitive load. Manuel then explained the 4 fundamental team topologies and how they address those constraints. Manuel also discussed the Team API concept as well as the 3 core interaction modes, which inform how teams should interact with each other in order to improve the overall flow within the business. Finally, Manuel shared some advice on how leaders can start implementing these ideas within their organizations.

Listen out for:

Career Journey - [00:04:47]
Team Topologies - [00:07:00]
Challenges with Organization Chart - [00:08:58]
Measuring Flow - [00:11:54]
Conway’s Law - [00:14:57]
How to Use Conway’s Law - [00:18:10]
Breaking Monolith into Microservices - [00:21:15]
Cognitive Load - [00:23:57]
4 Fundamental Team Topologies - [00:27:33]
Team API - [00:34:55]
3 Interaction Modes - [00:37:57]
Advice to Align with Team Topologies - [00:42:41]
3 Tech Lead Wisdom - [00:48:13]

_____

Manuel Pais’s Bio
Manuel Pais is the co-author of “Team Topologies: organizing business and technology teams for fast flow”. Recognized by TechBeacon as a DevOps thought leader, Manuel is an independent IT organizational consultant and trainer, focused on team interactions, delivery practices and accelerating flow. Manuel is also a LinkedIn instructor on Continuous Delivery.

Follow Manuel:

Twitter – https://twitter.com/manupaisable
LinkedIn – https://www.linkedin.com/in/manuelpais/
Team Topologies – https://teamtopologies.com/
Team Topologies Academy – https://academy.teamtopologies.com/

Mentions & Links:

“Team Topologies” book – https://teamtopologies.com/book
“Release It!” book – https://www.goodreads.com/book/show/1069827.Release_It_
“Drive” book – https://www.danpink.com/books/drive/
Conway’s Law – https://en.wikipedia.org/wiki/Conway%27s_law
Matthew Skelton – https://www.linkedin.com/in/matthewskelton/
Michael Nygard – https://www.linkedin.com/in/mtnygard/
John Sweller – https://research.unsw.edu.au/people/emeritus-professor-john-sweller
IT Revolution – https://itrevolution.com/

Our Sponsors

Are you looking for some cool new swag?

Tech Lead Journal now offers swag that you can purchase online. Each item is printed on demand based on your preference and will be delivered safely to you anywhere in the world where shipping is available.

Check out all the cool swag available by visiting techleadjournal.dev/shop. And don't forget to show it off once you receive it.

Like this episode?

Follow @techleadjournal on LinkedIn, Twitter, Instagram.

Buy me a coffee or become a patron.

Quotes

Career Journey

When DevOps and Continuous Delivery came about 10 years ago or so, it struck a chord with me that, yes, this is very much what we need to think about: the flow of development of software, as well as the production and the live support. And we need better ways for teams to communicate with each other, understand dependencies between them. Because before that, it was very much based on specialization by skills and by competencies.
One of these [specialized] teams in isolation cannot do the work that is needed to get the actual software product or service available to customers.

Team Topologies

Many clients wanted help with some new toolset or adopting some better practices. And that’s all good and well, and it helps to some extent, but in many organizations, they’re not looking at the elephant in the room, where teams are not communicating in an effective way. Teams don’t even know sometimes who they have to ask for certain things.
You have that very strict separation of concerns and responsibilities that I mentioned, which means you introduce a lot of dependencies to get any sort of value to customers out the door. (Which also) means it has to go through many teams doing small things, and so that’s not conducive to fast flow.
You can have the best toolset in the market, and that’s only going to help you to some extent. When you have the combination of the flow of work in the teams, it flows well because the team has sufficient autonomy and they understand their dependencies. We avoid that sort of blocking dependencies where one team has to wait for another team to do something.
Once we achieve that sort of faster flow with more autonomy, then yes, the tools can then heighten even more this sort of fast flow, get better data, get better automation, etc. But if we just start with automation and tooling, we quickly run against challenges in terms of team structures and communications.

Challenges with Organization Chart

The org chart is not a problem per se. It’s not that we shouldn’t have org charts. They have important benefits in terms of understanding how we’re organized, especially in large organizations. It helps with reporting, and hopefully also helps with top-down alignment of strategic goals for the organization. So everyone understands how we are related and how our work relates to the goals that the organization is trying to achieve.
The problem that tends to happen is when, in a way, we give too much importance to the org chart in the sense that people start looking at it as specifying all the decision-making lines and then specifying with whom should we interact or not.
Those sorts of things are where it starts to become a challenge to fast flow. Because with the complexity of modern software delivery and operations, you can do that, but you won’t achieve fast flow. There are too many steps and too much redirection, in a way, through the hierarchy to get to the actual work getting done.
For fast flow, we want to favor local decisions and have teams interact with the right teams at the right time. When you introduce those sorts of blockers, with the hierarchy and decision-making having to be top-down, we’re not allowing the people who are doing the work and are closer to the customer to actually make the best decisions.
That’s where really the org chart can become a problem, where people see it as imposing the communication lines and imposing decision-making lines between teams. That’s the key issue there where it becomes an obstacle to fast flow, if we misinterpret the org chart as dictating all these things.
It is also coupled with the problem that in some organizations, the worth of the managers (and) senior managers is linked to how many people report to them. And that kind of feeds into this narrative that you need to go up the hierarchy for any kind of decision-making so that we can show the value of senior managers.
With team topologies, we move away from this sort of approach towards asking how we enable the people and the teams who are under our hierarchy to make better decisions more autonomously and not depend on the hierarchy.

Measuring Flow

There’s a very straightforward metric called flow efficiency, which is essentially looking at how long it takes from the moment we have an idea for a new feature or a new service until it’s actually available to the customer. So the full lead time from idea to production. And from that overall elapsed time, how much have we actually spent working on this feature or service versus how much was wait time.
In many organizations, the actual work time is about 15% or less. That means out of a hundred days that it takes to get a feature out, 15 days were actually workdays, and 85 days were waiting for someone else or another team to have availability, or someone to approve or make a decision.
You can then look at flow efficiency at different levels. You can look within the cycle time, from when we start coding until we deploy, or you can look at the broader picture, from when we approved the idea until it actually starts to be used by customers.
And then obviously there is also value stream mapping. That’s the technique for looking at the overall process, at dependencies between teams, and at where there are queues where work piles up for some teams.
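As a rough illustration of the flow efficiency arithmetic described above, here is a minimal Python sketch. It is not from the book or the episode; the `WorkItem` class and its field names are made up for this example, and the numbers are the 100-day case Manuel mentions.

```python
from dataclasses import dataclass


@dataclass
class WorkItem:
    """One feature or service, tracked from idea to production (hypothetical model)."""
    name: str
    active_days: float   # days someone was actually working on it
    waiting_days: float  # days spent waiting on another team, approval, or a decision

    @property
    def lead_time_days(self) -> float:
        # Full elapsed time from idea to production.
        return self.active_days + self.waiting_days


def flow_efficiency(item: WorkItem) -> float:
    """Share of the lead time that was actual work rather than wait time."""
    if item.lead_time_days <= 0:
        raise ValueError("lead time must be positive")
    return item.active_days / item.lead_time_days


feature = WorkItem(name="example feature", active_days=15, waiting_days=85)
print(f"{feature.name}: {flow_efficiency(feature):.0%} flow efficiency")
```

The same function can be applied at either level mentioned above (cycle time only, or the full idea-to-customer lead time), as long as the active and waiting days are measured over the same window.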

Conway’s Law

I like to see it as a constraint, not necessarily an anti-pattern, just something that we should be aware of, especially in software-driven organizations.
It’s from a paper by Mel Conway which was published in 1968, so it’s been around for a long time. But it got more traction with the rise of microservices, and people started seeing in practice how this is actually happening.
One of the corollaries in that paper that became known as Conway’s Law says that the structures that your organization has and the communication paths between teams strongly influence the system architecture that we can achieve.
- That means we might have a wonderful architecture that we’ve designed and that’s what we’re trying to build for customers.
- But then if the team structures are very different, and we don’t have the necessary communication paths in place to achieve that system architecture, either we will end up with something quite different on the system architecture or it’s going to cost us a lot more.
- There’s going to be a lot more challenges because of poor communication, responsibilities that are not aligned to the responsibilities that we expect in different parts of the system, and so on.
With microservices, people started to see this more, because, unfortunately, in many places they thought, “Well, yeah, this is the right architecture, it’s going to allow us to go faster.” But then they forgot the alignment of the team structures and communication paths, and so any kind of non-trivial change or feature still required coordination between multiple teams, when with microservices, the expectation was that each team would be able to evolve more independently.
We can apply what people call the reverse Conway maneuver, which means we design the ideal system architecture that we think is going to be a better fit for the requirements of the customers and also the non-functional requirements. Then we look at our team structures and communications and maybe adapt them to be more aligned with the system architecture.
There’s also a great quote from Michael Nygard where he says “when you ship a product, you’re also shipping the organization structures with it.” So basically, he’s talking about Conway’s Law, and the fact that you get this mirroring effect in the system of the actual structures you have in the organization.
Another corollary from that paper from Mel Conway was that, if you have a very rigid organization structure, if that’s [the org chart] very static and rigid and it doesn’t change easily even at the team level, then we’re effectively constraining the solutions we’re able to find for our systems. There might be a range of solutions that we don’t even think about just because of the way teams are set up. Especially if they’re very rigid and hard to change.
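One way to make the mismatch between a designed architecture and the existing communication paths concrete is the small Python sketch below. It is only an illustration under assumed inputs, not a method from the book: the component names, team names, and the `missing_communication_paths` helper are all hypothetical.

```python
def missing_communication_paths(owners, dependencies, communication_paths):
    """Return pairs of teams whose components depend on each other
    but who have no established communication path.

    owners: component name -> owning team
    dependencies: iterable of (component, component it depends on)
    communication_paths: set of frozenset({team_a, team_b})
    """
    missing = set()
    for source, target in dependencies:
        team_a, team_b = owners[source], owners[target]
        if team_a == team_b:
            continue  # same team owns both sides, no cross-team path needed
        pair = frozenset((team_a, team_b))
        if pair not in communication_paths:
            missing.add(pair)
    return missing


# Hypothetical target architecture and current team setup.
owners = {"checkout": "team-a", "payments": "team-b", "search": "team-c"}
dependencies = [("checkout", "payments"), ("checkout", "search")]
communication_paths = {frozenset(("team-a", "team-b"))}

gaps = missing_communication_paths(owners, dependencies, communication_paths)
for pair in gaps:
    print("No communication path between:", sorted(pair))
```

Each reported pair is a place where, in the terms used above, either the architecture will drift from the design or the teams and their communication need to be adapted.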

How to Use Conway’s Law

There are many decisions being made in the organization outside IT that - actually, because of Conway’s Law - can end up having an influence on the system architecture.
Even if you think about now that we’re in this remote or hybrid world, the way you set up your communication tools, your Slack or Microsoft Teams, or what have you, might actually have some influence in the way that teams communicate.
This decision doesn’t start only at the system architecture level, especially if we want to have teams that are more autonomous, that can deliver, and have more ownership over a slice of a larger product, which tends to be the case in many organizations that have larger products or services. Because teams have a limited number of team members, then we need to break down this larger product into smaller chunks.
Some organizations grew over time because they were successful and delivered useful products and services, but things started to get a little bit messy, and there is no longer clarity on what exactly the different business lines are or the different value that we provide to customers. It becomes blurred and you get this software monolith, but also monoliths on the business model side at the same time. Because we just built on top of what we had before, it’s now very hard to decouple the teams.
When we want to do that [decouple the teams], we need to start with the business model. Understand, “Okay, what are the different value streams going back to that as well? What are the business value streams? What are the things that customers pay for or are interested in that we provide?”, so that we can then have better understanding what the independent lines of business are that we can then align teams, and have those teams deliver value more independently.

Breaking Monolith into Microservices

Before we go to that sort of level of granularity, if you’re looking at the monolith, what we talk about in the book are fracture planes. Look at the ways that we can split the monolith that are effective and that will allow teams to align to these different smaller pieces.
There are different fracture planes you can think about. Obviously what we just discussed before about the business value streams, that would be the main way that we would split the monolith. In the language of Domain-Driven Design, this would be your sort of bounded context that you understand (as) more or less independent from other contexts of business.
But then there are other ways, so we might even split the monolith to some extent based on the location of teams or on the time zones of teams.
Other ideas you might have: Some performance or regulatory requirements on some parts of the system, which means it makes sense that this is split from the rest and assigned to one team or a few teams to take care of.
Usually, you will need a combination of fracture planes, especially the larger the monolith (is), to actually make sense of that.
The other aspect to consider is the cognitive load on teams that we talk about in the book. We also want to split and be mindful of what the capacity of a single team is. Do we need to split further some larger piece of the system so that the smaller pieces fit the capacity of the different teams? Or does this look like a reasonable size for a single team to own end to end? And we’re still talking about hopefully vertical slices of the original monolith.

Cognitive Load

Cognitive load is another sort of constraint on how much we can achieve, and how we can get teams to be high performing.
As with Conway’s Law, if we’re not aware of the limits of cognitive load or cognitive capacity on teams, then we’re probably not going to be able to achieve higher performance.
Cognitive load theory is actually based on individuals, and it’s about how much of our working memory is being used at a given moment in time.
What we did with team topologies is actually understand that this applies at the team level as well. It’s not a scientifically defined term, if you like, but there’s research going on by John Sweller and others on group cognitive load. So it’s an emerging area.
What we identified is that there is a limit to the capacity of the team. If the team has too many responsibilities or is responsible for too large a part of the system, they’re not able to fully understand and grasp how the code works or how it relates to other parts of the system. Because it’s too much for our capacity or we have too many responsibilities, we’re basically always running around trying to respond to requests in a sort of firefighting mode, context switching all the time. That’s not going to be conducive to better performance, more autonomy, and more ownership in those teams.
Cognitive load also can be split in different types.
- The things we want to maximize have to do with understanding the business, understanding the customers, and obviously understanding the system, the code itself, how it works and how we improve it. All those are more related to what is called the germane cognitive load: everything related to the problem and the solution space.
- There are other types of cognitive load, like extraneous and intrinsic. Extraneous is related to all the tasks that we need to do to deliver our work. How do I deploy my application? How do I run the tests? How do I access the test database? All these things that need to happen, but are not directly related to the problem and the value we’re providing to the customers.
That’s why we introduced the 4 types of teams and the 3 core interaction modes, which are meant to provide an ecosystem of teams where you have the teams focused on the services - what we called stream-aligned teams, aligned to the business value streams - (and) how do we minimize their cognitive load by providing useful services in a platform.
If that can be provided by platform services that are focused on the experience, on minimizing the effort for the stream-aligned teams to understand how to do these things with good abstractions and good developer experience, then we help them minimize cognitive load, the extraneous kind. So they have more mental space to understand the business as well as the code itself so they can better evolve and support it.

4 Fundamental Team Topologies

The stream-aligned teams: this would be teams with end-to-end ownership. Obviously there are other terms that are similar, like product teams, cross-functional teams.
- We call them stream-aligned teams because we thought that will provide a more specific definition. Because sometimes you have streams of work that are not necessarily a product or are parts of a larger product.
- Ideally, in an organization you mostly have stream-aligned teams that are providing value to customers and have end-to-end ownership. But because of cognitive load, that means this might introduce a lot of demand on this team because we’re saying, “Well, you need to understand the customer problems. You need to understand then how to build a solution with software. You need to understand how to deploy, how to run the solution.”
To minimize that sort of cognitive load, we then have the platform teams.
- The stream-aligned teams are the customers, (and) they [platform teams] are focused on providing value to our internal teams, but in a product-focused way where we need to understand our own internal customers and what they really need.
- So we don’t go off and build everything that we think they need. But actually, we talk to them. We get quick prototypes. We get fast validation of “Is this what’s going to really help you?”
Another one of them is enabling teams.
- These teams don’t usually build any service. They’re experts in some domain. What they do is help the stream-aligned teams in particular, but maybe also platform teams, increase their awareness and their understanding around different domains. This might be more technical understanding, about test automation or monitoring, for example, or it can be more product-related domains, such as understanding more about user experience or about regulations in certain industries.
- An enabling team is usually a small team of experts who go and help up-skill the stream-aligned teams, so that they have the necessary knowledge to do their work more autonomously. Not that we need everyone to become an expert in these domains, but they should at least have a working knowledge so that they can do the common tasks in the life cycle.
And finally, we have complicated subsystem teams, which are optional. We realize in some cases, you do need these teams, again, because of cognitive load.
- In those cases, it makes sense to have a complicated subsystem team where, say, there’s a part of a larger service that really requires very specialized knowledge, PhD type of knowledge. Usually it’s not technology specialization, although in some particular cases it might be. This team exists because it helps to reduce cognitive load on the stream-aligned team.
- But in general, we find this should only be needed in very particular cases. So most organizations probably shouldn’t have any complicated subsystem teams, and in some cases they might need one or two maximum.
Any organization will be in a better place if they have the thinking around these types of teams.
If we’re a small startup, we can’t have all these different teams. So if you start from that sort of extreme, then yes, you won’t be able to have stream teams, plus platform team, plus enabling, but you can have the thinking.
The platform, maybe it’s just a Wiki page that helps teams understand how do we use that. You can define a platform as just a Wiki page, helping guide teams, and basically building in the shared knowledge of what works and what doesn’t, even if we’re a startup.
The same for enabling teams, understanding that, first of all, we know that technology is always evolving, and new practices are coming up.
We need to be aware of how we help teams evolve in a way that doesn’t depend on their free time, or on people having the willingness to learn outside of work; we shouldn’t expect that from people. We should find ways, through enabling teams or at least this ‘enabling thinking’, of allowing teams to grow and gain new capabilities over time.
You might not have a team. You might have a couple of people, for example, in a startup, who have been there longer and are maybe more senior, and they dedicate part of their time to facilitating knowledge for others. Or maybe it happens between a more senior team and a more junior team, if the startup is growing, where one team is helping facilitate the other. They’re not fully dedicated, but at least you start having this ‘enabling thinking’.
At some sort of size, it starts to make sense that you have platform teams because if you have many stream-aligned teams, you start to want to have ways to embed good practices in the platform while considering also the specific needs of different teams.
The thinking of why we need this type of approach is what really matters at any scale.

Team API

An API, or Application Programming Interface, defines the way that you interact with some system or service. Basically, it’s an interface. So what we’re saying is that the team API is the interface to the team.
The objective of the team API is for a team to clarify to other teams how do we work? How do we like to communicate and interact with other teams? What are the practices that we follow? Also, what is our sort of roadmap? What are we working on? What’s coming next? It’s very much focused on what other teams need to know about us. What’s going to be helpful for other teams to interact and understand what we do?
It’s a bit different from, for example, a team working agreement, which tends to be internal to the team, where we say, “Okay, this is how we work together inside the team.”
The team API is actually making it easier for others to interface with us as a team. So that we have more clarity and less ambiguity on how other teams should interact with us.
It’s really thinking about the other team’s perspective. So it’s a little bit of having that empathy as well, to understand if other teams might be frustrated in terms of their interactions with us, or they don’t understand what we do.
The team API should not be a static artifact. You should evolve it over time as you realize that some problems or awkward interactions have happened with other teams.
The team API is a sort of single entry point to our team. That’s where we should then make it clear, “For these sorts of questions, look at this document and you’ll probably find the answer, and if not, contact us.” So we make that sort of communication easier, and we also reduce the overload on our team from questions about things we maybe didn’t expect to be asked about.
You can’t expect other teams to know where all your documentation is or where all of your practices are. It’s [Team API] really providing this single entry point to help other teams interact with us.
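To picture the team API as a single entry point, here is a minimal Python sketch. It is an assumption-laden illustration rather than the template from the book: the `TeamAPI` class, its fields, the example team, and the document locations are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class TeamAPI:
    """What other teams need to know to interact with us (hypothetical fields)."""
    team_name: str
    ways_of_working: list[str]
    communication_channels: list[str]
    roadmap: list[str]
    docs: dict[str, str] = field(default_factory=dict)  # topic -> where to read about it
    contact: str = "ask in our team channel"

    def where_to_look(self, topic: str) -> str:
        """Point another team to the right document, or fall back to contacting us."""
        location = self.docs.get(topic)
        if location is not None:
            return f"{topic}: see {location}"
        return f"{topic}: not documented yet, {self.contact}"


# Example values are invented purely for illustration.
payments_api = TeamAPI(
    team_name="Payments (example stream-aligned team)",
    ways_of_working=["two-week iterations", "pull requests reviewed within a day"],
    communication_channels=["team chat channel", "weekly open office hour"],
    roadmap=["this quarter: new refund flow", "next: invoice export"],
    docs={"deployments": "wiki page 'How we deploy'"},
)

print(payments_api.where_to_look("deployments"))
print(payments_api.where_to_look("refund rules"))
```

A topic that keeps falling through to the contact fallback is a hint that the team API should evolve, which matches the point above that it should not be a static artifact.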

3 Interaction Modes

Besides the 4 fundamental types of teams we talked about, then we have the 3 core interaction modes which help these teams understand, what are some useful ways for us to interact, what are the expected behaviors from us as a team when we’re doing this interaction with other teams. In many organizations, there is this sort of a naive expectation that, well, we just collaborate whenever we need. But that’s very loosely defined.
First (is) collaboration, but in a well-defined way.
- We’re talking about two teams working together for a period of time to achieve a specific outcome. So the more specific this outcome is, the better we’ll be able to identify if we’ve achieved it or not.
- And we also set expectation on how long we expect this to last. So it’s not an open-ended collaboration, which can lead to actually more of a relationship and dependency between teams. And it’s actually a blocking dependency. We cannot do anything unless this other team has the time to help us.
Then we have facilitating as another core interaction mode, especially for enabling teams. They are facilitating knowledge for others.
- So typically you’re not actually building anything or working on some service. You’re maybe pairing, or running some workshops, or helping teams understand and improve their knowledge around some aspects of either the business or the technical side, or practices that we use in the organization.
- Again, it should be framed in terms of what’s the expected duration? What do we want to achieve? What should you know after we’ve facilitated for this period of time?
And finally, we have X-as-a-service.
- That’s very much based on things like infrastructure-as-a-service or software-as-a-service, where especially for the platform, we’re saying at some point we want to have services in the platform that are mature enough and stable, and provide a good developer experience with the right documentation, right level of reliability so that teams can consume without actual interaction.
- It’s the lack of interaction because we have this service in a way that is easy enough to understand (and) to consume independently. So you have one team providing a service and then one or more teams consuming the service.
We shouldn’t expect these interactions along these 3 modes to always go perfectly and smoothly. There will be issues and situations where we thought, “This was going to take two weeks; it took two months.” Those are great opportunities to reflect on why it took so long, (or if) there (was) something that wasn’t clear.
We’re not saying that it’s all going to go nice and smooth, but it provides a better framing to learn and then understand when some interaction goes wrong or awkward, (and) how we (can) learn from that and course correct.
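The three interaction modes and their framing (a specific outcome and an expected duration for collaboration and facilitating, none needed for X-as-a-service) could be sketched as a small Python model. This is an illustrative assumption, not something defined in the book: the `Interaction` class, its fields, and the wording of the warnings are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class InteractionMode(Enum):
    COLLABORATION = "collaboration"
    FACILITATING = "facilitating"
    X_AS_A_SERVICE = "x-as-a-service"


@dataclass
class Interaction:
    """One interaction between two teams (hypothetical model)."""
    team: str
    other_team: str
    mode: InteractionMode
    expected_outcome: Optional[str] = None
    expected_weeks: Optional[int] = None

    def framing_gaps(self) -> list[str]:
        """List what is missing from the framing of this interaction."""
        gaps = []
        # Collaboration and facilitating should be bounded; X-as-a-service
        # is consumed without ongoing interaction, so no bound is expected.
        bounded = self.mode in (InteractionMode.COLLABORATION, InteractionMode.FACILITATING)
        if bounded and not self.expected_outcome:
            gaps.append("no specific outcome defined")
        if bounded and self.expected_weeks is None:
            gaps.append("open-ended: no expected duration")
        return gaps


open_ended = Interaction("checkout team", "platform team", InteractionMode.COLLABORATION)
well_framed = Interaction(
    "checkout team", "enabling team", InteractionMode.FACILITATING,
    expected_outcome="team can write its own monitoring dashboards",
    expected_weeks=4,
)
service = Interaction("checkout team", "platform team", InteractionMode.X_AS_A_SERVICE)

for interaction in (open_ended, well_framed, service):
    print(interaction.mode.value, "->", interaction.framing_gaps() or "ok")
```

Recording the expected duration also gives a baseline for the reflection described above, for example when something planned for two weeks ends up taking two months.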

Advice to Align with Team Topologies

Start by acknowledging these constraints that we talk about, like Conway’s Law, cognitive load, (and) also trust boundaries.
- Often we’re focused on what the good practices or best practices (are) and what’s the best way to do things. Practices and principles are obviously necessary and useful, but they should be informed by what the constraints in the first place (are). Constraints are usually things we cannot really change.
- Limits on cognitive load, limits on trust between groups of teams, Conway’s Law, (are) things we cannot really change. So we need to acknowledge them, and then build and decide on practices and principles based on that. Because if we don’t, then we might be fighting these constraints rather than understanding and leveraging them to our advantage.
The other thing with team topologies: we’re not just clarifying the ways that teams can interact and the mission of different types of teams. We’re also hopefully helping teams become more motivated by feeling more autonomous, that they have more ownership of their service or the things they provide, and that they’re becoming more competent.
- We think team topologies also help teams become more autonomous and have more ownership. And so the role of the engineering manager ideally starts to move away from managing and making decisions for the team towards making fewer decisions and letting the teams have more local decision-making.
- If you’re getting out of the way of the teams, providing them what they need to become more autonomous in terms of skills, competences, and support, then you can look more into how do we help - even if we’re a manager of one team - this team understand how to deal with other teams in a more productive way, (or) remove blocking dependencies on other teams.
One last thing is (to) also start thinking about alignment of purpose between the team and individuals.
- With the fundamental types of teams, we now have the ability to be clearer on how the mission of a stream-aligned team is different from that of an enabling team or platform team. We have more clarity: Why does our team exist? What are we trying to achieve? Who are our customers?
- But then, there’s also the individual purpose. Every person has their own individual goals and individual motivation. And often, we don’t consider the two together.
- People have their own goals, and what they want to achieve. We can surface those conversations and understand that someone who is really focused on the technical side, understanding the technology and keeping up to date on that, is maybe not a great fit for a stream-aligned team, where you want more T-shaped or more generalist people who are more focused on end-to-end delivery. So maybe that person is a better fit for a platform or even perhaps an enabling team.
- And so it’s about helping understand what the individual purposes are, and how they align to which types of teams. Often, it’s a mix, where you need people to align their individual purpose with the team, but also be willing to learn and improve in some areas where they don’t have as much experience yet.
In short, understand your constraints. First, get out of the way of teams as much as possible, let them make more decisions on their own, increase their autonomy and ownership. Help teams deal with dependencies between them, especially blocking dependencies. Help them navigate those, and hopefully minimize those so we can have faster flow. And finally, help align individual purpose and team purpose.

3 Tech Lead Wisdom

Understanding constraints like Conway’s Law, cognitive load, trust boundaries.
- Understanding your value streams, understanding where’s the wait time, where are the dependencies, the handovers of work between teams that we might need to deal with.
Look for ways to improve the autonomy and the ownership of stream-aligned teams and other types of teams.
- How do we increase the skills of our stream-aligned teams with enabling aspects, and also reduce their cognitive load with the platform.
Help teams navigate the ecosystem and navigate and reduce blocking dependencies between them.

Transcript

Episode Introduction [00:01:06]

Henry Suryawirawan: [00:01:06] Hello to all of you, my listeners. It’s great to be back here again with another new episode of the Tech Lead Journal podcast. Thanks for spending your time with me today listening to this episode. If you haven’t, please subscribe to Tech Lead Journal on your favorite podcast apps and also follow Tech Lead Journal social media channels on LinkedIn, Twitter, and Instagram. And you could also make some contribution to the show and support the creation of this podcast by subscribing as a patron at techleadjournal.dev/patron, and help me towards producing great content every week.

For today’s episode, I am very happy to share my conversation with Manuel Pais. Manuel is the co-author of the Team Topologies book, a DevOps thought leader, and an independent IT organizational consultant focusing on team interactions, delivery practices, and accelerating flow. Effective software teams are essential for any organization to deliver value continuously and sustainably. But how do you actually build the best team organization for your specific goals, culture, and needs? In the book Team Topologies, Manuel and his co-author Matthew Skelton share secrets of successful team patterns and interactions to help IT organizations choose and evolve the right team patterns to ensure success, making sure to keep the software healthy and to optimize for value streams.

In this episode, Manuel shared great insights from his book Team Topologies, starting from highlighting some constraints that organizations typically face, such as Conway’s Law and cognitive load. Manuel then explained the 4 fundamental team topologies and how they can help to address those constraints. He also discussed the Team API concept, as well as the 3 core interaction modes, which inform how teams should interact with each other in order to improve the overall flow within the business. Finally, Manuel shared some advice on how we can all start implementing these ideas within our organizations.

I personally really enjoyed this conversation and I hope you will enjoy this episode, too. Consider helping the show by leaving it a rating, review, or comment on your podcast app and social media channels. Those reviews and comments are one of the best ways to help me get this podcast to reach more listeners. And hopefully they can also benefit from all the contents in this podcast. So let’s get this episode started right after our sponsor message.

Introduction [00:04:03]

Henry Suryawirawan: [00:04:03] Hey, everyone. Good to see you again. Today we have a new episode of the Tech Lead Journal. I have someone I admire as the co-author of a book that has rave reviews, called “Team Topologies: Organizing Business and Technology Teams for Fast Flow.” So if you’ve ever heard about the book or maybe already read it, you know the kind of things I’m talking about. This person is Manuel Pais. He’s one of the co-authors, and I’m really looking forward today to learning more about team topologies: how we should actually structure our teams, and maybe some of the anti-patterns that we should avoid when organizing teams within a company. So thanks again, Manuel, for agreeing to this conversation. Looking forward to having a chat with you.

Manuel Pais: [00:04:45] Thanks for having me. Great to be here.

Career Journey [00:04:47]

Henry Suryawirawan: [00:04:47] So Manuel, before we start talking about team topologies and all that. Maybe for you to introduce yourself, maybe telling about your career background and highlights and turning points.

Manuel Pais: [00:04:56] Sure. So I have a background in Computer Science, as does my co-author Matthew Skelton. I’ve had a number of different roles from developer to tester, build engineer, team lead, QA, etc. That kind of gave me a broader perspective within the software delivery environment of different roles, dependencies between different teams, different skills that are needed. Especially when DevOps and Continuous Delivery kind of came about 10 years ago or so, it struck a chord with me that, yes, this is very much what we need to think about the flow of development of software, as well as the production and the live support. And we need better ways for teams to communicate to each other, understand dependencies between them. Because before that, it was very much based on specializing by skills and by competencies. Each team trying to be as specialized as possible. We’ve seen that doesn’t work, especially if we don’t think about the ecosystem of teams, right? One of these teams in isolation cannot do the work that is needed to get the actual software product or service available to customers. So basically, I embraced DevOps and Continuous Delivery. I became lead editor for InfoQ around DevOps as well. It’s been a great journey of meeting really amazing people with great insights.

Then, since 2015, I started doing consulting, working with different clients around the world and understanding what the obstacles are for them to accelerate, to be able to respond faster to customers and to changes in the market, and to adapt to new situations. Obviously, it’s always that mix of technology, people, and processes. But with team topologies, in particular, we’re trying both to raise awareness of certain constraints and certain human-focused aspects that are important to understand for organizations to be able to accelerate or become high performing, and to provide some tools and some ways of thinking that can help guide these organizations to a better operating model, one that I feel is more team focused.

Team Topologies [00:07:00]

Henry Suryawirawan: [00:07:00] So maybe let’s go into the topic itself about team topologies. So in the first place, when you wrote this book, what exactly the core message or principle, or even the problems that you were foreseeing during that time that led you into writing this book?

Manuel Pais: [00:07:15] We did a lot of consulting around DevOps and Continuous Delivery. Many clients wanted help with some new toolset, or with adopting some better practices. And that’s all well and good, and it helps to some extent, but many organizations are not looking at the elephant in the room, which is that teams are not communicating in an effective way. Sometimes teams don’t even know who they have to ask for certain things. You have that very strict separation of concerns and responsibilities that I mentioned, which means you introduce a lot of dependencies to get any sort of value out the door to customers. Work has to go through many teams, each doing small things, and that’s not conducive to fast flow. You can have the best toolset on the market, and that’s only going to help you to some extent. What you want is for work to flow well within the teams, because each team has sufficient autonomy and understands its dependencies, and for us to avoid the sort of blocking dependencies where one team has to wait for another team to do something.

Once we achieve that sort of faster flow with more autonomy, then yes, the tools can then heighten even more this sort of fast flow, get better data, get better automation, etc. But if we just start with automation and tooling, we quickly run against challenges in terms of team structures and communications. So that’s what we were seeing recurrently with clients. We thought at some point, we had seen enough patterns that work, and patterns that didn’t work. Also, a lot of learning from the DevOps Enterprise Summit, this conference that happens every year, from the publishers of the book as well, IT Revolution. So a lot of stories that were being shared by different organizations on some of the patterns that work and what didn’t work. So we sort of brought all that together into the book.

Challenges with Org Chart [00:08:58]

Henry Suryawirawan: [00:08:58] So I think what I hear you say is that yeah, a lot of teams these days, they will definitely start with tools and technologies probably, right? Thinking about how to implement those, but actually without looking necessarily to what the team structure or even the org structure at that point in time within the company. You mentioned interestingly in the book, actually the first problem is with the org chart itself. Why do you think typically org chart is maybe the first problem where all this maybe barrier to fast flow, maybe barrier into dependencies, blockage, and things like that? So maybe you can explain a little bit around that.

Manuel Pais: [00:09:30] Sure. So, let me say the org chart is not a problem per se. It’s not that we shouldn’t have org charts. They have important benefits in terms of understanding how we’re organized, especially in large organizations. It helps with reporting, and hopefully also helps with top-down alignment of strategic goals for the organization, so everyone understands how we are related and how our work relates to the goals that the organization is trying to achieve. So that’s all fine. The problem tends to happen when we give too much importance to the org chart, in the sense that people start looking at it as specifying all the decision-making lines, and specifying with whom we should or shouldn’t interact. If this team is in another department, then we shouldn’t talk to them directly; we need to go up the hierarchy, and then down the hierarchy, to get to the team. Those sorts of things are where it starts to become a challenge to fast flow. With the complexity of modern software delivery and operations, you can do that, but you won’t achieve fast flow, because there are too many steps and too much redirection through the hierarchy before the actual work gets done. For fast flow, you want to favor local decisions, and have teams interact with the right teams at the right time. When you introduce those sorts of blockers, with the hierarchy and decision-making having to be top down, we’re not allowing the people who are doing the work, and who are closer to the customer, to actually make the best decisions.

That’s really where the org chart can become a problem: where people see it as imposing the communication lines and the decision-making lines between teams. That’s the key issue, where it becomes an obstacle to fast flow, if we misinterpret the org chart as dictating all these things. Unfortunately, in many organizations that tends to happen. It is also coupled with the problem that in some organizations, the worth of managers and senior managers is linked to how many people report to them. That feeds into this narrative that you need to go up the hierarchy for any kind of decision-making, so that we can prove the value of the senior managers. Hopefully, with team topologies, this sort of approach is being replaced, and we move away from it towards asking how we enable the people and the teams under us in the hierarchy to make better decisions more autonomously, without depending on the hierarchy.

Measuring Flow [00:11:54]

Henry Suryawirawan: [00:11:54] So if I get the sense that fast flow is actually, of course, it’s like the main objective that we want to achieve. But let’s say if we are currently working in a team or in a company, how do we actually identify whether we have a good flow, fast flow, slow flow? Is there any exercise? Is there any guidance on how to actually find out where we are at this point in time?

Manuel Pais: [00:12:14] Yeah. So a couple of things. First, there’s a very straightforward metric called flow efficiency. It starts from how long it takes from the moment we have an idea for a new feature or a new service until it’s actually available to the customer, so the full lead time from idea to production. Then, from that overall elapsed time, how much did we actually spend working on this feature or service, versus how much was wait time? It turns out that in many organizations, the actual work time is about 15% or less. That means that out of, say, a hundred days that it takes to get a feature out, 15 days were actual work days, and 85 days were spent waiting for someone else or another team to have availability, or for someone to approve or make a decision. When you look at that metric, which in a way is very simple, you can start to unfold the flow problems, right? You can start asking: okay, if we spent all this time waiting, where exactly did we spend it? Maybe we spent a week or two waiting for some infrastructure to be available, or we waited one week for some approval. So we start digging into that, and you start seeing where the wait times are that we want to reduce if we want to achieve faster delivery. So that’s one thing.
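
As an illustration of the flow efficiency arithmetic described above, here is a minimal Python sketch. It assumes that each step in the process has been recorded with its work time and its wait time. The step names and day counts are hypothetical, chosen only to reproduce the 15-out-of-100-days example; they are not data from the episode or the book.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Step:
    """One step between idea and production (hypothetical record format)."""
    name: str
    work_days: float  # time someone was actively working on the item
    wait_days: float  # time the item sat waiting (queue, approval, hand-off)


def flow_efficiency(steps: List[Step]) -> float:
    """Return work time divided by total elapsed time (work + wait)."""
    work = sum(s.work_days for s in steps)
    total = work + sum(s.wait_days for s in steps)
    if total == 0:
        raise ValueError("no elapsed time recorded")
    return work / total


def biggest_waits(steps: List[Step], top: int = 3) -> List[Step]:
    """Return the steps with the most wait time, largest first."""
    return sorted(steps, key=lambda s: s.wait_days, reverse=True)[:top]


# Hypothetical numbers: 15 work days and 85 wait days, 100 days in total.
EXAMPLE_STEPS = [
    Step("idea approval", work_days=1, wait_days=20),
    Step("infrastructure provisioning", work_days=2, wait_days=14),
    Step("development", work_days=10, wait_days=21),
    Step("release sign-off", work_days=2, wait_days=30),
]

print(f"flow efficiency: {flow_efficiency(EXAMPLE_STEPS):.0%}")
for step in biggest_waits(EXAMPLE_STEPS):
    print(f"{step.name}: waited {step.wait_days} days")
```

Sorting by wait time is one simple way to decide where to start digging. The same function can be applied to a narrower list of steps (for example, only the steps from first commit to deployment) to look at flow efficiency within a single team's area of responsibility.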

Also, the nice thing about that is you can look at flow efficiency at different levels. You can look within the cycle time, from when we start coding until we deploy, or you can look at the broader picture, from when we approved this idea until it actually starts to be used by customers. So you can start wherever it makes more sense. Sometimes teams don’t have a global view of all the steps in the process, and they might want to look at their own domain of responsibility and understand where they are slowing down. But obviously, at some point you want to have a more systemic view. A lot of people say this about agile: some organizations have successfully adopted agile principles and accelerated the speed of the agile teams, but when you look at the big picture of how long it takes between an idea being approved and it going to production, maybe there are a lot more teams involved. Maybe you need financial approval, or you need product team, product owner, or product manager approval. A lot of different steps might be required before the agile team even starts working on this. So, yeah, flow efficiency, and then obviously also value stream mapping. That’s the technique for looking at the overall process, looking at dependencies between teams, and looking at where there are queues where work piles up for some teams. That’s more about visualizing the whole process. But in terms of a metric, flow efficiency is quite helpful.

Conway’s Law [00:14:57]

Henry Suryawirawan: [00:14:57] So I think also another anti-pattern that normally I heard a lot, and it is mentioned a lot of times in the book as well, which is about Conway’s Law. Maybe can you explain a little bit on that? What do you mean by Conway’s Law, for people who are not familiar with it?

Manuel Pais: [00:15:11] Yeah. So I like to see Conway’s Law as a constraint, not necessarily an anti-pattern, just something that we should be aware of, especially in software-driven organizations. It comes from a paper by Mel Conway which was published in 1968, so it’s been around for a long time. But it got more traction with the rise of microservices, when people started seeing in practice how this actually happens. Basically, one of the corollaries in that paper, which became known as Conway’s Law, says that the structures that your organization has, and the communication paths between teams, strongly influence the system architecture that we can achieve. That means we might have designed a wonderful architecture, and that’s what we’re trying to build for customers, but if the team structures are very different, and we don’t have the necessary communication paths in place to achieve that system architecture, either we will end up with something quite different from that architecture, or it’s going to cost us a lot more. There are going to be a lot more challenges because of poor communication, team responsibilities that are not aligned with the responsibilities we expect in different parts of the system, and so on. So we should see this as a constraint.

Conway’s Law has been validated across different industries. And like I said, with microservices, people started to see this more, because unfortunately, in many places they thought, well, yeah, this is the right architecture, it’s going to allow us to go faster. But then they forgot the alignment of the team structures and communication paths, and so they ended up with any kind of non-trivial change or feature still requiring coordination between multiple teams, when the expectation with microservices was that each team would be able to evolve more independently. So that’s the idea of Conway’s Law. If we are aware of it, we can make better decisions. We can apply what people call the reverse Conway maneuver, which means: okay, we can design our ideal system architecture, the one we think is going to be a better fit for the requirements of the customers and also the non-functional requirements. And then let’s look at our team structures and communications, and maybe adapt them to be more aligned with that system architecture.
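
One way to make this alignment check concrete is sketched below in Python. It is only an illustration, not a method from the book: it assumes you can list which team owns each service, which services depend on each other in the target architecture, and which pairs of teams have an established communication path. All service names, team names, and the function name are hypothetical.

```python
from typing import Dict, FrozenSet, List, Set, Tuple


def uncovered_dependencies(
    owners: Dict[str, str],
    dependencies: List[Tuple[str, str]],
    comm_paths: Set[FrozenSet[str]],
) -> List[Tuple[str, str, str, str]]:
    """List service dependencies that cross team boundaries where the two
    owning teams have no established communication path.

    Each result is (service, depends_on, owning_team, other_team).
    """
    gaps = []
    for service, depends_on in dependencies:
        team_a = owners[service]
        team_b = owners[depends_on]
        if team_a == team_b:
            continue  # same team owns both sides: no cross-team coordination
        if frozenset((team_a, team_b)) not in comm_paths:
            gaps.append((service, depends_on, team_a, team_b))
    return gaps


# Hypothetical target architecture and team setup.
OWNERS = {
    "checkout": "payments-team",
    "billing": "payments-team",
    "catalog": "catalog-team",
    "search": "search-team",
}
DEPENDENCIES = [
    ("checkout", "billing"),
    ("checkout", "catalog"),
    ("search", "catalog"),
]
COMM_PATHS = {frozenset({"payments-team", "catalog-team"})}

for service, depends_on, team_a, team_b in uncovered_dependencies(
    OWNERS, DEPENDENCIES, COMM_PATHS
):
    print(f"{service} -> {depends_on}: {team_a} and {team_b} have no communication path")
```

In this made-up setup, the only gap is between the team that owns search and the team that owns catalog. In the spirit of the reverse Conway maneuver, that is a prompt to either establish the communication path or reconsider which team owns what.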

There’s also a great quote from Michael Nygard, who wrote the book “Release It!”, where he says, when you ship a product, you’re also shipping the organization structures with it. So basically, he’s talking about Conway’s Law, and the fact that you get this mirroring effect in the system of the actual structures you have in the organization. Just to add to that, another corollary from that paper from Mel Conway was that if you have a very rigid organization structure, and we go back to your previous question about organization chart, if that’s very static and rigid and it doesn’t change easily even at the team level, that means we’re effectively constraining the kind of solutions we’re able to find for our systems. There might be a range of solutions that we don’t even think about just because of the way teams are set up. Especially if they’re very rigid, and it’s hard to change.

How to use Conway’s Law [00:18:10]

Henry Suryawirawan: [00:18:10] So one related question that I have, does it necessarily have to start from the system architecture or the ideal architecture that we have? Because in a company or business, there are so many other departments. For example, from finance, from business, from maybe product, does it always necessarily have to start from product architecture, like maybe from technology point of view? Or do you actually also have alignment with the product team first? Like, okay, based on how the business do the business model, and also aligns with the strategy with the technology team on it. Then you come up with a team strategy, maybe a little bit clarification here.

Manuel Pais: [00:18:41] Yeah, that’s a really good question. So, two things. First, let me say there are many decisions being made in the organization outside IT that, because of Conway’s Law, can end up having an influence on the system architecture. If team composition and team structures are decided, for example, by HR, and there’s no input from the technical side on how this impacts the work we’re able to do on the system and the software, then HR is to some extent making decisions on that architecture, because Conway’s Law limits the possibilities that we can achieve. So there’s definitely impact from other parts of the organization on what we can achieve. Even now that we’re in this remote or hybrid world, the way you set up your communication tools, your Slack or Microsoft Teams or what have you, might have some influence on the way that teams communicate. Imagine a team that depends on another team that’s in another Slack, with no easy way to communicate with them. That’s going to restrict our capacity to decide together based on our dependencies.

The other thing I wanted to mention: I think you’re absolutely right that this decision doesn’t start only at the system architecture level, especially if we want to have teams that are more autonomous, that can deliver, and that have more ownership over a slice of a larger product. That tends to be the case in many organizations with larger products or services: because teams have a limited number of team members, we need to break down the larger product into smaller chunks.

And that’s another problem we’ve often seen: even on the business model side, if you like, there is sometimes a monolith. These are organizations that grew over time because they were successful and delivered useful products and services, but things started to get a little bit messy, and we don’t have clarity anymore on what exactly the different business lines are, or the different value that we provide to customers. It becomes blurred, and you get the software monolith, yes, but also a monolith on the business model side at the same time. Because we just built on top of what we had before, it’s now very hard to decouple the teams as well. So when we want to do that, we actually need to start with the business model. Going back to value streams: what are the business value streams? What are the things that customers pay for, or are interested in, that we provide? Then we have a better understanding of the independent lines of business that we can align teams to, and have those teams deliver value more independently. So yes, it basically starts there.

Breaking Monolith Into Microservices [00:21:15]

Henry Suryawirawan: [00:21:15] Speaking about monolith, since you mentioned about it, many teams probably currently work with the monolith system, supporting multi lines of businesses and business models. And of course, the biggest topic everyone wants to talk about is how to break monolith into microservices. Is there any good practice from the team topologies point of view, how to actually conduct this exercise?

Manuel Pais: [00:21:36] Yeah. So we tend to recommend that you take a step back before going into microservices. It can be very helpful. But before we go to that sort of level of granularity, if you’re looking at the monolith, what we talk about in the book are fracture planes. Look at the ways that we can split the monolith that are effective and that will allow teams to align to these different smaller pieces. Microservices tend to be, from a more technical perspective, how do we split this into two different parts? But actually there are different fracture planes you can think about. Obviously what we just discussed before about the business value streams, that would be the main kind of way that we would split the monolith. In the language of Domain-Driven Design, this would be your sort of bounded context that you understand this is more or less independent from other contexts of business. But then there are other ways, so we might even be splitting monolith to some extent based on the location of teams, on the time zones of teams. Because if you have two teams that almost have no overlap in terms of their time zone or the working hours, because of their time zone then it’s probably a good idea that they’re working on very different parts of the same system, so they’re not working on things that might cause dependencies between them, because it will be very hard to communicate between those two teams.
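
The time zone fracture plane mentioned above can be checked with a small calculation. The following Python sketch is only an illustration under simplifying assumptions that are not from the episode: whole-hour UTC offsets, a fixed 09:00 to 17:00 working day for every team, and no daylight saving time. The function names are hypothetical.

```python
from typing import Set


def working_hours_utc(utc_offset: int, start: int = 9, end: int = 17) -> Set[int]:
    """Return the set of UTC hours (0-23) covered by a local working day.

    Assumes a whole-hour UTC offset and a local day running from `start`
    (inclusive) to `end` (exclusive).
    """
    return {(local_hour - utc_offset) % 24 for local_hour in range(start, end)}


def overlap_hours(offset_a: int, offset_b: int) -> int:
    """Number of hours per day during which both teams are at work."""
    return len(working_hours_utc(offset_a) & working_hours_utc(offset_b))


# Hypothetical teams: one at UTC+0 and one at UTC+8 share no working hours,
# so they are candidates for owning parts of the system that do not depend
# on each other. Teams at UTC+0 and UTC+1 overlap for most of the day.
print(overlap_hours(0, 8))
print(overlap_hours(0, 1))
```

Representing each working day as a set of UTC hours keeps the wrap-around past midnight simple, at the cost of ignoring half-hour offsets and flexible schedules.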

And we talk about other ideas. You might have some performance or regulatory requirements on some parts of the system, which means it makes sense for that part to be split from the rest and assigned to one team or a few teams to take care of. So usually, you will need a combination of fracture planes to make sense of that, especially the larger the monolith. I think that’s the slightly higher-level perspective to start with, before microservices. You do this more coarse-grained split of the monolith, and then you might look into a more fine-grained split with microservices. I think the other aspect to consider is the cognitive load on teams that we talk about in the book. We also want to split while being mindful of the capacity of a single team. Do we need to split some larger piece of the system further, so that the smaller pieces fit the capacity of the different teams? Or does this look like a reasonable size for a single team to own end to end? And we’re hopefully still talking about vertical slices of the original monolith.

Cognitive Load [00:23:57]

Henry Suryawirawan: [00:23:58] So this cognitive load seems to be like a key concept within team topologies. Maybe can you share a little bit, what do you mean by cognitive load? And how do we actually measure whether the current team really has a good enough cognitive load capacity? Or is it like too much of it? Or maybe even less than it? So maybe can you explain what is cognitive load in this sense?

Manuel Pais: [00:24:18] Yeah. So, cognitive load is another sort of constraint on how much we can achieve, and on how we can get teams to be high performing. As with Conway’s Law, if we’re not aware of the limits of cognitive load, or cognitive capacity, on teams, then we’re probably not going to be able to achieve higher performance. Cognitive load theory is actually based on individuals: it’s how much of our working memory is being used at a given moment in time. This comes from the field of psychology, and the term was coined by John Sweller. What we did with team topologies is recognize that this applies at the team level as well. Team cognitive load is not a scientifically defined term, if you like, but there is research going on by John Sweller and others on group cognitive load, so it’s an emerging area. What we identified is that there is a limit to the capacity of a team. If the team has too many responsibilities, or is responsible for too large a part of the system, they’re not able to fully understand and grasp how the code works and how it relates to other parts of the system. When it’s too much for our capacity, or we have too many responsibilities, we’re basically always running around trying to respond to requests in a sort of firefighting mode, context switching all the time. That’s not going to be conducive to better performance, more autonomy, and more ownership in those teams.

Cognitive load can also be split into different types, so we start to understand it better. Yes, there’s the overall idea that teams have a finite capacity for what they can understand and be comfortable with, but there are also different types of cognitive load. The things we want to maximize have to do with understanding the business, understanding the customers, and obviously understanding the system, the code itself, how it works and how we improve it. All of those relate to what is called germane cognitive load: everything related to the problem and the solution space. But there are other types of cognitive load, extraneous and intrinsic. Extraneous load is related to all the tasks that we need to do to deliver our work. How do I deploy my application? How do I run the tests? How do I access the test database? These are all things that need to happen, but they are not directly related to the problem and the value we’re providing to customers. So that’s the cognitive load we should focus on minimizing.

That’s why we introduced the four types of teams and the three core interaction modes, which are meant to provide an ecosystem of teams. You have the teams focused on the services, what we call stream aligned teams, aligned to the business value streams. How do we minimize their cognitive load? By providing useful services in a platform, for example: abstractions that help these teams go faster, deploy without worrying about all the details, or monitor their service without having to install all the infrastructure and tooling. If that can be provided by platform services that are focused on the experience, on minimizing the effort for the stream aligned teams to understand how to do these things, with good abstractions and good developer experience, then we help them minimize cognitive load of the extraneous kind. So they have more mental space to understand the business as well as the code itself, and they can better evolve and support it.

4 Fundamental Team Topologies [00:27:33]

Henry Suryawirawan: [00:27:33] So I think this is also a good segue to actually describe what are the four fundamental team topologies you mentioned in the book. So you have mentioned like stream aligned team and platform, maybe a brief description of all the four for the audience here to listen.

Manuel Pais: [00:27:47] So we start with the stream aligned teams. These would be teams with end-to-end ownership. Obviously there are other terms that are similar, like product teams or cross-functional teams. We call them stream aligned teams because we thought that would provide a more specific definition, since sometimes you have streams of work that are not necessarily a product or a part of a larger product. So that’s the starting point. Ideally, in an organization you have mostly stream aligned teams that are providing value to customers and have end-to-end ownership. But in terms of cognitive load, this might put a lot of demand on the team, because we’re saying: well, you need to understand the customer problems. You need to understand how to build a solution with software. You need to understand how to deploy and how to run the solution.

To minimize that sort of cognitive load, we then have the platform that I mentioned earlier. Platform teams are focused on providing value to internal teams; the stream aligned teams are their customers. And they do so in a product-focused way, where we need to understand our own internal customers and what they really need. So we don’t go off and build everything that we think they need. We actually talk to them. We build quick prototypes. We get fast validation of whether this is really going to help them. And then we have two more types of teams that support and help reduce cognitive load. One of them is enabling teams. These teams don’t usually build any service. They’re experts in some domain. What they do is help the stream aligned teams in particular, but maybe also platform teams, increase their awareness and their understanding of different domains. This might be more technical, understanding test automation or monitoring, for example, or it can be more about product domains, understanding more about user experience, or perhaps about regulations in certain industries. An enabling team is usually a small team of experts who go and help upskill the stream aligned teams, so that they have the necessary knowledge to do their work more autonomously. Not that we need everyone to become an expert in these domains, but they should at least have enough working knowledge to do the common tasks in the life cycle.

And finally, we have complicated subsystem teams, which are optional. We realize that in some cases you do need these teams, again, because of cognitive load. Say you have a stream aligned team where part of their service includes, for example, face recognition functionality. Nowadays there are many solutions for that. But I actually worked on some systems like that about 10 years ago, when the third-party offerings that exist today weren’t there. You needed someone with a PhD to understand the algorithm, and to understand how to make changes and how to test them. In those cases, it makes sense to have a complicated subsystem team, where you say: there’s a part of a larger service that really requires very specialized knowledge, PhD type of knowledge. It’s usually not a technology specialization, although in some particular cases it might be. This team exists because it helps reduce cognitive load on the stream aligned team. So you could have one complicated subsystem team that only exists to help one stream aligned team. Maybe that’s the only customer of this complicated subsystem, but it still makes sense because we reduce their cognitive load. They wouldn’t be able to have end-to-end ownership of a service where one part is super complicated. But in general, we find this should only be needed in very particular cases. Most organizations probably shouldn’t have any complicated subsystem teams, and in some cases they might need one or two at most.

Henry Suryawirawan: [00:31:16] So you mentioned that the last one, the complicated subsystem team, is actually an optional thing. But how about the other three? The stream aligned team, the enabling team and the platform team. Does a company need to have all of them? Because these days, the way I think of it, for the enabling team, probably a consulting team can come in and help to enable teams. And also for platform, we are talking about cloud platforms, and so many platform-as-a-service providers. Does the company necessarily need to have these three teams that you mentioned in the beginning?

Manuel Pais: [00:31:45] So it’s not necessarily that you need the teams. I would say, any organization will be in a better place if they have the thinking around these types of teams. So platform thinking, enabling thinking. What do I mean by that? Well, take an extreme, because we get this question often: if we’re a small startup, we can’t have all these different teams. So if you start from that sort of extreme, yes, you won’t be able to have stream aligned teams, plus a platform team, plus an enabling team, but you can have the thinking. So the platform might be something as simple as, okay, we use, let’s say, AWS, and we use some other SaaS providers. But the platform, maybe it’s just a Wiki page that helps teams understand how we use that. It provides some useful kind of defaults, some useful recommendations: use serverless for these types of tasks, and this is how you can get started, or use other services for other types of work. In a way, you can define a platform as just a Wiki page, helping guide teams, and basically building in the shared knowledge of what works and what doesn’t, even if we’re a startup.

And the same for enabling teams, understanding that, first of all, we know that technology is always evolving, and new practices are coming up. Ten years ago it was DevOps, and five years ago, I think, SRE started. That’s not going to change. That sort of evolution is going to continue. So we need to be aware of how we help teams evolve in a way that doesn’t depend on their free time, or depend on people having the willingness to learn outside of work. We shouldn’t expect that from people. We should find ways, with enabling teams or with this enabling thinking, of allowing teams to grow and to gain new capabilities over time. But you might not have a team. You might have a couple of people, for example, in a startup who have been there longer, maybe are more senior. And so maybe they dedicate part of their time to facilitate knowledge to others. Or maybe even between a more senior team and a more junior team, if the startup is growing, where one team is helping facilitate the other. They’re not fully dedicated, but at least you start having this enabling thinking.

And then at some sort of size, it starts to make sense that you have platform teams because if you have many stream aligned teams, you start to want to have ways to embed good practices in the platform, while considering also the specific needs of different teams. So, it starts to make sense to have platform teams, and perhaps enabling teams. That depends also a lot on the domain. Like you said, you might have some consultants helping. So we have one example in the book exactly like that, where an enabling team was set up with some external consultants that brought sort of the knowledge around Continuous Delivery, and modern practices for software delivery. And then you had some internal people who had the application, the system knowledge. So these people together made for a good enabling team that helped accelerate delivery and set up some good practices. And then the team expired, in a way, because they had achieved their goals. There are other domains where you might need a more long-term enabling team, for example, user experience or other areas where there’s a constant need to evolve the learning, and the skills of different teams. So it really depends. But the thinking of why we need this type of approach is what really matters at any scale.

Team API [00:34:55]

Henry Suryawirawan: [00:34:55] Thanks for the clarification. So, when we have these teams, for example, let’s say we have a stream aligned team and also a platform team, or even multiple stream aligned teams, because in the company, you can have multiple products and business lines. I think in the book you mentioned this concept called team API, how the teams should interact with each other. I mean, a lot of software developers understand about API, like software APIs. But what about team API? What is actually team API?

Manuel Pais: [00:35:20] So it’s a bit of a techie term, right? An API, Application Programming Interface, defines the way that you interact with some system or service. Basically, it’s an interface. So what we’re saying is the team API is an interface to the team. The objective of the team API is for a team to clarify to other teams, how do we work? How do we like to communicate and interact with other teams? What are the practices that we follow? Also, what is our sort of roadmap? What are we working on? What’s coming next? It’s very much focused on what other teams need to know about us. What’s going to be helpful for other teams to interact and understand what we do? So it’s a bit different from, for example, a team working agreement that tends to be internal to the team, where we say, okay, this is how we work together inside the team. This is how we follow Scrum or we follow TDD, or what practices do we do internally? And so there might be some overlap, but the intent is quite different. So the team API is actually making it easier for others to interface with us as a team. So that we have more clarity and less ambiguity on how other teams should interact with us.

Henry Suryawirawan: [00:36:28] So maybe can you share with us, what are some of the good practices for team APIs? Maybe for us to implement within our daily working life.

Manuel Pais: [00:36:37] Yeah. It’s really thinking about the other team’s perspective. So it’s a little bit of having that empathy as well, to understand if other teams might be frustrated in terms of their interactions with us, or they don’t understand what we do. Okay, how can we make that better? How can we clarify that through the team API, perhaps? What I usually recommend to teams is the team API should not be a static artifact. You should evolve it over time as you realize that some problems or awkward interactions have happened with other teams. So if, for example, we often get Slack messages from other teams asking similar questions, is this something that we could actually make visible in the API? Maybe there’s some documentation, but people just don’t know how to access it. So the team API is a sort of single entry point to our team. That’s where we should then make it clear: for these sorts of questions, look at this document, and you’ll probably find the answer, and if not, contact us. So we make that sort of communication easier, and we also reduce the overload on our team, on some things where maybe we didn’t expect to get questions because we have documentation, but actually it’s not visible. It’s not easy to access. You can’t expect other teams to know where all your documentation is or where all of your practices are. It’s really providing this single entry point to help other teams interact with us.

3 Interaction Modes [00:37:57]

Henry Suryawirawan: [00:37:57] And this is also related to the three interaction modes that you mentioned in the book, right? So maybe you can share a little bit, what are the three interaction modes that you mentioned?

Manuel Pais: [00:38:07] Yes, so, besides the four fundamental types of teams we talked about, we have the three core interaction modes, which help these teams understand what are some useful ways for us to interact? What are the expected behaviors from us as a team when we’re doing this interaction with other teams? In many organizations, there is this sort of a naive expectation that, well, we just collaborate whenever we need. But that’s very loosely defined, right? What does that actually mean? What is the purpose of this collaboration? And so when many teams talk about collaboration with other teams, they actually mean the relationship they have with other teams, the dependencies they have. They’re not actually having specific, concrete ways of interacting.

Specifically, of the three key interactions we described in the book, the first is collaboration, but in a well-defined way. So we’re talking about two teams working together for a period of time to achieve a specific outcome. The more specific this outcome is, the better we’ll be able to identify if we’ve achieved it or not. So maybe we’re working together to understand how we automate some deployment of some services. If the outcome is well-defined, once we have one automated deployment in the pipeline, then we can say the collaboration is done. Yes or no? Is the collaboration finished? And we also set the expectation on how long we expect this to last. So it’s not an open-ended collaboration, which like I said, can lead to actually more of a relationship and a dependency between teams, where every time we need to deploy, we need to ask this other team to help us, for example. That’s not the collaboration we’re talking about. That’s a dependency. And it’s actually a blocking dependency. We cannot do anything unless this other team has the time to help us. We want to move away from that with kind of specific ways that teams should interact to help them achieve certain outcomes. So collaboration is one of them.

Then we have facilitating as another core interaction mode, especially for enabling teams, which are facilitating knowledge to others. So you’re typically not actually building anything or working on some service. You’re maybe pairing, or running some workshops, or helping teams understand and improve their knowledge around some aspects of either the business or the technical side or practices that we use in the organization. So that’s the facilitating. Again, it should be framed in terms of what’s the expected duration? What do we want to achieve? What should you know after we’ve facilitated for this period of time? And finally, we have X-as-a-service, so that’s very much based on things like infrastructure as a service or software as a service, where especially for the platform, we’re saying, at some point we want to have services in the platform that are mature enough and stable, and provide a good developer experience with the right documentation, the right level of reliability. So that teams can consume them without actual interaction. So it’s the lack of interaction, because we have this service in a way that is easy enough to understand and to consume independently. So you have one team providing a service and then one or more teams consuming the service. Very much like you consume AWS services, right? You don’t expect to have to talk to AWS engineers to run their services. They’ve done the work, hopefully. There are degrees around this where sometimes the service needs some improvements, and the documentation needs improvement. But in general, they’re at a mature enough state where other organizations consume them independently.

With these three core interaction modes, you can then help teams have more focused interactions, and understand when and how they should interact with others, instead of that kind of blurry, everyone collaborates with everyone. Just to finalize, we shouldn’t expect these interactions along these three modes to always go perfectly and smoothly. There will be issues and situations where we thought this was going to take two weeks, and it took two months. Those are great opportunities to actually reflect on why it took so long. Was there something that wasn’t clear? Maybe we wanted to collaborate, but one of the teams actually needs to be facilitated first to improve their understanding of some domain, let’s say, about infrastructure as code. They need to first understand it before they can collaborate with another team that has more experience, so that it actually makes sense to collaborate. Otherwise, the collaboration becomes a facilitating mode. We’re not saying that it’s all going to go nice and smoothly, but it provides a better framing to actually learn, and then understand when some interaction goes wrong or awkward. How can we learn from that and course correct?

Advice to Align with Team Topologies [00:42:41]

Henry Suryawirawan: [00:42:41] So having all this knowledge that we just discussed, I know that various teams are in different stages. Some are more mature, some are more messy than the others, so to speak. So maybe one key takeaway from you among all these situations, of course, it’s quite tough to define. What will be the key takeaway or key message that you want to give to the listeners here? What can they do in order to improve their team, the structure, or maybe the way of collaboration within their team in order to become much better aligned to the team topologies?

Manuel Pais: [00:43:10] Just one?

Henry Suryawirawan: [00:43:12] It could be many. Up to you.

Manuel Pais: [00:43:14] Looking back at our conversation, I would say, for engineering managers in general, obviously, you have different responsibilities in different organizations. But in general, start by acknowledging these constraints that we talked about, like Conway’s Law, cognitive load. Also, trust boundaries that we didn’t talk about. But the fact is that we are conditioned as humans in certain ways, and specifically in software delivery, also by things like Conway’s Law. So acknowledge that these constraints exist. Often we’re focused on what are the good practices or best practices, and what’s the best way to do things? Practices and principles are obviously necessary and useful, but they should be informed by what the constraints are in the first place. Constraints are usually things we cannot really change. So, limits on cognitive load, limits on trust between groups of teams, Conway’s Law, things we cannot really change. So we need to acknowledge them, and then build and decide on practices and principles based on that. Because if we don’t, then we might be fighting these constraints rather than understanding and leveraging them to our advantage. So I would say that’s the first thing.

The other thing with team topologies, hopefully, is that we’re not just clarifying the ways that teams can interact, and the mission of different types of teams. But also we’re hopefully helping teams become more motivated by feeling more autonomous, that they have more ownership of their service or the things they provide, and that they’re becoming more competent. So these are the key drivers of intrinsic motivation that Daniel Pink wrote about in his book Drive. So we think team topologies also help teams become more autonomous, have more ownership. And so the role of the engineering manager ideally starts to move away from managing the team per se, and making decisions for the team, to actually making fewer decisions, and letting the teams have more kind of local decision-making. And so if you’re sort of getting out of the way of the teams, providing them what they need to become more autonomous in terms of skills, competencies and support, then you can look more into how do we help? Even if we’re a manager of one team, how do we help this team understand how to deal with other teams in a more productive way? How do we help our team remove blocking dependencies on other teams?

That’s not easy for a team that’s already quite busy with their day-to-day work, and their goals as a team. I think where managers can actually help a lot is to start helping address the blocking dependencies problem. This is unfamiliar for many teams: how do we deal with this? We can apply the interaction patterns. Let’s have some collaboration so that we don’t depend on you any more to do the deployments, for example. How do we collaborate so we understand how to automate around deployments, for example? Managers can have, I think, a very strong input on that. Helping teams navigate dependencies on others. Can we actually remove or minimize blocking dependencies in particular?

And finally one last thing, sorry if it’s too much, just one last thing is to start also thinking about alignment of purpose between the team and individuals. With the fundamental types of teams, we now have the ability to be more clear on what the mission of this team is: the mission of a stream aligned team is different from that of an enabling team or a platform team. We have more clarity. Why does our team exist? What are we trying to achieve? Who are our customers? But then, there’s also the individual purpose, right? Every person has their own individual goals and individual motivation. And often, we don’t consider the two together. We think, well, there’s the team purpose, and everyone aligns to the team purpose. Again, it’s not really how things work. People have their own goals, and what they want to achieve. We can surface those conversations, and understand that someone who is really focused on the technical side, understanding the technology, and keeping up to date on that, is maybe not a great fit for a stream aligned team, where you want more T-shaped or more generalist people that are actually more focused on end-to-end delivery. So maybe that person is a better fit for a platform or even perhaps an enabling team.

And so it’s helping understand what the individual purposes are, and how they align to which types of teams. Often, it’s a mix where you need people to align their individual purpose with the team, but also be willing to learn and improve on some areas where they don’t have as much experience yet. So even for a platform team, for example, you might have people who are more focused on understanding the technology and the technical side, but they still need to understand how do we deal with internal customers? How do we get much better insights on what they need? How is our platform helping or not? That means getting metrics, and getting the right interactions with those teams. In short, first, understand your constraints. Get out of the way of teams as much as possible. Let them make more decisions on their own, increase their autonomy and ownership. Help teams deal with dependencies between them, especially blocking dependencies. Help them navigate those, and hopefully minimize those so we can have faster flow. And finally, help align individual purpose and team purpose.

3 Tech Lead Wisdom [00:48:13]

Henry Suryawirawan: [00:48:13] Thanks for all the tips. So I know that we’ve arrived at the end of our conversation. But I have one last question that I normally ask all my guests, which is called the three technical leadership wisdom. So maybe Manuel, can you share with the audience here, what are your three technical leadership wisdom?

Manuel Pais: [00:48:28] That’s basically what I just said. I think from a team topologies perspective, it’s understanding these constraints that we have, then understanding that we’ll be able to achieve more productivity, and faster flow, if we have teams with more autonomy. And finally, if we help reduce dependencies between teams, especially blocking dependencies, which again, involves a lot of the things we’ve talked about. Understanding your value streams. Understanding where the wait time is. Where are the dependencies, the handovers of work between teams that we might need to deal with? And then, how do we increase the skills of our stream aligned teams with enabling aspects, and also reduce their cognitive load with the platform? So I would say, those three things: understand constraints like Conway’s Law, cognitive load, trust boundaries. Look for ways to improve the autonomy and the ownership of these stream aligned teams, and other types of teams. Help teams navigate sort of the ecosystem, and navigate and reduce blocking dependencies between them.

Henry Suryawirawan: [00:49:29] Thanks so much, Manuel. So for people who want to follow you online, is there a place for them to follow you?

Manuel Pais: [00:49:34] Yes. So there are two main places. First, we have our website, teamtopologies.com. So in there, you can find new industry examples and case studies that came out after the book was published. The book was published late 2019. So since then, we’ve learned also new patterns and we have new examples. So all of that is available at teamtopologies.com. And then we have started an academy called Team Topologies Academy. So that’s at academy.teamtopologies.com. What we’re trying to do is have video-based, on-demand courses that help people learn about the key ideas of team topologies, in a sort of more condensed way. So we have a first course called Team Topologies Distilled. And then we’ll be adding more courses where we bring our latest insights around team topologies. Things that work well, things that maybe in certain situations don’t work as well. And also new patterns that we find and new knowledge. So that’s all going to come in the next months and years for the academy.

Henry Suryawirawan: [00:50:33] So thanks Manuel for your time. It’s really great to have a talk with you today, and thanks for sharing your knowledge.

Manuel Pais: [00:50:37] Thank you very much.

– End –