#233 - Data Beats Hype: Measuring Your AI Adoption Impact - Laura Tacho

 

   

“Engineering leaders are stuck between the expectations put out by sensational headlines and the reality of what they’re seeing in their organization. There’s a big disappointment gap.”

Is your AI investment paying off? Many leaders struggle to see real ROI beyond the hype.

In this episode, Laura Tacho, CTO of DX, shares DX’s new research on measuring AI adoption success across 38,000+ engineers. Our conversation reveals why acceptance rates are misleading metrics and introduces DX’s new AI Measurement Framework™ with its three critical dimensions: utilization, impact, and cost. Learn why treating AI as an organizational problem closes the “disappointment gap” between hype and reality.

Note: This episode was recorded in July 2025. The AI adoption rate mentioned has since risen to nearly 80%.

In this episode, you will learn about:

  • The “Disappointment Gap” between AI hype and reality
  • Why the popular “acceptance rate” metric is misleading
  • The DX AI Measurement Framework™ and its three dimensions
  • The top time-saving AI use case (it’s not code generation!)
  • How AI impacts long-term software quality and maintainability
  • Why organizational readiness matters for successful AI adoption
  • The bigger bottlenecks beyond coding that AI has not yet solved
  • Treating AI agents as team extensions, not digital employees

Timestamps:

  • (00:02:32) Latest DX Research on AI Adoption
  • (00:03:54) AI's Role in Developer Experience
  • (00:05:43) The Current AI Adoption Rate in the Industry
  • (00:09:27) The Leader's Challenges Against AI Hype
  • (00:13:22) Measuring AI Adoption ROI Using Acceptance Rate
  • (00:17:39) The DX AI Measurement Framework™
  • (00:23:05) AI Measurement Framework: Utilization Dimension
  • (00:27:51) DX AI Code Metrics
  • (00:30:31) AI Measurement Framework: Impact Dimension
  • (00:32:57) The Importance of Measuring Productivity Holistically
  • (00:35:54) AI Measurement Framework: Cost Dimension
  • (00:38:34) AI Second Order Impact on Software Quality and Maintainability
  • (00:42:38) The Danger of Vibe Coding
  • (00:46:31) Treating AI as Extensions of Teams
  • (00:52:31) The Bigger Bottlenecks to Solve Outside of AI Adoption
  • (00:55:47) DX Guide to AI-Assisted Engineering
  • (01:00:38) Being Deliberate for a Successful AI Rollout
  • (01:02:32) 3 Tech Lead Wisdom

_____

Laura Tacho’s Bio
Laura Tacho is CTO at DX, a developer intelligence platform, co-author of the Core 4 developer productivity metrics framework, and an executive coach. She’s an experienced technology leader and engineering leadership coach with a strong background in developer tools and distributed systems.

Her career includes leadership roles at organizations such as CloudBees, Aula Education, and Nova Credit, where she specialized in building high-performing engineering teams and delivering impactful products. Laura has worked with thousands of engineering leaders as they improve their engineering practices with data.

Follow Laura:

Mentions & Links:

 

Our Sponsor - Tech Lead Journal Shop
Are you looking for some cool new swag?

Tech Lead Journal now offers swag that you can purchase online. Each item is printed on demand based on your preference and will be delivered safely to you anywhere in the world where shipping is available.

Check out all the cool swag available by visiting techleadjournal.dev/shop. And don't forget to show it off once it arrives.

 

Like this episode?
Follow @techleadjournal on LinkedIn, Twitter, Instagram.
Buy me a coffee or become a patron.

 

Quotes

Latest DX Research on AI Adoption

  • At DX, we want to help improve engineering efficiency across engineering organizations. We've been focused on researching what those drivers are that are helping engineers get better work done, enjoy their jobs more, and bring organizational efficiency to the business.

  • And of course, AI has been a game changer in the last two and a half, three years. So, over the last year, especially, we’ve been trying to understand how companies are gaining efficiency from using AI tools. How are they not gaining efficiency from using AI tools? How should you measure the impact of them anyway?

  • Because there’s so many different ways to measure developer productivity. It was almost like, we had a hard time figuring out how to measure productivity and we’re just sort of figuring out how to answer that question, and then AI came into the scene and changed the game. It’s been a really accelerated education journey for a lot of us in the industry, but a really exciting time.

AI's Role in Developer Experience

  • One of the core questions that we’re trying to answer is: Is AI, in and of itself, a driver of developer experience? Or does AI just lift up every other driver of developer experience?

  • And depending on your own attitude toward the tooling, what you’re reading in the media, you’re gonna see different opinions or shapes of opinions around this.

  • Because there's certainly some camps of people that will say, AI is a magic bullet, it's a paradigm shift, it's changing everything, everything is new. And then on the other side, we have, nothing is new, it's the same old world, AI is a dev tool just like any other dev tool, and it's going to work by improving developer experience. We apply it across the SDLC and we see lift the same way we might with other forms of automation or other dev tools.

  • And so we’re kind of caught in between. The world is completely brand new and the world is exactly the same. And there’s still a lot of unanswered questions, even though we’ve been able to find some very interesting patterns in the research so far.

The Current AI Adoption Rate in the Industry

  • Surprisingly, even though AI has so much natural curiosity and enthusiasm from individual developers as well as from executives, company-wide, and from the industry, what we've seen is that companies that just sort of open the floodgates and give everyone a license, for example, Copilot is turned on, are not getting a hundred percent adoption right away.

  • Because just like any other tool, we need training and support in order for developers to understand: what are the meaningful ways I can use this tool? Am I allowed to use this tool? I have a license to it, but am I violating other licenses or copyright? Am I gonna be penalized in a performance review because I rely too much on assisted tooling and I'm not authoring enough by hand?

  • There are a lot of questions from the developers and a lot of questions that can be answered from the executive team. So surprisingly, organizations are seeing that adoption is a little bit slower than they would have anticipated with so much industry buzz around it.

  • We just did a study of 38,880 engineers, and what we found was that the median adoption [in July] was about 50% of developers at companies. This is research from, of course, DX's customers, but also wider, across-industry research.

  • And when you think about what you read in the media with all of these companies touting their outrageously impressive efficiency gains with AI, the reality is that the median adoption rate is 50% [in July] and the top quartile, so P75, top 25% is only about 62% of developers using it on a weekly or daily basis.

  • I have seen extremely few companies. In fact, I don’t think I’ve ever seen a company that has a hundred percent adoption, except maybe DX, we’re very heavy users.

  • Usually the larger the company gets, the more challenging it is to get widespread adoption. There’s just more types of engineers, more types of systems and components, more procurement, legal risk, compliance hurdles to go through. So as the complexity gets bigger, adoption tends to slow down.

  • I have seen an interesting pattern that very big financial companies, or other highly regulated industries, are actually seeing a little bit more adoption because they have to be so structured about things. And this is a scenario where slow is smooth and smooth is fast.

  • Companies that have taken a really deliberate approach to rolling out AI have thought through training, enablement, risk, compliance, asking all the questions, answering all the questions, really supporting their engineers and treating this as an organization wide problem and not just a tool for an individual developer to use. They’re seeing the best adoption that’s actually sticking around.

  • This adoption question, it is a question of organizational readiness, not of individual readiness necessarily. And that’s what a lot of organizations perhaps got wrong at the beginning, or what we’ve all learned together as an industry is that if we want organizational results, we need to treat it like an organizational problem.

The Leader's Challenges Against AI Hype

  • A lot of engineering leaders right now are stuck in between the expectations that are put out by ad-supported media that relies on sensational headlines, and the reality of what they're seeing in their organization.

  • And there’s a big disappointment gap, because executives are disappointed. They look and say, hey, we’re investing all of this money in AI, it’s not getting cheaper. Where’s the ROI? Why aren’t we writing 30% of our code with AI tools?

  • And then there are developers who maybe are interested in it but have been skeptical, because we see a tool that's been promised the world and we try it maybe once or twice. And even though there are all these really cool workflows and things seem to be changing hourly, the reality is that no model is yet capable of doing a hundred percent of the work a hundred percent correctly.

  • You could check SWE-bench, which benchmarks the models against a set of tasks and sees how many of them they can actually complete accurately. Maybe we see 70% in a really exceptional case, but we're not even close to widespread reliability or feasibility in an enterprise or business setting.

  • It’s one thing to sort of vibe code my grocery app that’s gonna remember what I have in my fridge so that I can make a meal. It’s a really different thing to be dealing with an enterprise feature with PII, and there’s all the horror stories that you’ll read all over the place.

  • That's definitely been a personal challenge for engineering leaders: being stuck in that disappointment gap where there's just pressure from all sides. And unfortunately, whether we like it or not, it is the role and responsibility of engineering leaders, engineering managers, VPs, and CTOs to also educate our 'non-technical stakeholders'.

  • It's on us to help decompose those headlines, help people understand what's realistic and what the real capabilities of these tools are. And that can feel really unfair sometimes. And it just takes time away from all the million other things that we have to do.

  • But it’s such an important part of the job because when we’re stuck in this hype cycle, no one wins. And truth be told, the way to beat the hype is with data.

Measuring AI Adoption ROI Using Acceptance Rate

  • AI-assisted engineering tools have been going from infancy to toddlerhood, maturing extremely rapidly. This is on an extremely accelerated timeline where things are changing almost hourly.

  • And these tools are not even three years old yet. Some of 'em aren't even two years old. Some of them aren't even two months old. But one of the deficits of those tools early on was that they didn't have great telemetry.

  • Or we also as an industry didn’t understand what even matters. What is it about these tools that’s moving the needle for the business? And so there was just a black hole of visibility into their efficacy, their utility, their impact.

  • So acceptance rate was one that was latched onto early, because it is fairly straightforward to measure. It's available from a lot of tools: they can give you information about what they're suggesting and then see how much of it is accepted.

  • This is actually a good signal to understand the maturity of the tool and its fitness for particular business use cases. If we think of a scenario where no suggestions have been accepted, then we know the model is just giving us garbage.

  • Or the developer is saying, I'm not gonna accept this. But what if the scenario is a 50% or 75% acceptance rate? Now we have evidence that this tool is actually producing something that's worthwhile and helpful to the developer, and that's gonna bring gains: reduced cognitive load, maybe more experimentation and innovation. These things are accelerated because of that.

  • But where acceptance rate kind of ends is right there. It’s a good signal to tell us is the tool providing useful suggestions, but it doesn’t tell us at all about whether that code made it to production, if it was tweaked or edited by the human before getting into a customer facing environment.

  • It doesn’t tell us if it gets to a customer facing environment. And it surely doesn’t tell us anything about maintainability, quality, business impact, any of those things that are so essential.

  • And not because in and of itself, acceptance rate is a bad metric. It’s just that, as an industry, we didn’t have the mechanisms to measure. We didn’t even know what was important.

  • So now I think about acceptance rate as a measurement from a different time. It was really important when these tools were in their infancy and we were trying to figure out are they giving us useful suggestions.

  • But it's so easy to measure that a lot of companies just haven't moved away from measuring it, and now they're getting this signal which, in a limited, appropriate context, is a useful signal, but when applied as a general measurement of whether AI is having impact, it's a very poor signal, and actually misleading to the point where it can lead to some bad decisions with big money. And that's definitely what we don't want.
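
For readers who want the arithmetic spelled out, here is a minimal sketch of the acceptance-rate calculation discussed in the quotes above, using hypothetical suggestion events (the fields and numbers are illustrative, not any specific tool's telemetry):

```python
from dataclasses import dataclass

@dataclass
class SuggestionEvent:
    """One completion suggestion shown to a developer (hypothetical telemetry)."""
    suggested_lines: int
    accepted: bool

# Illustrative events; real numbers would come from the assistant's telemetry export.
events = [
    SuggestionEvent(12, True),
    SuggestionEvent(5, False),
    SuggestionEvent(30, True),
    SuggestionEvent(8, True),
]

acceptance_rate = sum(e.accepted for e in events) / len(events)
print(f"Acceptance rate: {acceptance_rate:.0%}")  # 75%

# What this number cannot tell us: whether the accepted code was later rewritten,
# whether it ever reached production, or what it did to quality and maintainability.
```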

The DX AI Measurement Framework™

  • We’re very research focused at DX. We have a research team. We’re working with folks from DORA. Nicole Forsgren’s on our research team. The SPACE framework co-author, Dr. Margaret-Anne Storey’s on our research team.

  • So that’s sort of in our company DNA, and whenever we make a recommendation to our customers or out to the industry in general, ‘cause we believe that this research should be open and free for other people who are not DX customers to be able to use, we wanna make sure that we have evidence to show that it is useful.

  • And there was a huge deficit of this in the AI world. As we just talked about, we had acceptance rate. Then we’re starting to get a little bit more telemetry and visibility as these teams mature, but there still wasn’t the answer to the question, what actually is important when making a business decision?

  • So over the last year, we had been partnering with companies in the process of their own AI rollouts. And through that we’re really understanding what are the levers. What are the things to actually pay attention to to make sure that you’re getting the ROI that you expect and setting yourself up for success with these AI tools.

  • This framework is called the AI Measurement Framework. It covers AI measurements across three important dimensions. First is utilization, then we have impact, and cost.

  • Just like any other framework that has multiple dimensions, like the DX Core 4 or the SPACE framework, for example, it's so important to look at all of these together as a collective unit of measurement and not get hyper-fixated on one or the other. That's where those bad patterns, like looking only at acceptance rate, come from.

  • The reason we have it structured this way is that it mirrors the journey of AI at a lot of companies. So first, we do actually wanna focus on adoption and utilization.

  • We have found, in some cases, very significant gains from AI. We know these tools are things that developers enjoy using and that do bring time savings and other efficiency gains.

  • What we found in our research is that getting users from not using it at all to using it consistently, even if it’s periodically once a month, there’s a lot of time savings to gain there.

  • And so adoption is a big piece of the story because we want people to be on the ramp. We want them to be using the tool so that they can continue to grow and continue to see efficiency.

  • Also, as an organization, if you're investing a lot in licenses for these tools, you actually want your users to be using them, just like any other tool. If you were to buy a CI/CD tool, you would be tracking how many projects are being automated with that particular tool.

  • And so this is sort of that similar pattern. So we wanna get people adopting the tool because that’s when we can actually have a sufficiently big sample size to see what are the use cases that are really useful in our organization.

  • Once we have people using the tool, then it’s a question of impact. We want them to be using it, but also directing that usage in a way that’s gonna benefit the organization, the team, the individual.

  • For impact, the primary metric that we recommend measuring is time saved per developer. It’s not by any means a perfect metric and certainly not the only one that you should be looking at. But in terms of ease of collection and reliability, it is the one that we do recommend, of course, when paired with the other metrics in the framework.

  • So, we wanna see that developers are using AI and actually saving time because that time can then be reinvested into innovation. It can be reinvested into improving DevEx. That’s where the efficiency gains are coming from, and that’s been pretty widely standardized across the industry.

  • Google had a paper recently about their 10% productivity gain. They're using time savings per developer per week as their metric as well. So we feel pretty good about that recommendation.

  • We also wanna look at things like developer satisfaction, and this is where those really solid metrics of development team or development organization performance, like the DX Core 4, come in, because AI does not operate in a vacuum.

  • And so we have to be looking at things like speed, quality, developer experience, business impact, and see how AI impacts those. And so that’s the impact column.

  • And then after that we get into cost. And this is just making sure that you’re spending the right amount, not too little, not too much. What are some of the more specific ROI calculations to make sure that the financial story also lines up?

  • Because at the end of the day, this isn’t just using AI for the sake of AI. This is a business. It has to be a business decision. And if you’re gonna go and ask for your company to open their wallet, we need to have a really solid financial story about why this is beneficial to the business. And that’s what that cost dimension will do for you.
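
As a reading aid, the three dimensions and the example metrics mentioned in the quotes above could be organized roughly like this (the metric names are paraphrased from the conversation; the structure is an illustration, not DX's implementation):

```python
# Illustrative grouping of the AI Measurement Framework's three dimensions.
AI_MEASUREMENT_FRAMEWORK = {
    "utilization": [
        "share of developers using AI tools daily or weekly",
        "% of PRs that are AI-assisted",
        "% of committed code authored by AI",
    ],
    "impact": [
        "time saved per developer per week",
        "developer satisfaction",
        "DX Core 4: speed, quality, developer experience, business impact",
    ],
    "cost": [
        "license and usage spend",
        "training and enablement spend",
        "ROI of time saved vs. total spend",
    ],
}

# The dimensions are meant to be read together, never one in isolation.
for dimension, metrics in AI_MEASUREMENT_FRAMEWORK.items():
    print(dimension.upper())
    for metric in metrics:
        print(f"  - {metric}")
```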

AI Measurement Framework: Utilization Dimension

  • We wrote this framework for AI-assisted engineering workflows, and that can cover a couple of different modalities: the chat modality, autocomplete in the IDE, as well as agentic workflows.

  • So depending on your context, which tools you're evaluating, which tools you wanna have visibility into, it's reasonable to include architecture planning with ChatGPT as part of this adoption. If you're paying for ChatGPT Pro licenses across your engineering team, you definitely wanna capture that as well.

  • This is designed of course for software engineering, but I think we could genericize it. And we could use it effectively in other business functions as well. It’s meant to be based on principles that are transferable across domains.

  • We have in this framework the percentage of PRs that are AI-assisted and also the percentage of committed code that has been authored by AI. The percentage of PRs that are AI-assisted is something we can gather pretty easily using a technique called experience sampling: survey-based, self-reported data.

  • A PR is opened. You can ask the author: hey, did you use AI to author this PR? How much time did you save? That's where a lot of the data, generally across the industry, is coming from: self-reported answers collected close in time to the task. That's where we get the most accurate data.

  • When it comes to the percentage of committed code that's been authored by AI, that has been a problem that's been very difficult to solve. And it's actually quite important, though again, not a useful metric on its own.

  • But when we combine it with adoption, when we combine it with code quality, it helps us understand the surface area and the actual footprint of AI's contribution and how people are using it.

  • And this was a problem that we at DX wanted to solve. So we're really happy to announce that we have a new AI code metrics tool.

  • It will help organizations understand how much of their committed code has been authored by AI down to the commit level. And this works across all IDEs and all different modalities.

  • It works at the file system level, looking at changes there. So this isn't an estimation. It's not looking at acceptance rate; it's looking truly at committed code that is written by AI.

  • And again, in and of itself, that metric is not the most useful thing. But in combination with other metrics and helping organizations understand the total footprint of AI generated code in their organization, the downstream effects, it’s really valuable.

  • And so we’re really excited about this new tool. We’ve gotten some outstanding feedback already from our customers that have been trialing it.
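
A minimal sketch of the experience-sampling approach mentioned above: ask PR authors at the moment a PR is opened, then aggregate the self-reported answers (the survey fields and sample responses are hypothetical):

```python
from statistics import median

# Hypothetical answers to a two-question survey triggered when a PR is opened:
# "Did you use AI to author this PR?" and "Roughly how much time did it save you?"
responses = [
    {"pr": 101, "used_ai": True,  "minutes_saved": 45},
    {"pr": 102, "used_ai": False, "minutes_saved": 0},
    {"pr": 103, "used_ai": True,  "minutes_saved": 20},
    {"pr": 104, "used_ai": True,  "minutes_saved": 90},
]

ai_assisted = [r for r in responses if r["used_ai"]]
pct_ai_assisted = len(ai_assisted) / len(responses)
median_saved = median(r["minutes_saved"] for r in ai_assisted)

print(f"AI-assisted PRs: {pct_ai_assisted:.0%}")               # 75%
print(f"Median self-reported time saved: {median_saved} min")  # 45 min
```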

DX AI Code Metrics

  • It's not an IDE plugin, 'cause we found that those can kind of slow things down, and that's the last thing we want. It works at the file system level. It's like a daemon that runs in the background, and then it syncs with an upstream server to get the data into a format that can be queried and understood.

  • It works across all IDEs and all different kinds of tools. So if you’re using an IDE that has an IDE plugin but also using something else in your terminal, it can aggregate all of those things down to the commit level, which has just been a huge question mark for our industry so far.

  • When you hear in the news that 30% of Microsoft's code is written by software, or Anthropic's CTO saying that in six to nine months 90% of code could be written by AI, usually what that's talking about is acceptance rate. It's not actually talking about code that reaches a customer-facing environment.

  • And there’s a huge difference there. It’s just been a big question mark for our industry. So finally having some clarity there, is it that engineers just enjoy coding with the tools but actually they modify everything and none of that code actually makes it to production?

  • That’s a really serious question and a question that we need the answer to, ‘cause that really changes our approach, our AI strategy.

  • There's a really big difference between rolling out a tool to your entire organization and then seeing that only 5% of code reaching production is AI-authored; that sends a really different signal than finding out maybe it is 30% of code.

  • Those are really different environments to live in, and they're gonna require different amounts of investment and training, and making sure your pipelines are ready. Are your SRE procedures ready? Those are really important questions that organizations need answers to. And it's just been really hard so far.

AI Measurement Framework: Impact Dimension

  • In our study of 38,000-plus engineers, we found the median time savings right now to be about three hours and 45 minutes [per week] for people who are using it daily or weekly. And that doesn't match up to these sensationalized claims; it's more in line with that 10% time savings.

  • Because in some particular use cases, all we’re doing is shifting where we’re spending our time. We’re not actually reclaiming time, we’re still prompting, we’re still fiddling.

  • A lot of this time, we're also trying to learn the tool while also completing a task, which slows us down as well. We might be moving from our old trusty IDE to Cursor, and then we have a whole learning curve there. So there are lots of things that can make it not a total net positive or time savings gain.

  • Part of why this metric is there is that, when used as a signal in combination with the other things, it can give us insight into what these tools are actually doing. Because if we find that quality is going up, maintainability is going up, speed is going up, developer experience is going up, but time savings is staying the same, some companies are gonna be okay with that and some companies are gonna say, okay, we gotta find a different tool.

  • And so it's all about fitting the organizational context that you're operating in. In a lot of cases, we're just shifting where the human spends their time a little bit downstream instead of eliminating cycles from getting that task done.
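
As a rough sanity check on the figures in the quotes above, the median savings lines up with the roughly 10% number once it's set against a working week (the 40-hour week is an assumption for illustration):

```python
# Median self-reported savings for daily/weekly AI users, from the DX study.
hours_saved_per_week = 3.75          # three hours and 45 minutes

# Assume a 40-hour working week for the comparison.
working_hours_per_week = 40

print(f"{hours_saved_per_week / working_hours_per_week:.1%} of the week")  # 9.4%
```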

The Importance of Measuring Productivity Holistically

  • Companies that hadn’t unified on a definition of performance are really feeling the pain when it comes to quantifying or measuring the impact of AI because they’re starting with nothing. They’re starting with a big question mark of what does performance actually mean.

  • And then when they add AI, it's really difficult to tell: Is this good? Is this bad? Are we moving up or are we moving down? Companies that have really invested in developer experience have a really clear framework and story around how they define engineering excellence and engineering efficiency.

  • Core 4 is a great example of how to do that. It puts together DORA, SPACE, and DevEx into a simple framework that's easy to deploy at companies.

  • Using Core 4, you have your baseline metrics. And then you can see, okay, we’ve added in AI now. Are we actually getting faster? I can look at my speed dimension. Are we improving developer experience? Are we improving quality? Is quality declining? Those are important questions.

  • And there's the business impact dimension, which is: are we spending more time on innovation versus maintenance? That's a big one that a lot of companies are expecting to move with AI.

  • They're expecting toilsome work to go away, maintenance and operational work to go away, so that people can spend more time on innovation. But is that really happening?

  • And without those measurements to start with a baseline, it's really difficult to know. But even if you don't have a baseline, it's a huge misstep to not be looking at those things, because we need to look at the second-order consequences of tools and not just the direct surface area the tool has in your organization, like adoption or time savings. We need to look at the health of the organization and the efficiency overall.

AI Measurement Framework: Cost Dimension

  • I don't think that's true yet. Even with the inclusion of cost as a top-level dimension, whenever you mention something like cost, our automatic reaction is to reduce cost. That is how people are hardwired to think. And this is not about reducing cost; this is about making sure that you're spending the right amount.

  • We also don't wanna spend too little. And especially when it comes to training and enablement, we don't wanna be spending too little, because there are times when spending too little will actually bottleneck the amount of positive gain that you can have.

  • For example, if we know that the tool is really good, but you only provide licenses for half of your engineering staff because you don't wanna invest in it, well, we can do the math and think about whether that's a good decision or not.

  • But also, let's say we invest in licenses for a hundred percent of our engineers, but we don't invest in training and enablement. That's probably not gonna be as impactful as giving 65% of your engineers licenses, doing really concerted training and enablement, and then scaling it up over time.

  • That’s probably gonna lead to better results. And so it’s about allocation, making sure that the investment is the right amount.
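
A hedged sketch of the allocation math alluded to above, comparing "licenses for everyone, no enablement" with "licenses for 65%, with concerted training"; every number here (license price, loaded hourly cost, hours saved, effect of training) is an assumption for illustration only:

```python
def annual_net_value(engineers: int, license_share: float, trained: bool,
                     license_cost: float = 400, hourly_cost: float = 100,
                     training_cost_per_dev: float = 300) -> float:
    """Rough annual net value of an AI rollout under assumed numbers.

    Assumes untrained license holders save 1 hour/week and trained ones
    3 hours/week, over 46 working weeks a year.
    """
    licensed = int(engineers * license_share)
    hours_saved_per_week = 3 if trained else 1
    value = licensed * hours_saved_per_week * 46 * hourly_cost
    spend = licensed * license_cost + (licensed * training_cost_per_dev if trained else 0)
    return value - spend

# 1,000 engineers: everyone licensed but untrained vs. 65% licensed with enablement.
print(annual_net_value(1000, 1.00, trained=False))  # ~4.2M
print(annual_net_value(1000, 0.65, trained=True))   # ~8.5M
```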

AI Second Order Impact on Software Quality and Maintainability

  • This is a set of tools that's very new, and that's one of the downsides. It's very exciting and things are progressing so rapidly, but we just don't have the longitudinal data. We don't have data from now and five years ago to track how this has changed.

  • And one of the big questions right now with AI is how are these codebases that are AI authored or AI assisted gonna hold up over time? Because it’s trivial to produce a ton of code right now, it’s not all good code and it’s not all code that’s gonna help your customers. And it’s not all maintainable code.

  • Some parts of this problem are a bit of a paradox. I've had this discussion a lot: is AI actually a paradigm shift in programming in general, a lateral move from expressing what we want to happen in terms of business logic in Ruby or JavaScript to just expressing it in English?

  • And having that then broken down into computer instructions. I don't write Assembly if I don't have to, because we now have better tools and I don't need to care about that. I don't have to care about memory allocation and garbage collection anymore, 'cause we have nice developer tools that take care of that.

  • And is AI just another stack on top of this that is allowing us to express what we wanna build in a different way, in the same way that Ruby and JavaScript allow us to do that and abstract away the guts of systems from us? Maybe that's the case. I think it's a little bit too early to tell, but it's an interesting thought experiment: what would that be like?

  • And so if that is the case, we can think about a future where AI generated code isn’t meant to be read by humans. It’s meant to be read by other AI agents.

  • And so what does that mean in terms of maintainability? What does that mean for code review? There’s a lot of questions here.

  • So, the paradox is that AI might actually be producing code that’s harder to maintain or understand. But because we use AI tools, it might be easier for us to understand code that’s hard to understand. And so is it a net wash? Is it a net positive? It’s really difficult to know.

  • These are some of the areas that we’re really exploring right now and trying to piece together, what does excellence continue to look like in a world of AI assisted engineering. And, unfortunately there’s just not a lot to go off of because it’s so new. We don’t have five years of historical data from a single codebase to see how things are changing over time and how that impacts reliability, maintainability, cost of change, those kinds of things.

The Danger of Vibe Coding

  • One of the things that I think about often is that teams that have invested in building resilient systems that are able to sort of swallow a little bit of variability are the ones that are seeing the best gains from AI so far.

  • So, teams that have really great observability or have planned for things to go wrong are the ones that have really put themselves in the position to gain a lot, because it is non-deterministic. It's unreliable.

  • Both were technically correct, but this is why, with vibe coding, if you can imagine just that drift, and then that drift happening across multiple people, across days and weeks and months, it is just a recipe for disaster without constraints and boundaries and linting and code styles and all of those other things.

  • And this is why vibe coding is good for a proof of concept. It’s nice to hold something in my hand and be like, here it works.

  • But when it comes to actually applying a lot of these workflows at an enterprise scale, especially in a highly regulated industry, there’s a lot more that needs to be considered than just vibe coding your way to profit. We’re a ways away, even if AI can bring some really remarkable gains to organizations, teams, and individuals.

  • We're still far away from swarms of agents completing full feature sets for us, and two things can be true at the same time. We can be really optimistic about what's about to happen, but also rooted in reality about what's happening now.

Treating AI as Extensions of Teams

  • Rapid prototyping, smaller bug fixes that have very clear boundaries, these are no longer necessarily only the domain of people who have a traditional software engineering background or have engineer in their title. It's reasonable for a designer now to take Figma files and turn them into a working prototype and maybe even commit them somewhere.

  • When we talk about expanding the definition of developer, we need to make sure that our definition, and the surface area where we're looking for gains, is inclusive of all the people that are using it, and not just people that have the word developer or engineer in their title, because we might be missing out on a lot of stuff.

  • AI kind of democratizes code a little bit more. There are a lot of people who work directly on the cusp, support folks who are maybe understanding the code, verifying stuff, and maybe now they can submit the PR, instead of just verifying and providing that detailed report to an engineer. It just changes a lot of workflows. So we have to make sure to look at that across the whole organization.

  • In terms of measuring agents and AI, we can say agentic workflows are part of the team. There's been a trend, or maybe a scare tactic, of framing agents as digital employees.

  • It is true that some organizations might make the decision to slow headcount growth because they’re getting efficiency gains with AI. But that doesn’t mean that those agents are digital employees.

  • I think that's maybe a scare tactic, driven by ad-based revenue, to make people more afraid of it or to get people to read the article. When we're measuring the impact that agents are having, we have to understand that, at the end of the day, it's still a human that's responsible for dispatching the work, creating the spec, whatever it is.

  • And so we should be measuring these workflows as extensions of the teams that they’re operating within. Not as like their own team or some other kind of model where we’re thinking about them as truly autonomous.

  • There are very much still humans in the loop. Even if the work can be done without a human in the loop, there still has to be accountability on a human level and we can’t forget that.

  • It's also complex because, even though I'll say make sure to measure the gains in the context of the team, it is true that the skills developers need to be effective are gonna change the more we can rely on agents to complete well-defined tasks, or even more complex tasks in the future. A senior developer will need skills connecting their work to the business and really understanding it, because now they're gonna be responsible for maybe a bigger surface area, or for dispatching work and getting rid of some of the toil.

  • And so we’re kind of caught in that dichotomy or the paradox of these aren’t digital employees, but in some ways senior engineers, seasoned engineers are gonna kind of be like a team lead for maybe a swarm of agents. I see that pattern already emerging. And again, two things can be true even though they seem a little bit contradictory. It’s all about understanding the limitations of the technology and how we wanna fit it in.

  • Jenkins actually is personified; people dress up as Jenkins. But we don't think of Jenkins as an employee at our company. Jenkins is a tool and a utility, and we already have this pattern.

  • Dependabot is not an employee. Dependabot is a utility that does work on my behalf. And so we have to use the same patterns and not get swept away in some of the sensationalized coverage that I think is maybe meant to be a scare tactic or to distract people from the bigger issues at hand.

The Bigger Bottlenecks to Solve Outside of AI Adoption

  • It made sense that we started with the coding task because it’s complex, it’s fun, it’s immediately gratifying. We have a ton of training data on it, and so we started there.

  • But any developer will tell you that, of the bottlenecks that impact their day-to-day work, tools make up a small percentage. Oftentimes it's prioritization, single-threaded prioritization, shifting priorities, things like production debugging, everything in the outer loop, security, risk, compliance. There's so much more to being a software developer than just writing code.

  • AWS did a study, and this was specific to AWS employees, but they found that something like 20% of an AWS developer's time is actually spent writing code. Let's just say, best case scenario, it's 33%: a third of a developer's time is spent writing code. That's still only 33% of their total time. And so we have a very limited scope in which we can make gains.

  • And then within coding, code generation is actually even smaller, because we have debugging and all the other tasks. Code generation is just a small part of it. The optimistic view of that is: wow, there's a huge surface area that we could apply AI tools to that will bring us more and more efficiency gains.

  • So we're starting to see more tools across the SDLC. GitLab Duo and Atlassian Rovo are good examples of this. They've got all your Jira tickets. They've got everything. And so being able to just connect everything together is a huge time saver for developers. So we have to think beyond just the coding task.

  • And I have strongly believed from day one that the biggest productivity gains for software engineering teams aren't gonna come from code authoring. They're gonna come from other stuff, because that's where the bottlenecks are: waiting for people, waiting for validation.

  • There's a ton of opportunity out there. So it's really optimistic, if you wanna think about it that way. Or maybe you can think about it pessimistically: oh, the tools that we've invested a lot in are only gonna solve a fraction of our problem. But we're just getting started.
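
A short worked example of why the coding share of a developer's time caps the overall gain (an Amdahl's-law-style calculation; the one-third figure is the best case quoted above, and the 30% coding speed-up is an assumption):

```python
def overall_time_saved(coding_share: float, coding_speedup: float) -> float:
    """Fraction of total time saved when only the coding portion gets faster.

    coding_share:   fraction of a developer's time spent writing code
    coding_speedup: factor by which AI speeds up that portion (1.3 = 30% faster)
    """
    new_total = (1 - coding_share) + coding_share / coding_speedup
    return 1 - new_total

# Best case from the quote: a third of time is coding; assume AI makes coding 30% faster.
print(f"{overall_time_saved(1 / 3, 1.3):.1%}")           # ~7.7% of total time saved
# Even if coding time vanished entirely, the ceiling is the coding share itself.
print(f"{overall_time_saved(1 / 3, float('inf')):.1%}")  # 33.3%
```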

DX Guide to AI-Assisted Engineering

  • This guide was the result of research across 180 different companies. We looked at the developers who were actually seeing good gains from AI and we just wanted to know, what are you doing?

  • Because there’s just so many things that you can use AI for, it’s not just code generation. And so we wanted to kind of get into the details of this.

  • What came out of it is the Guide to AI-Assisted Engineering. This is a really tactical guide. We cover some more general leadership stuff, but there are code examples and prompt examples, so it's meant to be handed to a developer for them to learn from.

  • What we found was that surprisingly, the top time saving use case wasn’t actually code generation at all. It was stack trace analysis.

  • I think we all just think about code generation as starting from scratch, greenfield projects: say, hey, Claude, can you build me this, whatever, my recipe app, and then 20 minutes later coming back to something that's fully working. That's just not really what we were seeing from developers who were saving the most time.

  • So, stack trace analysis is an interesting one because it really compresses down the time it takes you to get to the answer and move on to something else. It's a great example of using AI to just eliminate time, not kick the can down the road and reallocate the time. We can get suggestions as to where that error is happening and suggestions to fix it, accelerating resolution.

  • The other interesting use cases are refactoring and migration. Migration is probably a top-two use case at every company that I've talked to lately. Tech modernization has been slow. It's expensive. It's not particularly enjoyable. It's not necessarily the work that's gonna get you promoted. It's kind of just the cost of doing business. We need to do it. And because of that, it's been neglected.

  • But now, when we have those legacy components and systems, it makes it really difficult to innovate at pace. And so organizations are looking to AI to find ways to accelerate the modernization. And that’s a really useful use case for AI because it has such clear boundaries.

  • We know what the old state is, and then we can give it some examples of what the new state is and figure out how to get from A to B. So things like refactoring are great use cases as well.

  • Code generation actually didn't come up until use case three in the study we did. So stack trace analysis and then refactoring were more time-saving than writing code, for the reason that we still have to read the code and still have to figure out if it makes sense. We're not necessarily saving time.

  • It’s meant to be delivered to everyone in your organization, from your executives to the engineer trying to figure out, what do I do now? How can I actually use these tools?

  • Maybe I've just been using it for code generation and it doesn't seem that great, but did you know about prompt engineering? Or have you heard about adversarial engineering, or making a great system prompt? Those things can really make a huge difference when it comes to efficiency gains.

Being Deliberate for a Successful AI Rollout

  • If there’s one thing that I would urge people to pay attention to, it is the deliberateness of the AI rollout. There’s a lot of curiosity, natural optimism, enthusiasm about AI.

  • And some companies are saying, okay, let’s ride the wave. Let’s see what these tools can do. And that is actually not leading to outstanding results. What does lead to much better results are companies that say, we have this very specific business problem and let’s figure out the way to use AI to help solve it.

  • For example: we have to modernize 70% of our code; let's see how AI can help us accelerate. When we think about this as a very targeted science experiment, we get much better results.

  • Don’t get me wrong, experimentation is a really important thing and we should be spending time experimenting. But as a general AI strategy, that’s not necessarily a winning one. We need to be more disciplined.

  • Think about running a proof of concept or a trial with a tool with very specific outcomes in mind, very specific measurements that you’re trying to track. What is the before and after? Is this tool actually helping us?

  • The more discipline organizations have when rolling out the tool, the better and the longer lasting that change is gonna be, versus just hoping that individuals with a license are gonna figure it out.

3 Tech Lead Wisdom

  1. Data beats hype every time.

    • There’s a lot of hype out there about AI and it can be really difficult to navigate that disappointment gap that we’re in.

    • Get the data. Think really deliberately about why you’re doing what you’re doing. Make sure you have the measurements in place. Data will beat hype. It lets you tell the story of reality versus getting caught in the headlines.

  2. AI just doesn’t change the physics of what makes good software good.

    • It’s not a magic bullet. We still need to solve customer problems. We still need to have reliable software that’s easy to change. None of those things are changing just because AI is there.

    • And so you have to pay attention to those second-order consequences. There are second-order outcomes from introducing AI into your development workflows. You can't ignore them and only think about things like acceptance rate or even user adoption. You have to look at the actual impact on the software. And that stuff hasn't changed.

    • Use Core 4, or whatever framework you have to measure development team productivity and organizational performance, and you'll be in a much better spot.

  3. Protect your energy.

    • Having gone through many hype cycles before, containers, Kubernetes, the JavaScript frameworks back in the early 2010s, AI feels somehow more exhausting, even for me, and this is my job. I spend day in and day out looking at these tools.

    • It can be so tempting to feel like you need to read everything and try every tool, and then you feel like you’re already behind because things are changing hourly, daily. Protect your energy. Everything’s still gonna be here tomorrow. You don’t need to look at everything.

    • It's also hard to know what to actually pay attention to, because things are so volatile right now. So make sure that you're not burning yourself out trying to chase after every single shimmering, glittering thing in the AI world. It's okay to say, I'll see if that's still around next week if I care about it, and then maybe I'll kick the tires a little bit.

Transcript

[00:01:42] Introduction

Henry Suryawirawan: Hey, guys. Welcome back to another new episode of the Tech Lead Journal podcast. Today, I have with me, back for the second time, Laura Tacho. She’s the CTO of DX, getdx.com, a company that is well known for, you know, advocating developer experience. So the reason I have Laura back is because DX has just done their research about the AI adoption, how to measure a successful AI adoption. I think it’s a pretty interesting topic altogether, and I’m very happy to have you back, uh, again, Laura.

Laura Tacho: I am so happy to be here, Henry. Thanks for inviting me back.

[00:02:32] Latest DX Research on AI Adoption

Henry Suryawirawan: Yeah, Laura, so I think maybe let’s start in the beginning by telling us a little bit more about your recent, maybe involvement in this AI space, right? What DX is trying to do, what kind of, you know, research that, uh, you guys are doing?

Laura Tacho: Yeah. At DX, we want to help improve engineering efficiency across engineering organizations. And so we've been focused on researching what those drivers are that are helping engineers get better work done, enjoy their jobs more, bring organizational efficiency to the business. And of course, you know, AI, I don't need to tell you or the audience twice, has been kind of a game changer in the last two and a half, three years. So, um, over the last year, especially, we've been really trying to understand how are companies gaining efficiency from using AI tools? How are they not gaining efficiency from using AI tools? How should you measure the impact of them anyway? Because there's so many different ways to measure developer productivity. It was almost like, you know, we had had a hard time figuring out how to measure productivity and we're just sort of figuring out how to answer that question, and then AI came into the scene and sort of changed the game. So it's been, um, really an accelerated education journey for a lot of us in the industry, but really exciting time.

[00:03:54] AI's Role in Developer Experience

Henry Suryawirawan: Yeah, so I think if I remember like two, three years back, right, when I started following DX, right? You guys were starting quite early back then about developer experience trying to advocate that. AI was not in the picture, right? It was not even in the like core driver or core, you know, things that, uh, drive developer experience. And I think the last two, three years, definitely, AI has maybe started to become much more prominent. So tell us how much role actually AI plays, especially in the space of developer experience. Or do you see it starting to become like one of the even more prominent, uh, drivers? Or do you still think it’s too early for us to decide?

Laura Tacho: That’s such an interesting framing of the question, because I think you’re getting at one of the kind of gaps of understanding, or one of the core questions that we’re trying to answer, which is, is AI, in and of itself, a driver of developer experience? Or does AI just lift up every other driver of developer experience? And I think depending on your own attitude toward the tooling, what you’re reading in the media, you’re gonna see kind of different opinions or shapes of opinions around this. Because there’s certainly some camps of people that will say, AI is a magic bullet, it’s a paradigm shift, it’s changing everything, everything is new. And then on the other side, we have, nothing is new, it’s the same old world, AI is a dev tool just like any other dev tool, and it’s going to work by improving developer experience. We apply it across the SDLC and we see lift the same way that, you know, we might apply other forms of automation or other dev tools. And so this, we’re kind of caught in between. The world is completely brand new and the world is exactly the same. And there’s quite honestly still a lot of unanswered questions, even though we’ve been able to find some very interesting patterns in the research so far.

[00:05:43] The Current AI Adoption Rate in the Industry

Henry Suryawirawan: Yeah, I’m sure, we gotta talk a lot about those things, right? But maybe the first thing is about adoption, right? Uh, maybe in your parts of the world might be different than my part of the world, but at least, uh, in, you know, the US or the customers that you have been dealing with, how much adoption have your customers actually adopted AI? Is it like go 100% or is it still slowly? Uh, is there any difference between small, smaller organizations versus bigger organizations?

Laura Tacho: Yeah, so much to say about here. Um, I think that surprisingly, even though AI has so much natural curiosity and enthusiasm from individual developers as well as from executives and company-wide and from the industry, what we’ve seen is that companies that just sort of open the flood gates and give, you know, everyone a license, for example, you know, Copilot is turned on. They’re not getting a hundred percent adoption right away. Because just like any other tool, we need training and support, uh, in order for developers to understand what are the meaningful ways I can use this tool. You know, am I allowed to use this tool? I have a license to it, but am I violating other licenses or copyright? Am I gonna be penalized in performance review because I rely too much on assisted tooling and I’m not authoring enough by hand. You know, there’s a lot of questions from the developers and a lot of questions that can be answered from the executive team.

So surprisingly, I think organizations are seeing that adoption is a little bit slower than they would have anticipated with so much industry buzz around it. We just did a study of 38,880 engineers, I think is, um, is the total. And what we found was that the median adoption is like 50% [in July] of developers at companies. And so this is research from, of course, DX’s customers, but also wider, um, kind of across industry research. So we have a bit of a bigger sample size there. And when you think about what you read in the media with all of these companies touting their outrageously impressive efficiency gains with AI, the reality is that the median adoption rate is 50% [in July] and the top quartile, so P75, top 25% is only about 62% of developers using it on a weekly or daily basis.

So I have seen extremely few companies. In fact, I don’t think I’ve ever seen a company that has a hundred percent adoption, um, except maybe DX, we’re very heavy users. But I think this is also, um, speaking to your question, usually the larger the company gets, the more challenging it is to get widespread adoption. There’s just more types of engineers, more types of systems and components, more procurement, legal risk, compliance hurdles to go through. So as the complexity gets bigger, adoption tends to slow down.

A last thing that I'll share with you is that I have seen an interesting pattern that very big financial companies or highly regulated industries are actually seeing a little bit more adoption because they have to be so structured about things. And this is a scenario where, like the saying of, slow is smooth and smooth is fast. Companies that have taken a really deliberate approach to rolling out AI have thought through training, enablement, risk, compliance, asking all the questions, answering all the questions, really supporting their engineers and treating this as an organization wide problem and not just a tool for an individual developer to use. They're seeing the best, uh, adoption that's actually sticking around. And I think that's the core of it. This adoption question, it is a question of organizational readiness, not of individual readiness necessarily. And I think that's what a lot of organizations perhaps got wrong at the beginning, or what we've all learned together as an industry is that if we want organizational results, we need to treat it like an organizational problem.

[00:09:27] The Leader's Challenges Against AI Hype

Henry Suryawirawan: Wow! I think that’s a very good insight, right? So treating it as an organizational problem, organizational readiness, right? And the challenge for leaders these days, right? I’m sure the media doesn’t help, um, much as well, right? Because we can see in the news, people are saying, oh, AI is increasing productivity sometimes by a lot, right? Vibe coding, you know, anyone can start building an app just by, you know, vibe coding. And there are a lot of advancements in, you know, AI tools as well. I mean, it started by, you know, using ChatGPT, maybe you can ask and get answers. But now, you have this agentic model, you can even have asynchronous, uh, way of working, right? You just submit to an agent and somehow they come back with a PR. So I think a lot of advancements here. What do you think are some of the challenges that leaders are facing, you know, with all this media advancements and people are also confused probably. What should they do?

Laura Tacho: A lot of engineering leaders right now are stuck in between the expectations that are put out by, you know, ad supported media, I will kind of generalize it to say that way. Ad supported media that relies on sensational headlines. And then the reality of what they’re seeing in their organization. And there’s a big disappointment gap, because executives are disappointed. They look and say, hey, we’re investing all of this money in AI, it’s not getting cheaper. Where’s the ROI? Why aren’t we writing 30% of our code, uh, with AI tools? And then developers who maybe are interested in it and maybe have been skeptical because we’re just a skeptical group, I think generally developers, um, you know, we see a tool and it’s been promised the world and we try it maybe once or twice. And even though as you said, there’s all these really cool workflows and things are changing like hourly, it seems the reality is that any model is still not capable of doing a hundred percent of the work a hundred percent correctly. It’s just not there.

And I think if you could check, um, SWE-bench, S-W-E-bench, um, which is benchmarking the models against, you know, a set of tasks and seeing how much of them can actually complete it accurately. Maybe we see 70% like on a really exceptional case, but like, we’re not even close to widespread kind of reliability or feasibility even at a, in an enterprise or business setting. It’s one thing to sort of vibe code my, like, grocery app that’s gonna remember what I have in my fridge so that I can make a meal. It’s a really different thing to, uh, you know, be dealing with an enterprise, uh, feature with PII, and, you know, there’s all the horror stories that you’ll read all over the place. So I think that’s been definitely a personal challenge for engineering leaders, is being stuck in that disappointment gap where there’s just pressure from all sides.

And unfortunately, whether we like it or not, it is the role and responsibility of engineering leaders of, you know, engineering managers, VPs, and CTOs, we have to also educate our ’non-technical stakeholders’. So it’s on us. Our responsibility is on us to help decompose those headlines, help people understand what’s realistic, what the real capabilities are of these tools. And that can feel really unfair sometimes. And it just takes time away from, you know, all the million other things that we have to do. But it’s such an important part of the job because when we’re stuck in this hype cycle, kind of no one wins. And truth be told, the way to beat the hype is with data.

Henry Suryawirawan: Yeah, I like that you mentioned the disappointment gap. I’m sure many people have different expectations and experiences when they use AI. Sometimes they imagine it could solve the problem perfectly, right, with the perfect design, building enterprise software. But I think the gap is really there. Personally, I have used it several times during work. There are times when it wowed me, like, wow, I can’t imagine, how could it think about it that way? But sometimes it can also come up with code that doesn’t even work, something that doesn’t compile. So I think the gap can vary depending on your situation, right?

[00:13:22] Measuring AI Adoption ROI Using Acceptance Rate

Henry Suryawirawan: So, but one thing for sure, when leaders try to adopt AI, you mentioned ROI, right? How to compute ROI? And I think the storyline here is that we all started with a simple metric, and that is acceptance rate: the percentage of code that we accept from what the AI suggests to us. So tell us a little bit about this history, like how come it became the first metric that people adopted?

Laura Tacho: You know, this class of tools, AI-assisted engineering tools in general, has been going from infancy to, like, toddlerhood and maturing extremely rapidly. This is on an extremely accelerated timeline where, as I said, things are changing almost hourly. And one of the shortfalls of these tools, thinking back, I mean, ChatGPT is now almost three years old. And a lot of these other tools, they’re not even three years old yet. Some of them aren’t even two years old. Some of them aren’t even two months old. But one of the deficits of those tools early on was that they didn’t have great telemetry. Or we also, as an industry, didn’t understand what even matters. What is it about these tools that’s moving the needle for the business? And so there was just a black hole of visibility into their efficacy, their utility, their impact.

And so acceptance rate was one that was latched onto early, because it is fairly straightforward to measure. It’s available. A lot of tools can give you information about what they’re suggesting and then how much of it is accepted. This is actually a good signal to understand the maturity of the tool and its fitness for particular business use cases. So we can think of a scenario where no suggestions have been accepted; then we know the model is just giving us garbage, or the developer is saying, I’m not gonna accept this. We can then think about, okay, what if the scenario is a 50% acceptance rate or a 75% acceptance rate? Now we have evidence that, hey, this tool is actually producing something that’s worthwhile and it’s helpful to the developer, and that’s gonna bring gains, right? Reduced cognitive load, maybe experimentation, innovation. These things are accelerated because of that.

But that’s right where acceptance rate ends. It’s a good signal to tell us whether the tool is providing useful suggestions, but it doesn’t tell us at all whether that code made it to production, or if it was tweaked or edited by the human before getting into a customer-facing environment. It doesn’t tell us if it gets to a customer-facing environment at all. And it surely doesn’t tell us anything about maintainability, quality, business impact, any of those things that are so essential. And not because acceptance rate is a bad metric in and of itself. It’s just that, as an industry, we didn’t have the mechanisms to measure. We didn’t even know what was important.

So now I think about acceptance rate as kind of a measurement from a different time. It was really important when these tools were in their infancy and we were trying to figure out whether they were giving us useful suggestions. But the thing is, it’s so easy to measure that a lot of companies just haven’t changed away from measuring it. And now they’re getting this signal, which is a useful signal in a limited, appropriate context, but a very, very poor signal when applied as a general measurement of whether AI is having impact. It’s actually misleading to the point where it can lead to some bad decisions with big money. And that’s definitely what we don’t want.
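To make the metric concrete, here is a minimal sketch of how an acceptance rate is typically computed from suggestion telemetry. The event shape and field names are hypothetical rather than any specific vendor’s API, and, as Laura notes, a high rate still says nothing about whether the accepted code survives review or reaches production.

```python
# Minimal sketch: computing acceptance rate from suggestion events.
# The event fields (developer, accepted) are hypothetical; real tools expose
# this telemetry in their own formats.

from dataclasses import dataclass

@dataclass
class SuggestionEvent:
    developer: str
    accepted: bool  # whether the developer kept the AI suggestion

def acceptance_rate(events: list[SuggestionEvent]) -> float:
    """Fraction of AI suggestions that developers accepted."""
    if not events:
        return 0.0
    return sum(e.accepted for e in events) / len(events)

events = [
    SuggestionEvent("dev-a", True),
    SuggestionEvent("dev-a", False),
    SuggestionEvent("dev-b", True),
    SuggestionEvent("dev-b", True),
]
print(f"Acceptance rate: {acceptance_rate(events):.0%}")  # 75%
```

A 75% rate here only says the suggestions were kept in the editor; it says nothing about whether those lines were later rewritten, reviewed out, or ever shipped.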

Henry Suryawirawan: Yeah, I also feel that maybe acceptance rate was easy in the beginning, right? Because it was more like a chat-based, question-and-answer kind of model, where the AI gives you an answer and then you accept, maybe by a tap or something like that. But these days people are working in an agentic mode where you kind of iterate and change over time, especially when you do back-and-forth question and answer. So I think it becomes much, much blurrier in that situation. And yeah, maybe acceptance rate, like what you said, is a good signal at the very beginning, whether the tool fits the problem, the context that you’re working with, and whether AI can suggest something that is useful. But over time, it can become a bad metric, so to speak.

[00:17:39] The DX AI Measurement Framework™

Henry Suryawirawan: So which brings us to, uh, like the AI measurement framework that DX is working on. So I know that you guys kind of like just did this research and want to promote a framework on how we can measure the impact of AI adoption. So let’s start here. Tell us about this framework.

Laura Tacho: Yeah, I’ll start with a story of how this came to be, which is that we’re very research focused at DX. We have a research team. We’re working with folks from DORA. Nicole Forsgren’s on our research team. The SPACE framework co-author, Dr. Margaret-Anne Storey, is on our research team. So that’s in our company DNA, so to speak. And whenever we make a recommendation to our customers or out to the industry in general, because we do believe that this research should be open and free for people who are not DX customers to use, and the AI framework definitely is, we wanna make sure that we have evidence to show that it is useful.

And there was a huge deficit of this in the AI world. So as we just talked about, we had acceptance rate. Then we started to get a little bit more telemetry and visibility as these teams matured, but there still wasn’t an answer to the question: what actually is important when making a business decision? And so over the last year, we had been partnering with companies in the process of their own AI rollouts. And through that, we’re really understanding, okay, what are the levers? What are the things to actually pay attention to to make sure that you’re getting the ROI that you expect and setting yourself up for success with these AI tools?

This framework is called the AI Measurement Framework. If you go to getdx.com, you’ll see a little banner to read the white paper; look in the research tab dropdown. It covers AI measurement across three important dimensions: first is utilization, then we have impact, and cost. Just like any other framework that has multiple dimensions, like the DX Core 4 or the SPACE framework, for example, it’s so important to look at all of these together as a collective unit of measurement and not get hyper-fixated on one or the other. That’s where those bad patterns, like looking only at acceptance rate, come from.

So the reason that we have it structured this way, just to provide a little bit of a behind-the-curtain view, is that it mirrors the journey of AI at a lot of companies. So first, we do actually wanna focus on adoption and utilization. We have found, in some cases, very significant gains from AI. We know these tools are things that developers enjoy using and that do bring time savings and other efficiency gains. What we found in our research is that getting users from not using it at all to using it consistently, even if it’s periodically, once a month, there’s a lot of time savings to gain there. And so adoption is a big piece of the story, because we want people to be on the ramp. We want them to be using the tool so that they can continue to grow and continue to see efficiency.

But also as an organization, if you’re investing in licenses, or you’re investing a lot in these tools, you actually want your users to be using them, just like any other tool. If you were to buy a CI/CD tool, you would be tracking how many projects are being automated with that particular tool. And so this is a similar pattern. We wanna get people adopting the tool, because that’s when we actually have a sufficiently big sample size to see what the use cases are that are really useful in our organization.

And so once we have people using the tool, then it’s a question of impact, right? We want them to be using it, but also directing that usage in a way that’s gonna benefit the organization, the team, the individual. So for impact, the primary metric that we recommend measuring is time saved per developer. It’s not by any means a perfect metric, and certainly not the only one that you should be looking at. But in terms of ease of collection and reliability, it is the one that we do recommend, of course, when paired with the other metrics in the framework. So we wanna see that developers are using AI and actually saving time, because that time can then be reinvested into innovation. It can be reinvested into improving DevEx. That’s where the efficiency gains are coming from, and that’s become pretty standardized across the industry. Google had a paper recently about their 10% productivity gain. They’re using time savings per developer per week as their metric as well. So we feel pretty good about that recommendation.

We also wanna look at things like developer satisfaction, and then this is where those really solid metrics of development team or development organization performance, like the DX Core 4, come in. Because AI does not operate in a vacuum. And so we have to be looking at things like speed, quality, developer experience, business impact, and see how AI affects those. And so that’s the impact column.

And then after that we get into cost. And this is just making sure that you’re spending the right amount, not too little, not too much. What are some of the, you know, more specific ROI calculations to make sure that the financial story also lines up? Because at the end of the day, this isn’t just using AI for the sake of AI. This is a business. It has to be a business decision. And if you’re gonna go and ask for your company to open their wallet to buy more, uh, you know, token credits and other things, we need to have a really solid financial story about why this is beneficial to the business. And that’s what that cost dimension will do for you.
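As a quick reference only, and not the official framework specification, the example metrics mentioned in this conversation can be grouped under the three dimensions like this:

```python
# Example metrics from this conversation, grouped by dimension.
# This is an informal summary for reference, not DX's official schema.

AI_MEASUREMENT_DIMENSIONS = {
    "utilization": [
        "AI tool adoption (daily/weekly active users)",
        "% of PRs that are AI-assisted",
        "% of committed code authored by AI",
    ],
    "impact": [
        "time saved per developer per week",
        "developer satisfaction",
        "DX Core 4: speed, quality, developer experience, business impact",
    ],
    "cost": [
        "spend per developer (licenses, token credits)",
        "training and enablement investment",
        "ROI against measured time savings",
    ],
}

for dimension, metrics in AI_MEASUREMENT_DIMENSIONS.items():
    print(f"{dimension}: " + "; ".join(metrics))
```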

[00:23:05] AI Measurement Framework: Utility Dimension

Henry Suryawirawan: Right. So thanks for giving a high-level overview of these three dimensions, right? Utility, that’s the first thing, then impact, and then cost. Maybe we’ll dive deep into each one. So let’s start with the utility dimension. I assume that utility here means everyone starting to use AI, and I assume the research is only referring to AI coding assistants and agents, not AI deep research tools or all these generic chat tools. Is that correct? That’s the first thing.

Laura Tacho: Yeah, I would say yes and no. And that’s a really good question about how to apply the framework, because we wrote this framework with AI-assisted engineering workflows in mind. And that can cover a couple of different modalities. It can cover the chat modality. It can also cover autocomplete in the IDE as well as agentic workflows. So depending on your context, which tools you’re evaluating, which tools you wanna have visibility into, it’s reasonable to include architecture planning with ChatGPT as part of this adoption. You know, if you’re paying for ChatGPT Pro licenses across your engineering team, you definitely wanna capture that as well. So this is designed, of course, for software engineering, but I think we could genericize it. And I would be willing to bet that we could use it effectively in other business functions as well. It’s meant to be based on principles that are transferable across domains.

Henry Suryawirawan: Right. Thanks for that clarification, because now it makes more sense, right? Because of the pervasiveness of AI tools, you can use them for literally anything in your work these days. It could be for improving your writing, suggesting, brainstorming, whatever that is. And especially for coding, we now have our lovely IDEs with autocompletion that is supercharged by AI suggestions. So definitely, it’s a very exciting time for developers especially, right? Especially when you have so many things that you want to give a try. You can use AI to at least give you suggestions.

And utilization here, I assume, means that people start using AI, taking some of the suggestions, and improving the work that they’re doing. So I think that’s kind of the utilization. And it could be about the amount of code that you accept. It could be, as you mentioned in the example on the website, the percentage of PRs that are assisted by AI. So I think it definitely makes sense for us, right?

Laura Tacho: Yeah. One other thing I wanna mention about utilization here is that we have, in this framework, the percentage of PRs that are AI-assisted and also the percentage of committed code that has been authored by AI. The percentage of PRs that are AI-assisted is something we can gather pretty easily using this technique called experience sampling, which is survey-based and self-reported. A PR is opened. You can ask the author, hey, did you use AI to author this PR? How much time did you save? And that’s where we’re getting a lot of the data, generally, across the industry: from self-reported responses captured close in time to the task. That’s where we get the most accurate data.
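A minimal sketch of what aggregating experience-sampling responses could look like, assuming a hypothetical survey record captured when a PR is opened; real implementations would hook into the code host and survey tooling.

```python
# Sketch: aggregating experience-sampling responses collected when a PR is opened.
# The response shape is hypothetical; in practice this comes from a short
# in-workflow survey ("Did you use AI on this PR? Roughly how much time did it save?").

from dataclasses import dataclass
from statistics import median

@dataclass
class PRSurveyResponse:
    pr_id: str
    used_ai: bool
    minutes_saved: int  # self-reported, 0 if AI was not used

def summarize(responses: list[PRSurveyResponse]) -> dict:
    ai_prs = [r for r in responses if r.used_ai]
    return {
        "pct_prs_ai_assisted": len(ai_prs) / len(responses) if responses else 0.0,
        "median_minutes_saved": median(r.minutes_saved for r in ai_prs) if ai_prs else 0,
    }

responses = [
    PRSurveyResponse("pr-101", True, 30),
    PRSurveyResponse("pr-102", False, 0),
    PRSurveyResponse("pr-103", True, 45),
    PRSurveyResponse("pr-104", True, 20),
]
print(summarize(responses))  # {'pct_prs_ai_assisted': 0.75, 'median_minutes_saved': 30}
```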

When it comes to the percentage of committed code that’s been authored by AI, that has been a problem that has been very, very difficult to solve. And it’s actually quite important. Again, it’s not a useful metric on its own. But when we combine it with adoption, when we combine it with code quality, it helps us understand the surface area and the actual footprint of AI’s contribution and how people are using it. And this was a problem that we at DX wanted to solve. And we mobilized, and we mobilized very quickly.

So we’re really happy to announce, and it will be announced by the time this podcast goes out, that we have a new AI code metrics tool that helps organizations understand how much of their committed code has been authored by AI, down to the commit level. And this works across all IDEs and all different modalities. It’s at the file system level, looking at changes at that level. So this isn’t an estimation. It’s not looking at acceptance rate; it’s looking truly at committed code that is written by AI. And again, in and of itself, that metric is not the most useful thing. But in combination with other metrics, and in helping organizations understand the total footprint of AI-generated code in their organization and the downstream effects, it’s really, really valuable. And so we’re really excited about this new tool. We’ve gotten some outstanding feedback already from our customers that have been trialing it. So if you’re interested in seeing what that looks like, if you’re also trying to figure out how much of your committed code is actually written by AI, you can go to getdx.com and take a look.
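Purely to illustrate the metric itself, and not DX’s implementation, here is a sketch that computes the share of committed code authored by AI, assuming the per-commit line attribution (the hard part that the file-system-level agent solves) is already available.

```python
# Illustration of the metric only, not DX's implementation: given per-commit
# line counts already attributed to AI vs. human authorship, compute the
# percentage of committed code authored by AI.

from dataclasses import dataclass

@dataclass
class CommitAttribution:
    sha: str
    ai_lines: int      # committed lines attributed to AI tools
    human_lines: int   # committed lines typed by the developer

def pct_ai_authored(commits: list[CommitAttribution]) -> float:
    total = sum(c.ai_lines + c.human_lines for c in commits)
    return sum(c.ai_lines for c in commits) / total if total else 0.0

commits = [
    CommitAttribution("a1b2c3", ai_lines=120, human_lines=80),
    CommitAttribution("d4e5f6", ai_lines=40, human_lines=160),
]
print(f"AI-authored share of committed code: {pct_ai_authored(commits):.0%}")  # 40%
```

The interesting signal, as Laura describes, comes from trending this number alongside adoption and quality metrics rather than reading it in isolation.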

[00:27:51] DX AI Code Metrics

Henry Suryawirawan: Right. Very exciting, because AI code metrics, like what portion of your code is being assisted or generated by AI, is something that we cannot easily get, right? Because as you do this question and answer, you apply, you change, and it gets blended really fast. And sometimes it’s very difficult to actually quantify. So thanks for the plug. I’m sure people would be interested in using this AI code metrics tool. Is it something like open source, or like a plugin that you just install into your IDE? How does it work? Maybe a little bit there.

Laura Tacho: Yeah. It’s not an IDE plugin, because we found that those can kind of slow things down, and that’s the last thing we want. It works at the file system level. It’s like a daemon that runs in the background and then syncs with an upstream server to get the data into a format that can be queried and understood. So it works, as I said, across all IDEs and all different kinds of tools. So if you’re using, like, Claude Code and Cursor, or you’re using an IDE that has an IDE plugin but you’re also using something else in your terminal, it can aggregate all of those things down to the commit level, which has just been a huge question mark for our industry so far.

Because when you hear in the news, 30% of Microsoft’s code is, what did he say, written by software, or Anthropic’s CTO saying that in six to nine months 90% of code could be written by AI, usually what that’s talking about is acceptance rate. It’s not actually talking about code that reaches a customer-facing environment. And there’s a huge difference there. But it’s just been a big question mark for our industry. So finally having some clarity there, you know, is it that engineers just enjoy coding with the tools, but actually they modify everything and none of that code actually makes it to production? I mean, that’s a really serious question and a question that we need the answer to, because that really changes our approach, our AI strategy.

There’s a really big difference between rolling out a tool to your whole organization and then seeing that only 5% of code reaching production is AI-authored, versus finding out that maybe it is 30% of code. Those send really different signals. Those are really different environments to live in, and they’re gonna require different amounts of investment and training and making sure your pipelines are ready. Are your SRE procedures ready? Those are really important questions that organizations need answers to. And it’s just been really hard so far.

Henry Suryawirawan: Yeah. Sometimes I’m quite curious when high percentages are being touted in the news, right? Is it really real? Because from my experience, yes, AI can help us generate the first draft of the code, but definitely you still need humans to curate and design it much, much better, right?

[00:30:31] AI Measurement Framework: Impact Dimension

Henry Suryawirawan: Which brings us to the second dimension, impact. To me, I find it quite difficult to actually quantify impact. I mean, you mentioned time saved per developer. But at the same time, even when we use AI, we prompt and we read the answers, and then we try to tweak the prompts and things like that. In some cases, yes, it can save quite a lot of time. But in some cases, I’m quite doubtful whether it actually saves time. In one of the recent Pragmatic Engineer newsletters, I even read that some people said using Cursor doesn’t actually mean you are much more effective. So maybe tell us, from your research, about impact. How can people actually quantify the impact much more accurately?

Laura Tacho: So in our study of 38,000-plus engineers, we found the median time savings right now to be about three hours and 45 minutes for people who are using these tools daily and weekly. And that doesn’t match up with these sensationalized claims. It’s more in line with that 10% time savings. And you’re right. You’re absolutely right, Henry, and I think your skepticism is not misplaced. It’s very welcome. Because in some particular use cases, all we’re doing is shifting where we’re spending our time. We’re not actually reclaiming time, as you said. We’re still prompting, we’re still fiddling. A lot of the time, we’re also trying to learn the tool while also completing a task. And so we’re doing two things at once, which slows us down as well. We might be moving from our old trusty IDE to Cursor, and then we have a whole learning curve there. So there are lots of things that can make it not a total net positive or time savings gain.

And that’s actually part of why this metric is there: when used as a signal in combination with the other things, it can give us insight into what these tools are actually doing. Because if we find that quality is going up, maintainability is going up, speed is going up, developer experience is going up, but time savings is staying the same, some companies are gonna be okay with that, and some companies are gonna say, okay, we gotta find a different tool. And so it’s all about fitting the organizational context that you’re operating in. But you’re absolutely right to call out that in a lot of use cases, we’re just shifting where the human spends their time a little bit downstream instead of eliminating cycles from getting that task done.

[00:32:57] The Importance of Measuring Productivity Holistically

Henry Suryawirawan: Yeah. Which brings us to the thing that you mentioned, right? You should not take impact, time saved per developer, just by itself. You must treat it holistically, maybe with the DX Core 4. I always love the way DX frames this, right? There are the perceived metrics, the sentiments, versus the hard metrics, maybe lines of code or, I dunno, PRs raised or whatever that is. And I think even with AI, the same thing is true. You still need to get the sentiments from developers, like whether they’re happy, whether they feel more productive. And maybe also downstream, whether there are more issues like bug fixing or operational issues. So tell us how important it is that we quantify this with other dimensions or other drivers, like, for example, the DX Core 4?

Laura Tacho: I think companies that hadn’t unified on a definition of performance are really feeling the pain when it comes to quantifying or measuring the impact of AI, because they’re starting with nothing. They’re starting with a big question mark of what performance actually means. And then when they add AI, it’s really difficult to tell: is this good? Is this bad? Are we moving up or are we moving down? It’s a different story for companies that have really invested in developer experience and have a really clear framework and story around how they define engineering excellence and engineering efficiency.

You know, Core 4 is a great example of how to do that. We’re putting together DORA, SPACE, and DevEx into a simple framework that’s easy to deploy at companies. Using Core 4, you have your baseline metrics. And then you can see, okay, we’ve added in AI now. Are we actually getting faster? I can look at my speed dimension. Are we improving developer experience? Are we improving quality? Is quality declining? Those are important questions. And the business impact dimension, which is, are we spending more time on innovation versus maintenance? That’s a big one that a lot of companies are expecting to move with AI. They’re expecting toilsome work to go away, maintenance and operational work to go away, so that people can spend more time on innovation. But is that really happening? Without those measurements to start with a baseline, it’s really difficult to know. But even if you don’t have a baseline, it’s a huge misstep to not be looking at those things, because we need to look at the second-order consequences of tools and not just the direct surface-level effect the tool is having on your organization, like adoption or time savings. We need to look at the health of the organization and the efficiency overall.

Henry Suryawirawan: Yeah, I think getting the baseline is really super critical, especially if you wanna quantify the impact before and after. And one thing that I always love about the DX way of doing this: you can always start with a survey, right? You don’t need hard metrics. You don’t need numbers to actually quantify impact. You can even just ask developers whether they feel happier, much more productive, or things like that. So I think the DX Core 4 becomes much more important in this sense, right?

[00:35:54] AI Measurement Framework: Cost Dimension

Henry Suryawirawan: So let’s go to cost. I think this is a little bit tricky to discuss, right? I mean, many people think that by adopting AI they can just reduce costs a lot. First, they might need fewer people. And that’s why in some companies or industries, people are getting laid off, unfortunately. The other thing is fewer juniors needed, right? So maybe hiring is being frozen and you don’t add more people. You just ask people to utilize AI more. So maybe from your experience here, cost as a dimension, how should people treat it? Is it really true that every time you adopt AI you will have a lot of cost savings?

Laura Tacho: I don’t think that’s true yet. Even the inclusion of cost as a top-level dimension, you know, I did think about it, because whenever you mention something like cost, our automatic reaction is to reduce cost. I think that is how people are hardwired to think. And this is not about reducing cost; this is about making sure that you’re spending the right amount. So we also don’t wanna spend too little. And especially when it comes to training and enablement, we don’t wanna be spending too little. Because there are times when spending too little will actually bottleneck the amount of positive gain that you can have.

So for example, if we know that the tool is really good, but you only provide licenses for half of your engineering staff because you don’t wanna invest in it, well, okay, we can do the math and think about whether that’s a good decision or not. But also, let’s say we invest in licenses for a hundred percent of our engineers, but we don’t invest in training and enablement. Well, that’s probably not gonna be as impactful as giving 65% of your engineers licenses, doing really concerted training and enablement, and then scaling it up over time. That’s probably gonna lead to better results. And so it’s about allocation, making sure that the investment is the right amount.
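A back-of-the-envelope sketch of the allocation trade-off described above; every figure here is a made-up assumption for illustration, to be replaced with your own license costs, loaded rates, and measured time savings.

```python
# Back-of-the-envelope sketch of the license-vs-training allocation trade-off.
# Every number is an illustrative assumption, not a benchmark.

ENGINEERS = 200
LICENSE_COST = 480        # assumed $/engineer/year for an AI coding tool
HOURLY_RATE = 75          # assumed loaded $/hour per engineer
WEEKS = 46                # assumed working weeks per year

def annual_net_value(covered_pct: float, hours_saved_per_week: float, training_cost: float) -> float:
    covered = ENGINEERS * covered_pct
    value = covered * hours_saved_per_week * WEEKS * HOURLY_RATE
    cost = covered * LICENSE_COST + training_cost
    return value - cost

# Scenario A: licenses for everyone, no training -> assume lower per-user time savings.
a = annual_net_value(covered_pct=1.00, hours_saved_per_week=1.0, training_cost=0)

# Scenario B: licenses for 65%, concerted training -> assume higher per-user time savings.
b = annual_net_value(covered_pct=0.65, hours_saved_per_week=3.0, training_cost=60_000)

print(f"A (everyone, no training): ${a:,.0f}")
print(f"B (65% + training):        ${b:,.0f}")
```

Under these assumed numbers the smaller, well-trained rollout comes out well ahead, which is the point: the question is allocation, not minimization.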

Henry Suryawirawan: Yeah, so not too little, not too much, right? On the too-little side, I’ve heard that in some companies, organizations are just not willing to give a license to everyone. They selectively choose some developers to get a license. In other companies, when developers feel AI is helping them a lot, they want more tokens, so to speak, because the credits get used up pretty fast. But they don’t get that either, so developers have to wait for another cycle, maybe next month, before the credits get replenished. So definitely invest the right amount and look at the other dimensions as well, because you can’t just see it in one dimension, like what you said. Holistically, it should give a much better picture of AI adoption success. Thanks for sharing this framework. I think we can have a better picture of how we can measure adoption and the success rate.

[00:38:34] AI Second Order Impact on Software Quality and Maintainability

Henry Suryawirawan: So maybe let’s go to some of the impacts of adopting AI tools, specifically coding assistants and agents. I like that in one of your articles you mentioned that people are now fixated a lot more on productivity and speed, but actually there are second-order impacts of using these AI coding tools. First, it’s about software quality. And second is about maintainability. So tell us a little bit more about these two aspects, why people should be more critical about those two.

Laura Tacho: This is a set of tools that’s very new, again, like toddlerhood. And that’s one of the downsides. I mean, it’s very exciting and things are progressing so rapidly, but we just don’t have the longitudinal data. We don’t have data from now and five years ago to track how this has changed. And one of the big questions right now with AI is how are these codebases that are AI-authored or AI-assisted gonna hold up over time? Because it’s trivial to produce a ton of code right now, right? It’s not all good code, and it’s not all code that’s gonna help your customers. And it’s not all maintainable code.

And so some parts of this problem are a bit of a paradox. I’ve had this discussion a lot about whether AI is actually a paradigm shift in programming in general, like a lateral move from expressing what we want to happen in terms of business logic in Ruby or JavaScript to just expressing it in English, and having that then broken down into computer instructions. You know, I don’t write Assembly if I don’t have to. I’ve had to write some before, maybe not everyone has, but I don’t anymore, because we now have better tools and I don’t need to care about that. I don’t have to care about memory allocation and garbage collection anymore, because we have nice developer tools that take care of that.

And is AI just another layer on top of the stack that allows us to express what we wanna build in a different way? In the same way that Ruby and JavaScript allow us to do that and abstract away the guts of systems from us. And, you know, maybe that’s the case. I think it’s a little bit too early to tell, but it’s an interesting thought experiment: what would it be like if that were the case? Because if that is the case, we can think about a future where AI-generated code isn’t meant to be read by humans. It’s meant to be read by other AI agents.

And so what does that mean in terms of maintainability? What does that mean for code review? There are a lot of questions here. So the paradox is that AI might actually be producing code that’s harder to maintain or understand, but because we use AI tools, it might be easier for us to understand code that’s hard to understand. And so is it a net wash? Is it a net positive? It’s really difficult to know. These are some of the areas that we’re really exploring right now and trying to piece together: what does excellence continue to look like in a world of AI-assisted engineering? And unfortunately, there’s just not a lot to go off of, because it’s so new. We don’t have five years of historical data from a single codebase to see how things change over time and how that impacts reliability, maintainability, cost of change, those kinds of things.

Henry Suryawirawan: Yeah, very interesting the way you mentioned a paradigm shift, right? Personally, I find that we still have a mismatch. For example, we instruct AI in a human natural language, but we still read the code. We still expect other people to read it at the code level. I think we always have this kind of issue, because the code generated by AI is not always perfect. Sometimes there are duplicates, long procedural stuff, right? But if we shift the paradigm so that we instruct and also read in a natural language way, probably, like what you said, AI agents will be able to understand whatever structure your codebase is in. So I think that can maybe help a lot in terms of maintainability and also quality, so to speak.

[00:42:38] The Danger of Vibe Coding

Henry Suryawirawan: Yeah, speaking about maintainability, what I know from some research, like the GitClear report, is that code churn is pretty high, right? The amount of code that gets deleted and rewritten and all that is pretty high with the usage of AI, especially if you don’t validate or check the generated code produced by AI. Especially when you do vibe coding. People these days love to talk about vibe coding, right? But vibe coding is probably only good for building POCs, small applications where you don’t need maintainability and evolvability. But definitely there’s a clear danger if you just vibe code all the way. Yeah.

Laura Tacho: I definitely agree. One of the things that I think about often is that teams that have invested in building resilient systems, systems that are able to swallow a little bit of variability, are the ones that are seeing the best gains from AI so far. So teams that have really great observability, or have planned for things to go wrong, have put themselves in the position to gain a lot. Because it is, as you said, non-deterministic. It’s unreliable. Um, you know, just…

Um, yeah, so just one example. I was vibe coding something the other day. And for those of you I’ve never met, I have a very long history with Docker and containerization. I was a Docker Captain. I was working on the project from, like, day two. So I have a very in-depth understanding of Docker. But just for fun, I cloned an application that wasn’t dockerized, wasn’t able to be run in a container. And I was like, I wonder what Cursor will do when I ask it to containerize this application. So I did it. And I wanted to do it a bunch of times, because I wanted to see the difference.

And one time, the Dockerfile was pretty okay. And then I got a Docker Compose YAML that had all the port bindings and everything in it, and I was like, okay, cool. Run it via docker compose up. So then I deleted everything and was like, let’s try again. This was a very boilerplate Python app, so I was hoping, this can’t be difficult, it’s gonna give me the same output. I had my fingers crossed. I was like, I’m counting on you, Cursor. Let’s do this. Totally different results the next time. Then it was just a Dockerfile, and it gave me the docker run command with all the port bindings in it. I mean, effectively the same end result, and both were technically correct. But this is why, with vibe coding, if you can imagine just that drift, and then that drift happening across multiple people, across days and weeks and months, it is just a recipe for disaster without constraints and boundaries and linting and code styles and all of those other things.

And this is why vibe coding is good for a proof of concept. It’s nice to hold something in my hand and be like, here, it works. But when it comes to actually applying a lot of these workflows at an enterprise scale, especially in a highly regulated industry, there’s a lot more that needs to be considered than just vibe coding your way to profit. We’re a ways away from that, even if AI can bring some really remarkable gains to organizations, teams, and individuals. We’re still far away from swarms of agents completing full feature sets for us. And two things can be true at the same time: we can be really optimistic about what’s about to happen, but also rooted in reality about what’s happening now.

Henry Suryawirawan: Yeah, so the probabilistic nature, the randomness, is still there, right? You can ask the same question and get different results, sometimes pretty different. So definitely there’s a randomness factor that people need to factor in, because you can’t just rely on AI to produce the best results ever. It’s just so difficult with the way AI models are built at the moment, the way LLMs work.

[00:46:31] Treating AI as Extensions of Teams

Henry Suryawirawan: So one thing I found particularly interesting when I read the framework, the research paper that you have: you mentioned that we need to expand the definition of developer, and we should also treat AI agents like an extension of your team. I think for some people this might be quite interesting to understand. So tell us a little bit more about this expansion of the definition of developer and also about treating AI as an extension of your team.

Laura Tacho: Yeah, I think, as you said, prototyping, rapid prototyping, smaller bug fixes that have very clear boundaries, these are things that are no longer necessarily only the domain of people who have a traditional software engineering background or maybe have engineer in their title. It’s reasonable for a designer now to take Figma files and turn them into a working prototype and maybe even commit them somewhere. And so when we talk about expanding the definition of developer, we need to make sure that our definition, and the surface area where we’re looking for gains, is inclusive of all of the people that are using these tools, and not just people that have the word developer or engineer in their title, because we might be missing out on a lot of stuff.

And I do think that AI kind of democratizes code a little bit more. I think there are a lot of people who work directly on the cusp, you know, support folks who are maybe understanding the code, verifying stuff, and maybe now they can submit the PR instead of just verifying and providing that detailed report to an engineer. It just changes a lot of workflows. So we have to make sure to look at that across the whole organization.

In terms of measuring agents and AI, we can say agentic workflows are part of the team. There’s been kind of a trend, or maybe a scare tactic, I’m not sure, to talk about agents as digital employees. And I just wanna be clear, that’s not where we’re heading here. You know, it is true that some organizations might make the decision to slow headcount growth because they’re getting efficiency gains with AI. But that doesn’t mean that those agents are digital employees. I think that’s kind of a scare tactic, again, ad-based or ad-driven revenue, to make people more afraid of it, or to get people to read the article. But when we’re measuring the impact that agents are having, we have to understand that at the end of the day, it’s still a human that’s responsible for dispatching the work, creating the spec, whatever it is. And so we should be measuring these workflows as extensions of the teams that they’re operating within, not as their own team or some other kind of model where we’re thinking about them as truly autonomous. There’s very much still a human in the loop. Even if the work can be done without a human in the loop, there still has to be accountability at a human level, and we can’t forget that.

Henry Suryawirawan: Yeah, so I’m not sure, could a software development team one day be purely run by AI agents? As of this time, I think it’s still pretty much impossible, but I dunno, in the next few years it could be that you have some teams running purely on AI agents. And I think the advancements in agents, like the MCP protocol, the A2A protocol, things like that, definitely allow us to have all these digital assistants running behind the scenes. It could be simple things, like what I typically use: solve these Dependabot issues, or fix this bug that is pretty small. You can pretty much run agents to do that for you. So definitely we’ll start seeing teams incorporating these agentic developers in their teams.

Laura Tacho: Yeah, yeah. It’s also complex for the reasons that you just described, because even though I’ll say make sure to measure the gains in the context of the team, it is true that the skills developers need to be effective are gonna change the more we can rely on agents to complete these well-defined tasks, or even more complex tasks in the future. A senior developer is gonna need the skills to connect their work to the business and really understand it, because now they’re maybe gonna be responsible for a bigger surface area, or dispatching work, getting rid of some of the toil. And so, again, we’re kind of caught in that dichotomy, or sort of paradox: these aren’t digital employees, but in some ways senior engineers, seasoned engineers, are gonna kind of be like a team lead for maybe a swarm of agents. I see that pattern already emerging. And again, yeah, two things can be true even though they seem a little bit contradictory. It’s all about understanding the limitations of the technology and how we wanna fit it in.

You know, I use note-taking apps like Granola. I don’t think about those as a digital employee. Even though it’s my digital notetaker, it’s not a human. Jenkins is an example I use for this, because Jenkins actually is personified; people dress up as Jenkins. But we don’t think of Jenkins as an employee at our company. Jenkins is a tool and a utility, and we already have this pattern in so many other places. CI/CD is a great example. Dependabot is not an employee. Dependabot is a utility that does work on my behalf. And so we have to use the same patterns and not get swept away in some of the sensationalized coverage that I think is maybe just meant to be a scare tactic or to distract people from the bigger issues at hand.

Henry Suryawirawan: Although I’m sure for some developers, um, being a tech lead of a bunch of agents will be pretty cool, because you just…

Laura Tacho: I mean, it does sound cool.

Henry Suryawirawan: Yeah, you just feel like you can do a lot more, right? So yeah, definitely, treat these AI agents not yet as an employee or staff. But I dunno, in the future, maybe we will start seeing different patterns of how people use them, right?

[00:52:31] The Bigger Bottlenecks to Solve Outside of AI Adoption

Henry Suryawirawan: The other thing that I wanna pick out from your research, and this is quite an interesting thing to discuss as well: when people are talking about AI, again, productivity, speed and all that, producing more code, people think we can deliver much faster. But actually, in your research you mentioned it’s just one lever that matters. There are still bigger bottlenecks to solve. This could be the outer loop part: the communication, the coordination with other people, prioritization as well. So tell us why we should not forget about these aspects when people are so fixated on how to use AI to be more productive, when actually there are other, bigger bottlenecks that we should think about as well.

Laura Tacho: I think it made sense that we started with the coding task, because it’s complex, it’s fun, it’s immediately gratifying. We have a ton of training data on it, and so we started there. But any developer will tell you that, of the bottlenecks that impact their day-to-day work, tools make up a small percentage. Oftentimes it’s, as you said, prioritization, single-threaded prioritization, shifting priorities, things like production debugging, everything in the outer loop, security risk, compliance. There’s so much more to being a software developer than just writing code. I think AWS did a study, and I think this was specific to AWS employees, but I think they found it was something like 20% of an AWS developer’s time that is actually spent writing code.

And so we can think about it this way: let’s just say, best case scenario, it’s 33%. A third of a developer’s time is spent writing code. That’s still only 33% of their total time. And so we have a very limited scope that we can improve. And then within coding, code generation is actually an even smaller slice, because we have debugging, we have all the other tasks. So code generation is just a small part of it. But I think the optimistic view of that is, wow, there’s a huge surface area that we could apply AI tools to that will bring us more and more efficiency gains. So we’re starting to see more tools across the SDLC. GitLab Duo and Atlassian Rovo are good examples of this. They’ve got all your Jira tickets. They’ve got everything. And so being able to connect everything together is a huge time saver for developers.
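Laura’s point about the limited headroom of code generation can be made concrete with a quick Amdahl’s-law-style calculation; the speedup figures here are illustrative assumptions, not research results.

```python
# Quick Amdahl's-law-style check on why faster coding alone has limited headroom.
# The fractions come from the conversation; the 2x speedup is an illustrative assumption.

def overall_time_saved(coding_fraction: float, coding_speedup: float) -> float:
    """Fraction of total work time saved if only the coding slice gets faster."""
    new_total = (1 - coding_fraction) + coding_fraction / coding_speedup
    return 1 - new_total

# Best case from the conversation: a third of time spent writing code,
# and suppose AI makes that coding time twice as fast.
print(f"{overall_time_saved(1/3, 2.0):.0%} of total time saved")   # ~17%

# AWS-style figure: 20% of time on coding, same assumed 2x speedup.
print(f"{overall_time_saved(0.20, 2.0):.0%} of total time saved")  # 10%
```

Even a generous 2x speedup on the coding slice lands in the same ~10-17% range as the time-savings numbers discussed earlier, which is why the bigger gains have to come from the rest of the SDLC.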

So we have to think beyond just the coding task. And I have strongly believed from day one that the biggest productivity gains for software engineering teams aren’t gonna come from code authoring. They’re gonna come from other stuff, because that other stuff is where the bottlenecks are, where we’re waiting for people, waiting for validation. There’s a ton of opportunity out there. So it’s really optimistic, if you wanna think about it that way. Maybe you can think about it pessimistically, like, oh, the tools that we’ve invested a lot in are only gonna solve a fraction of our problem. But, you know, we’re just getting started.

Henry Suryawirawan: Yeah, never forget that software engineering is a sociotechnical problem, right? So even though we have tools to assist with the technical aspect, don’t forget about the social aspect, the human aspect. All of these are also important whenever you build, especially, enterprise software, things that generate business impact. So definitely don’t fixate on just one technical portion, right?

[00:55:47] DX Guide to AI-Assisted Engineering

Henry Suryawirawan: So speaking about the technical part. I know that DX also published this AI guide, the Guide to AI-Assisted Engineering, to help people actually start using these AI assistants much more effectively. Because sometimes, for me, I’m also lazy in writing prompts. Our prompts are not always the best, right? We are pretty lazy. We think that with just a one-liner, AI can solve the problem. But definitely there are techniques to use AI much more effectively. So tell us about this guide, how people can find it, and what they can learn from it.

Laura Tacho: This guide was the result of research across 180 different companies. We looked at the developers who were actually seeing good gains from AI, and we just wanted to know: what are you doing? What are you doing there? Because there are just so many things that you can use AI for, right? It’s not just code generation. And so we wanted to get into the details of this. And what came out of it is the Guide to AI-Assisted Engineering. This is a really tactical guide. We cover some more general leadership stuff, but there are code examples and prompt examples. So it’s meant to be handed to a developer for them to learn from.

What we found was that, surprisingly, the top time-saving use case wasn’t actually code generation at all. It was stack trace analysis. That was really surprising to me when I read it, because I think we all just think about code generation, starting from scratch, greenfield projects. Say, hey, Claude, can you build me this, whatever, my recipe app, and then 20 minutes later coming back to something that’s fully working. That’s just not really what we were seeing from developers who were saving the most time. Stack trace analysis is an interesting one because it’s really compressing down the time it takes you to get to the answer so you can move on to something else. And so it’s a great example of using AI to just eliminate time, not kick the can down the road and reallocate the time. We can get suggestions as to where that error is happening, suggestions to fix it, accelerating resolving that.
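As an illustration of the workflow only, here is a hypothetical stack-trace-analysis prompt; the wording is illustrative, and the guide itself contains its own prompt examples.

```python
# Sketch of a stack-trace-analysis prompt, the top time-saving use case in the study.
# The prompt wording and the example trace are illustrative, not taken from the guide.

PROMPT_TEMPLATE = """You are helping debug a production error.
Here is the stack trace:

{stack_trace}

1. Identify the most likely root cause.
2. Point to the file and line where the fix belongs.
3. Suggest a minimal patch and one regression test to add.
"""

stack_trace = """Traceback (most recent call last):
  File "app/orders.py", line 42, in apply_discount
    rate = DISCOUNTS[customer.tier]
KeyError: 'trial'"""

# Paste the rendered prompt into whichever assistant you use (chat, IDE, or CLI).
print(PROMPT_TEMPLATE.format(stack_trace=stack_trace))
```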

The other interesting use cases are things like refactoring. Migration is probably a top-two use case at every company that I’ve talked to lately, to speak in generalizations. Tech modernization has been slow. It’s expensive. It’s not particularly enjoyable. It’s not necessarily the work that’s gonna get you promoted, you know. It’s kind of just the cost of doing business. We need to do it. And because of that, it’s been neglected. But now, when we have those legacy components and systems, it makes it really difficult to innovate at pace. And so organizations are looking to AI to find ways to accelerate modernization. And that’s a really useful use case for AI, because it has such clear boundaries. We know what the old state is, and then we can give it some examples of what the new state is and figure out how to get from A to B. So things like refactoring are great use cases as well.

Code generation actually didn’t come until use case three in the study we did. So stack trace analysis and then refactoring were more time-saving than writing code, for the reasons you mentioned before, Henry: we still have to read it, still have to figure out, does this make sense? We’re not necessarily saving time.

And I think you asked how to get it. You can go to getdx.com and look in our research dropdown on the menu. There’s the Guide to AI-Assisted Engineering. It’s a PDF that you can download and read through. And again, it’s meant to be delivered to everyone in your organization, from your executives to the engineer trying to figure out, okay, what do I do now? How can I actually use these tools? I’ve just been using it for code generation, and it doesn’t seem that great. But hey, did you know about prompt engineering? Or have you heard about adversarial engineering? Or making a great system prompt? Those things can really make a huge difference when it comes to efficiency gains.

Henry Suryawirawan: Yeah. Speaking from my experience as well, when I got introduced to these tools, there were so many things where I didn’t know the use case, right? What is AI good for? We hear about code generation a lot, but there are so many other unique, creative ways of using AI. And I’m very happy to have these guides so that people can get trained and get to know the other things they can try using AI for, especially the prompt techniques. I think in the guide you have, like, 10-ish prompting techniques. And those things can really uncover different results, in terms of quality, in terms of precision. Don’t forget the aspect of prompt engineering as well, because sometimes, as developers, we just want everything to be done fast, right? But actually, yeah, there are many ways to make it more effective.

Laura Tacho: Yeah, definitely.

[01:00:38] Being Deliberate for a Successful AI Rollout

Henry Suryawirawan: Yeah. So we have talked a lot about, you know, all these AI measurement framework, AI adoption, the pitfall of acceptance rate, and things like that. Is there anything that we missed that you think we have to cover as well?

Laura Tacho: If there’s one thing that I would urge people to pay attention to, it is the deliberateness of the AI rollout. So there’s a lot of, again, curiosity, natural optimism, enthusiasm about AI. And some companies are saying, okay, well, let’s ride the wave. Let’s see what these tools can do. And that is actually not leading to outstanding results. What does lead to much better results are companies that say, we have this very specific business problem, and let’s figure out the way to use AI to help solve it. So for example, we have to modernize 70% of our code; let’s see how AI can help us accelerate that. When we think about this as a very targeted science experiment, we get much better results.

So instead of throwing a bunch of tennis balls at a Velcro wall and hoping something sticks, and don’t get me wrong, experimentation is really important and we should be spending time experimenting, but as a general AI strategy, that’s not necessarily a winning one. We need to be more disciplined. Think about running a proof of concept or a trial with a tool with very specific outcomes in mind, very specific measurements that you’re trying to track. Okay, what is the before and after? Is this tool actually helping us? The more discipline that organizations have when rolling out the tool, the better and the more long-lasting that change is gonna be, versus just hoping that individuals with a license are gonna figure it out.
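A minimal sketch of that before-and-after discipline: define the outcome metrics up front, capture a baseline, then compare after the pilot. The metric names and numbers here are placeholders for illustration.

```python
# Sketch of a "targeted science experiment" readout: baseline vs. pilot metrics.
# Metric names and values are placeholders; use whichever outcomes your pilot defined.

baseline = {"lead_time_days": 9.0, "change_failure_rate": 0.18, "dev_satisfaction": 3.4}
after_pilot = {"lead_time_days": 7.5, "change_failure_rate": 0.17, "dev_satisfaction": 3.9}

for metric, before in baseline.items():
    after = after_pilot[metric]
    change = (after - before) / before
    print(f"{metric}: {before} -> {after} ({change:+.0%})")
```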

Henry Suryawirawan: Yeah, and we shouldn’t expect, like, 100X or 1,000X developers with the assistance of AI. But coming back to what you mentioned in the beginning, this should be treated as an organizational problem to solve. So be deliberate, think about the business outcome that you wanna gain from using AI. I think that’s a pretty good addition.

[01:02:32] 3 Tech Lead Wisdom

Henry Suryawirawan: So Laura, it’s really a pleasure to talk to you again, and this topic of AI is very exciting. Unfortunately, we have to wrap up. But before I let you go, I have the same question I asked you last time. I call this the three technical leadership wisdom. Maybe we can spin it a little bit with an AI flavor. So if you can share your version today, that would be great.

Laura Tacho: So my three pieces of engineering wisdom, AI edition, would be: data beats hype every time. There’s a lot of hype out there about AI, and it can be really difficult to navigate that disappointment gap that we’re in. Get the data. Think really deliberately about why you’re doing what you’re doing. Make sure you have the measurements in place. Data will beat hype. It lets you tell the story of reality versus getting caught in the headlines.

The second piece is that AI just doesn’t change the physics of what makes good software good. It’s not a magic bullet. We still need to solve customer problems. We still need to have reliable software that’s easy to change. None of those things are changing just because AI is there. And so you have to pay attention to those second-order consequences. There are second-order outcomes from introducing AI into your development workflows. You can’t ignore them and only think about things like acceptance rate or even user adoption. You have to look at the actual impact on the software. And that stuff hasn’t changed. So use Core 4, or whatever framework you have, to measure development team productivity and organizational performance, and you’ll be in a much better spot.

The last piece of advice is maybe a bit unconventional, but having gone through many hype cycles before, containers, Kubernetes, JavaScript frameworks back in the early 2010s, AI somehow feels more exhausting, even for me, and this is my job. I spend day in and day out looking at these tools. Protect your energy. It can be so tempting to feel like you need to read everything and try every tool, and then you feel like you’re already behind because things are changing hourly, daily. Protect your energy. Everything’s still gonna be here tomorrow. You don’t need to look at everything. It’s also hard to know what to actually pay attention to, because things are so volatile right now. So make sure that you’re not burning yourself out trying to chase after every single shimmering, glittering thing in the AI world. It’s okay to say, hmm, I’ll see if that’s still around next week if I care about it, and then maybe I’ll kick the tires a little bit.

Henry Suryawirawan: That’s a very beautiful reminder. I personally sometimes feel fatigued as well, because it never ends, right? There are just so many cool things, new things being invented. And if you keep chasing those, I guess you’ll get pretty tired and sometimes feel demoralized as well. And we still have a job to do, so don’t forget about that aspect. And hopefully this is just another hype cycle that we have to go through, although this time it seems like the hype never ends. So we’ll see where we are in the next few years. Thanks again for that reminder.

If people want to follow you, or ask you more about this framework, about DX and all that, is there a place where they can find you online?

Laura Tacho: Yeah, come find me on LinkedIn. I think I’m the only Laura Tacho out there, so just find me. I love talking about this stuff. Shoot me a message. I’m happy to answer any questions. You know, and for the rest of 2025, 2026, I will be speaking at some conferences. So if you see me there, don’t hesitate to say hello.

Henry Suryawirawan: Yeah, definitely I recommend people follow you, especially on LinkedIn. You have some great posts that I think people can learn from, just from those micro posts. So again, thank you so much for your time, Laura. I hope people get to adopt the AI Measurement Framework. It’s a pretty exciting time, and hopefully organizations can reap the benefits of your research and make sure that they adopt AI much more effectively.

Laura Tacho: Yep. Thanks so much, Henry.

– End –