DORA metrics don’t measure developer productivity. They measure CI/CD practice adoption.

This case study highlights the flaws of using DORA metrics as productivity measures by comparing two "Elite" teams, where one achieves 2.4x more output despite similar deployment frequency. DORA metrics, while useful for tracking processes, fail to capture true productivity and can be easily gamed. Our Stanford-developed algorithm provides a more accurate measure by analyzing the functional impact of code changes, helping teams make data-driven decisions to improve performance.

Yegor Denisov-Blanch


We illustrate this with a case study of two similar teams: both deploy multiple times daily and rank "Elite" by DORA standards, but one produces 2.4x more output (i.e., it is 2.4x more productive).

When evaluated using our productivity algorithm, Team 2 (the lower-output team) ranks only in the “top 40%” for productivity, closer to “Average” than to “Elite”.

Even if we remove our algorithm from the equation, using deployment frequency as a measure of productivity is flawed:

1) Deployment sizes aren’t constant within or across teams. Is a team that deploys 5 times a week *always* better than one that deploys 4 times a week? How much better?

2) Deployment frequency is gameable. Once teams are measured on it, they’re incentivized to shrink each deployment, and trivial updates can become the norm. You can easily improve DORA metrics without improving productivity.
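To make both points concrete, here is a minimal Python sketch with made-up numbers. The `Deployment` class, its `output_units` size measure, the helper functions, and the team figures are all hypothetical illustrations, not data from this case study. Deployment frequency only counts deployments, so it can rank the team that ships less work higher, and splitting the same work into smaller deployments raises it without adding any output.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    # Hypothetical "size" of the work shipped, in arbitrary output units.
    output_units: float


def deployment_frequency(deployments: list[Deployment], days: int) -> float:
    """Deployments per day: counts deployments and ignores their size."""
    return len(deployments) / days


def total_output(deployments: list[Deployment]) -> float:
    """Total work actually shipped over the period."""
    return sum(d.output_units for d in deployments)


def split(deployments: list[Deployment], parts: int) -> list[Deployment]:
    """Gaming: split every deployment into `parts` equal, smaller deployments."""
    return [Deployment(d.output_units / parts) for d in deployments for _ in range(parts)]


# Hypothetical 5-day week; numbers chosen only for illustration.
team_a = [Deployment(1.0) for _ in range(5)]  # 5 small deployments
team_b = [Deployment(3.0) for _ in range(4)]  # 4 larger deployments

print(deployment_frequency(team_a, 5), total_output(team_a))  # 1.0 5.0
print(deployment_frequency(team_b, 5), total_output(team_b))  # 0.8 12.0

# Team B ships the same work in halves: frequency doubles, output is unchanged.
gamed_b = split(team_b, 2)
print(deployment_frequency(gamed_b, 5), total_output(gamed_b))  # 1.6 12.0
```

Team A "wins" on frequency while shipping less than half of Team B's output, and Team B can overtake it on frequency alone without doing any extra work.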

To borrow an analogy from my time as a competitive Olympic weightlifter, using DORA metrics for productivity is like tracking gym visits while ignoring the weights lifted.

Are DORA metrics still useful?

- Sure, there’s value in them. But they’re not a productivity metric and shouldn’t be used as such.

Why did people start using DORA as a productivity metric?

- Historically, as teams transitioned from waterfall with quarterly deployments to agile with multiple weekly deployments, deployment frequency seemed to correlate with productivity.
- A team deploying several times a week often outperformed one deploying once a quarter.
- In the absence of a robust productivity metric in software engineering, DORA metrics gradually became a stand-in for measuring productivity.

How do we measure developer productivity in this case study?

Our algorithm measures developer productivity by analyzing the functionality (i.e., what the code does) of the changes in each Git commit. It considers more than 30 codebase dimensions and has been calibrated on data from millions of files across more than 10 programming languages. This lets us quantify the impact of each commit; combined with Git metadata, it provides a comprehensive measure of both individual and team productivity.
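The scoring model itself is not described here, so the sketch below covers only the final step the paragraph mentions: combining per-commit impact with Git metadata (author, commit date) into individual and team totals. It is a minimal Python illustration under stated assumptions, not the algorithm used in this case study: the `Commit` class, its `impact` values, the `team_of` mapping, and the `aggregate_impact` function are all hypothetical, and summing is just one plausible way to combine scores.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Commit:
    sha: str
    author: str          # Git metadata
    committed_on: date   # Git metadata
    # Hypothetical per-commit impact score. Here it is simply given; producing
    # it (analyzing what the code change does) is out of scope for this sketch.
    impact: float


def aggregate_impact(commits, team_of, start, end):
    """Sum impact per author and per team for commits dated start..end (inclusive).

    `team_of` is a hypothetical author -> team mapping supplied by the caller.
    """
    per_author = defaultdict(float)
    per_team = defaultdict(float)
    for c in commits:
        if start <= c.committed_on <= end:  # commit date selects the time window
            per_author[c.author] += c.impact
            per_team[team_of[c.author]] += c.impact
    return dict(per_author), dict(per_team)


# Illustrative data only; not from the case study.
commits = [
    Commit("a1", "alice", date(2024, 5, 1), 2.0),
    Commit("b2", "bob", date(2024, 5, 2), 0.5),
    Commit("c3", "alice", date(2024, 5, 3), 1.5),
    Commit("d4", "carol", date(2024, 6, 1), 4.0),  # outside the May window
]
team_of = {"alice": "team-1", "bob": "team-1", "carol": "team-2"}

per_author, per_team = aggregate_impact(commits, team_of, date(2024, 5, 1), date(2024, 5, 31))
print(per_author)  # {'alice': 3.5, 'bob': 0.5}
print(per_team)    # {'team-1': 4.0}
```

Summing is the simplest combination; comparing teams of different sizes or periods of different lengths would probably call for some normalization, which this sketch omits.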

About Our Mission

- We are conducting research at Stanford focused on quantifying software engineering productivity.
- Our objective is to enable engineering teams to base decisions on factual data, rather than on gut feelings and internal politics.
- Our research participants use our algorithm to make decisions about their team’s performance, outsourcing, work arrangements (home vs. office), and more.