This week I read Code Ownership and Software Quality: A Replication Study by Michaela Greiler, Kim Herzig, and Jacek Czerwonka. This study expands on prior research by further investigating the impact of ownership on code quality.
My summary of the paper
The inspiration for this study comes from prior research that also looks at the impact ownership has on software defects. Here, researchers replicated previous studies to understand whether the granularity level of the code studied (in this case, looking at the source file and directory level) would impact the results. This study also makes contributions such as expanding the ownership metrics considered and investigating the reasons for lack of ownership.
Here’s an overview of the study subjects and methodology used:
Researchers investigated a set of major Microsoft products, chosen because they differ in nature. Each of these systems comprises several million lines of code.
To measure code quality, researchers counted the number of bug fixes linked to a code artifact (a source file or a directory). They assume that the higher the number of bug fixes, the lower the code quality of that artifact.
Multiple measurements were used for code ownership, including individual ownership metrics (measuring the direct involvement of engineers by measuring the number of check-ins to the codebase), and organizational ownership (looking at management hierarchies to determine ownership). Note: the organizational ownership metrics studied were weakly correlated with the number of bugs, so I have left that finding out of this summary.
Researchers computed Spearman rank correlations to show the basic relationship between code ownership and code quality.
Interviews were used to understand why certain files were weakly owned.
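To make the correlation step concrete, here is a minimal sketch of a Spearman rank correlation in plain Python. It is my own illustration, not the researchers' tooling: the function names and the sample numbers are hypothetical, and ties are handled with average ranks.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def spearman(xs, ys):
    """Spearman's rho: the Pearson correlation of the two rank vectors."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mean_x, mean_y = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    var_x = sum((a - mean_x) ** 2 for a in rx)
    var_y = sum((b - mean_y) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


# Hypothetical data: share of commits by the top contributor per file,
# and the number of bug fixes linked to that file.
top_owner_share = [0.9, 0.7, 0.5, 0.3]
bug_fixes = [1, 2, 5, 9]
rho = spearman(top_owner_share, bug_fixes)
```

A value of rho close to -1 means files with stronger ownership tend to have fewer linked bug fixes; it says nothing about causation.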
Here are the takeaways from the study:
Code ownership correlates with code quality
Ownership is negatively correlated with the number of bugs, and the more shared the file ownership the higher the likelihood that it will contain code defects. This trend is also supported by the fact that for all projects studied, the number of contributors is positively correlated with the number of bugs.
This trend was found when studying code at both the source and directory level.
The individual ownership metrics that were most strongly correlated with the number of bugs
The percentage of minor, minimal or major contributors among all contributors, and percentage of edits of the minimal contributor, have the strongest correlations (in Table IV, these are named pcminors, pcminimal, pcmajors, minownerdir) with the number of bugs. In other words, the number of bugs in a directory increases when the percentage of minor or minimal contributors increases. And the number of bugs decreases if more major contributors are among the contributors of a directory.
pcminors = Percentage of contributors, among all contributors, with less than 50% of the commits across all files in a directory: (sum of distinct minors)/(# of contributors)
pcminimal = Percentage of contributors, among all contributors, with less than 20% of the commits across all files in a directory: (sum of distinct minimals)/(# of contributors)
pcmajors = Percentage of contributors, among all contributors, with 50% or more of the commits across all files in a directory: (sum of distinct contributors with 50% or more of the changes)/(# of contributors)
minownerdir = Percentage of commits of the lowest contributor considering all files in a directory: (sum of all commits of the lowest contributor)/(# all commits)
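As a rough illustration of the four definitions above, here is a small sketch that computes them from a list of commit authors for one directory. The function name, the input shape, and the sample data are my assumptions, not the paper's implementation. I also assume a "major" contributor is anyone with 50% or more of the commits, so that minors and majors together cover all contributors.

```python
from collections import Counter


def directory_ownership_metrics(commit_authors):
    """commit_authors: one author name per commit touching the directory."""
    counts = Counter(commit_authors)
    total_commits = sum(counts.values())
    if total_commits == 0:
        raise ValueError("directory has no commits")

    # Each contributor's share of all commits in the directory.
    shares = [n / total_commits for n in counts.values()]
    contributors = len(shares)

    return {
        # Fraction of contributors with less than 50% of the commits.
        "pcminors": sum(s < 0.5 for s in shares) / contributors,
        # Fraction of contributors with less than 20% of the commits.
        "pcminimal": sum(s < 0.2 for s in shares) / contributors,
        # Fraction of contributors with 50% or more (assumed boundary).
        "pcmajors": sum(s >= 0.5 for s in shares) / contributors,
        # Share of commits made by the least active contributor.
        "minownerdir": min(counts.values()) / total_commits,
    }


# Hypothetical directory history: ana has 6 commits, ben 3, cy 1.
history = ["ana"] * 6 + ["ben"] * 3 + ["cy"]
metrics = directory_ownership_metrics(history)
```

In this made-up history, two of the three contributors are minors and one of them is also minimal, which is the kind of spread the study associates with more bugs.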
The reasons for lack of ownership
Weakly owned files, or files that lack a strong owner, have on average 6 times more bugs assigned than files that do have a strong owner.
There are several reasons why weak ownership was found to occur:
transfer of ownership,
bug fixing,
refactoring, or
architectural issues or smells.
Researchers also note that there is a distinction between intentional and unintentional weak ownership. Only with the latter did engineers want to gather more information about why there was weak ownership and potentially change the ownership model.
Recommendations
Using the lessons learned from the study, researchers concluded the paper with some recommendations for engineering teams:
1. Review weakly owned files and directories to understand the mechanisms and dynamics at play (i.e., collaborative ownership or non-ownership).
“[Martin Nordberg] makes clear there is a difference between ‘collaborative’ ownership and ‘non-ownership’. He defines collaborative ownership, as an ownership where code is collectively owned, but responsibilities and schedules are clear… he describes non-ownership as a mode in which several developers make changes to the same system but with minimal accountability for quality or team communication. In such systems, one might expect the quality to be low.”
In this study, engineers were most concerned about the unintended or unknown weakly owned artifacts.
2. As much as possible, assign an owner to currently weakly owned files and directories with unclear accountability.
“Assigning an owner might not imply that this is the only person that is allowed to change the artifact, but that this person is aware of changes to the artifact for example via code reviewing practices.”
3. If driving changes to the ownership model is not possible or desired, use ownership information as an indicator of risk.
“We recommend that changes to weakly owned files and directories are carefully reviewed. Also, we recommend using ownership information to drive test efforts.”
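One way to picture the third recommendation is a small check that flags changed files with weak ownership for extra review. This is only a sketch under my own assumptions: the function name, the 0.5 threshold, and the choice to treat files with no history as weakly owned are all hypothetical and not taken from the paper.

```python
def flag_for_careful_review(changed_files, top_owner_share, threshold=0.5):
    """Return the changed files whose top contributor owns less than
    `threshold` of the commits. Files with no recorded history are
    treated as weakly owned (an assumption of this sketch)."""
    return sorted(
        path
        for path in changed_files
        if top_owner_share.get(path, 0.0) < threshold
    )


# Hypothetical ownership data: share of commits by each file's top contributor.
ownership = {"billing/invoice.py": 0.8, "billing/tax.py": 0.3}
changed = ["billing/invoice.py", "billing/tax.py", "billing/new_report.py"]
risky = flag_for_careful_review(changed, ownership)
```

Here the weakly owned file and the brand-new file are flagged, while the strongly owned one is not. A team would need to pick its own threshold and decide how to route the flagged changes.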
Final thoughts
Previously we learned that code quality is a predictor of productivity. This paper supports one path for improving code quality: focusing on ownership.
What I appreciate about this paper is that while it shows a correlation between a factor (ownership) and something many teams want to improve (code quality), it doesn’t blindly recommend increasing code ownership. Instead, it points out that there is a distinction between intentional and unintentional “weak ownership,” and suggests focusing on making the unintentional known. It also recommends using ownership information as an indicator of risk: if weak ownership is intended, consider investing more in reviewing and testing those areas.
Here are some thoughts from other teams on code ownership:
Some teams may define an “owner” as the point person of knowledge/direction/insights
Ownership should be codified and queryable
Shared code ownership requires more discipline and trust, and is not recommended with Team Topologies
Some commonly discussed downsides of increasing code ownership to consider are delays that this may cause, and the risk of losing knowledge if someone leaves
“Code ownership is underrated… Code without clear ownership and many editors is some of the trickiest code to understand.”
That’s it for this week! As always, reach out with feedback about this newsletter on LinkedIn or reply to this email.
And if you know someone who might enjoy this issue, please consider sharing it with them.
-Abi
I built a tool called LowEndInsight that can provide insight into a Git repo's contributor distribution and highlight the issue of functional versus passive/weak contributors. https://github.com/gtri/lowendinsight