How Google is accelerating code migrations with AI
Google cuts code migration time in half by automating tasks with AI.
Welcome to the latest issue of Engineering Enablement, a weekly newsletter sharing research and perspectives on developer productivity.
This week I read Migrating Code At Scale With LLMs At Google, a new paper describing how Google used AI to automate a large, tedious migration initiative: converting 32-bit integers to 64-bit across their monolithic codebase. This type of migration had previously taken two years to complete manually. With their new system, Google cut that time in half while having AI generate 70% of the code changes.
This paper shows the potential of using AI to automate a substantial portion of code migration tasks.
My summary of the paper
While migrations are one of the most necessary parts of software maintenance, they can be time-intensive, costly, error-prone, and unrewarding for developers, making them great candidates for automation with AI. Google's system addresses this by identifying code that needs changing, using an LLM to generate updates, validating the changes through several checkpoints, and routing successful modifications for human review. Today the system runs nightly, continually chipping away at the migration task until complete. In this paper, the authors describe how the system works, its results, and its benefits and challenges.
How the system works
Google’s automated migration workflow consists of three main components:
1. Finding where to make changes. The system uses Kythe, Google's internal code indexing system, to trace both direct and indirect references to ID fields. The system maps dependencies up to five levels deep, casting a wide net to avoid missing anything, even if that means over-including.
To deal with noise and false positives, they:
Use automated classifiers to flag irrelevant or already-migrated code
Run regression tests to catch missing or incorrect changes
Rely on the LLM to decide what actually needs to be edited
Importantly, instead of giving the LLM just a few lines of code, they feed it the entire file so it can understand the full context and make more accurate changes.
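The paper doesn't publish Kythe's internals, but the wide-net traversal it describes can be sketched as a breadth-first walk over a reference graph, stopping five levels out. The `index` object and its `references_to` / `referencing_symbols` methods here are hypothetical stand-ins for whatever the real indexing service exposes:

```python
from collections import deque

def find_candidate_locations(index, seed_symbols, max_depth=5):
    """Breadth-first walk over a code reference graph, collecting every
    location that references the seed symbols, directly or transitively,
    up to max_depth levels away. Deliberately over-inclusive: later
    classifiers and the LLM filter out the false positives."""
    seen = set(seed_symbols)
    queue = deque((s, 0) for s in seed_symbols)
    locations = []
    while queue:
        symbol, depth = queue.popleft()
        # Hypothetical API: every file location that mentions this symbol.
        locations.extend(index.references_to(symbol))
        if depth < max_depth:
            # Hypothetical API: symbols that refer to this one (one level up).
            for dep in index.referencing_symbols(symbol):
                if dep not in seen:
                    seen.add(dep)
                    queue.append((dep, depth + 1))
    return locations
```

The depth cap is what trades recall for noise: a larger `max_depth` misses less but pulls in more irrelevant code for the downstream filters to discard.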
2. Categorizing references. Each potential code location is sorted into one of four buckets:
Not-migrated locations are confirmed as needing changes, identified with 100% confidence through automated checks (like finding test code with small integers).
Irrelevant locations are those that definitely don't need changes, such as class definitions or code already using values outside the 32-bit range.
Relevant locations are those that directly reference the ID and likely need investigation but don't fall into either of the previous categories.
Leftover contains everything not automatically sorted into the other categories. Developers manually review these locations and decide whether they need migration or not. During the next system run, these manually reviewed locations are moved to the appropriate category based on the developer's decision.
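One way to picture this triage is as a priority-ordered classifier, where earlier manual decisions take precedence so that reviewed locations land in the right bucket on the next run. The `location` attributes below are invented signals standing in for whatever checks Google's classifiers actually run:

```python
from enum import Enum

class Bucket(Enum):
    NOT_MIGRATED = "not_migrated"  # confirmed as needing the change
    IRRELEVANT = "irrelevant"      # definitely does not need the change
    RELEVANT = "relevant"          # direct reference; likely needs investigation
    LEFTOVER = "leftover"          # everything else; needs a human decision

def categorize(location, manual_decisions):
    """Sort one candidate location into a bucket. manual_decisions maps
    paths to buckets chosen by developers in earlier runs, so previously
    reviewed locations are re-filed automatically."""
    if location.path in manual_decisions:
        return manual_decisions[location.path]
    if location.is_test_with_small_int:   # hypothetical high-confidence signal
        return Bucket.NOT_MIGRATED
    if location.is_definition or location.already_64_bit:
        return Bucket.IRRELEVANT
    if location.directly_references_id:
        return Bucket.RELEVANT
    return Bucket.LEFTOVER
```

The `LEFTOVER` fall-through is what keeps the automation honest: anything the heuristics can't confidently place goes to a person rather than being guessed at.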
3. Making and validating the changes. Google uses an internal version of Gemini to generate diffs. A prompt explains the migration and provides suggested lines, but the LLM is free to modify any part of the file.
Each proposed change is validated through a stepwise process:
Did the LLM return a valid result?
Did it change more than just whitespace?
Can the new code be parsed?
Was the change actually needed?
Does it build and pass tests?
Only changes that pass all checks are submitted for review. Failed changes are marked for manual handling.
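The checkpoint sequence above is essentially a short-circuiting pipeline: the first failed check stops everything and flags the change for manual handling. A minimal sketch, with the parser, relevance, and build checks passed in as callables since their real implementations are internal to Google:

```python
def validate_change(original, modified, parses, needs_migration, build_and_test):
    """Run a proposed edit through the checkpoints in order. Returns
    (True, None) if every check passes, else (False, failed_check_name)."""
    checks = [
        # Did the LLM return a valid result at all?
        ("valid result", lambda: modified is not None),
        # Did it change more than just whitespace?
        ("non-whitespace change",
         lambda: "".join(modified.split()) != "".join(original.split())),
        # Can the new code be parsed?
        ("parses", lambda: parses(modified)),
        # Was the change actually needed?
        ("change needed", lambda: needs_migration(original)),
        # Does it build and pass tests?
        ("builds and tests pass", lambda: build_and_test(modified)),
    ]
    for name, check in checks:
        if not check():
            return (False, name)  # route to manual handling
    return (True, None)           # ready for human review
```

Ordering the checks from cheapest to most expensive means a malformed LLM response never reaches the build farm.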
Results and impact
The researchers evaluated their system through a comprehensive case study of 39 distinct migrations over twelve months. The results are impressive:
595 code changes were submitted, containing 93,574 edits
74% of code changes were generated by the LLM (either entirely or with human adjustment)
69% of all edits were made by the LLM
Additionally, developers reported high satisfaction with the system and estimated a 50% reduction in time spent on migrations compared to the manual approach.
Benefits and challenges
The researchers found several advantages of their approach:
End-to-end automation: The system handled the entire process from identifying references to submitting validated changes.
LLM flexibility: The model adapted to different code styles, languages, and patterns with just a natural language prompt.
Validation pipeline: Developers only reviewed high-quality changes that had already passed builds and tests.
However, they also encountered challenges:
LLM limitations: Context window constraints, hallucinations, and variable performance across programming languages sometimes required manual intervention.
Pre-existing issues: Build failures and test dependencies occasionally hindered the automated process.
Production roll-out complexities: Large, distributed migrations still required careful management during production deployment.
Final thoughts
This paper highlights the incredible potential of using AI to assist with large-scale code migration tasks. This may be useful for teams exploring ways to improve developer productivity with AI, as well as mature organizations looking for faster ways to update and maintain their codebases.
By automating much of the work, Google’s system cut down manual effort, saved developers time, and gave them a clearer sense of progress, making the entire migration process less tedious and more manageable.
Who’s hiring right now
This week’s featured DevProd & Platform job openings. See more open roles here.
ScalePad is hiring a Head of AI Engineering & Enablement | Canada (Remote or in-office)
Capital One is hiring a Manager, Product Management - Platform | Plano, TX and McLean, VA
Preply is hiring a Senior DevEx Engineer | Barcelona
Snowflake is hiring a Director of Engineering - Test Framework | Bellevue and Menlo Park
That’s it for this week. If you know someone who would enjoy this issue, share it with them:
-Abi