How Stripe's Minions Ship 1,300 PRs a Week
A deep dive into Stripe's automated code migration system, which ships over 1,300 pull requests every week and merges about 90% of them without human review.
2026-03-22 · automation, codemods, developer-tools, ruby, monorepo
The Scale of Stripe's Codebase
Stripe processes hundreds of billions of dollars annually. Behind that is one of the world's largest Ruby monorepos — millions of lines of code, thousands of engineers, hundreds of services all living in a single repository.
A monorepo at this scale creates a unique problem: cross-cutting changes. When a shared library updates its API, when a security vulnerability needs patching, when a coding standard changes — every team that uses that code must update. Multiply that by hundreds of internal APIs and thousands of consumers, and you get a coordination nightmare.
graph TD
subgraph "Shared Library Update"
LIB["PaymentV1 → PaymentV2"]
end
subgraph "Consumers (200+ services)"
S1["Checkout Service<br/>47 files affected"]
S2["Billing Service<br/>23 files affected"]
S3["Fraud Detection<br/>15 files affected"]
S4["Reporting Service<br/>31 files affected"]
S5["Mobile API<br/>19 files affected"]
S6["... 50+ more services"]
end
LIB --> S1
LIB --> S2
LIB --> S3
LIB --> S4
LIB --> S5
LIB --> S6
style LIB fill:#ef4444,stroke:#dc2626,color:#fff
style S6 fill:#94a3b8,stroke:#64748b,color:#fff
Why Manual Migrations Fail
Before Minions, Stripe tried the standard approaches every large company tries:
| Approach | What Happens | Why It Fails |
|---|---|---|
| Email announcement | "Please migrate to PaymentV2 by Q3" | Engineers are busy shipping features. Migration sits in backlog forever. |
| Jira tickets per team | Create 50+ tickets across all consuming teams | Inconsistent execution. Some teams migrate in a day, others never do. The old API can't be deprecated. |
| Dedicated migration team | One team manually updates all consumers | Doesn't scale. The migration team doesn't understand every service's context. Reviews take forever. |
| Big-bang rewrite | Rewrite everything at once in a massive PR | Too risky for a payment system. One bug could affect billions of dollars. |
The Real Cost
Stripe calculated that manual migrations were costing 12,000+ engineering hours per quarter — equivalent to 20 full-time engineers doing nothing but updating other people's code. That's not sustainable.
What Minions Is
Minions is Stripe's internal automated code migration platform. It takes a codemod (a program that transforms code) and autonomously:
- Runs it against the entire monorepo
- Groups changes into team-scoped pull requests
- Runs the full CI/CD pipeline on each PR
- Auto-merges safe changes or assigns risky ones for human review
- Monitors post-merge for regressions
- Automatically rolls back if something goes wrong
The name "Minions" captures the philosophy: small, autonomous workers that relentlessly execute tedious-but-critical tasks so engineers can focus on building products.
Architecture Deep Dive
The Minions system is composed of five major subsystems. Let's trace the full lifecycle of a migration from authoring to merge.
flowchart TB
subgraph Author["1. Codemod Authoring"]
ENG["Engineer writes codemod"]
REPO["Codemod registered<br/>in Minions catalog"]
TEST["Unit tests + dry-run<br/>validation"]
end
subgraph Engine["2. Execution Engine"]
SCAN["Full monorepo scan<br/>(AST parsing)"]
MATCH["Match affected files<br/>(pattern matching)"]
TRANSFORM["Apply transformation<br/>(AST rewriting)"]
BATCH["Group changes<br/>by team/service"]
end
subgraph Pipeline["3. PR Pipeline"]
PR["Create PRs with<br/>context + metadata"]
CI["Run full CI suite<br/>(unit, integration, e2e)"]
CLASSIFY["Classify risk level<br/>(mechanical vs semantic)"]
end
subgraph Merge["4. Merge Strategy"]
AUTO["Auto-merge<br/>(mechanical, CI green)"]
REVIEW["Human review<br/>(semantic changes)"]
end
subgraph Monitor["5. Post-Merge Monitoring"]
CANARY["Canary deployment<br/>monitoring"]
ALERT["Alert detection<br/>(error rate, latency)"]
ROLLBACK["Automatic rollback<br/>(per-PR granularity)"]
end
ENG --> REPO --> TEST --> SCAN
SCAN --> MATCH --> TRANSFORM --> BATCH
BATCH --> PR --> CI --> CLASSIFY
CLASSIFY -->|Low risk| AUTO
CLASSIFY -->|High risk| REVIEW
AUTO --> CANARY
REVIEW --> CANARY
CANARY --> ALERT
ALERT -->|Regression detected| ROLLBACK
style Author fill:#dbeafe,stroke:#3b82f6
style Engine fill:#dcfce7,stroke:#22c55e
style Pipeline fill:#fef3c7,stroke:#f59e0b
style Merge fill:#f3e8ff,stroke:#a855f7
style Monitor fill:#fee2e2,stroke:#ef4444
Subsystem 1: Codemod Authoring
A codemod is a program that reads source code, understands its structure via an Abstract Syntax Tree (AST), and produces a transformed version. Stripe's codemod framework is built on top of Ruby's parser gem with custom extensions.
What is an AST?
An AST is a tree representation of source code where each node represents a syntactic construct. Unlike regex-based find-and-replace, AST transformations understand code structure.
graph TD
SEND["Send (method call)"]
RECV["Receiver:<br/>Const :PaymentV1"]
METHOD["Method: :charge"]
ARG1["Arg 1:<br/>Int 100"]
ARG2["Arg 2:<br/>Str 'usd'"]
ARG3["Arg 3:<br/>LVar :card_token"]
SEND --> RECV
SEND --> METHOD
SEND --> ARG1
SEND --> ARG2
SEND --> ARG3
style SEND fill:#3b82f6,stroke:#2563eb,color:#fff
style RECV fill:#ef4444,stroke:#dc2626,color:#fff
style METHOD fill:#ef4444,stroke:#dc2626,color:#fff
The codemod walks this tree, finds nodes matching a pattern (e.g., any Send node where receiver is PaymentV1 and method is charge), and rewrites them.
Real Codemod Example
Here's a realistic Stripe codemod that migrates from PaymentV1.charge to PaymentV2.process:
class MigratePaymentV1ToV2 < Minions::Codemod
  title "Migrate PaymentV1.charge → PaymentV2.process"
  owner "payments-platform"
  risk_level :mechanical  # pure rename + arg restructure, no behavior change
  rollout :progressive    # internal → low-traffic → all
  # Pattern: match any call to PaymentV1.charge(amount, currency, source)
  pattern do
    send_node(receiver: const(:PaymentV1), method: :charge)
  end
  def transform(node)
    amount, currency, source = node.arguments
    # Rewrite to: PaymentV2.process(amount:, currency:, source:)
    node.replace_with(
      send_node(
        receiver: const(:PaymentV2),
        method: :process,
        arguments: [
          keyword_arg(:amount, amount),
          keyword_arg(:currency, currency),
          keyword_arg(:source, source)
        ]
      )
    )
  end
  # Validation: run both old and new code in shadow mode
  shadow_test do |original_result, transformed_result|
    original_result == transformed_result
  end
end
Why AST, Not Regex?
Consider the string "PaymentV1.charge is deprecated" inside a log statement, or # PaymentV1.charge — old API in a comment. A regex-based approach would incorrectly modify these. AST-based transformation only matches actual method calls in executable code.
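To make the difference concrete, here is a minimal sketch that counts matches both ways. It uses Ripper, the parser that ships with Ruby's standard library, as a stand-in for the parser gem that Stripe's framework is described as building on. The `count_calls` helper and the sample source are illustrative assumptions, not part of Minions.

```ruby
require "ripper"

# Three textual occurrences of PaymentV1.charge, but only one real call.
SOURCE = <<~RUBY
  # PaymentV1.charge is the old API
  logger.info("PaymentV1.charge is deprecated")
  PaymentV1.charge(100, "usd", card_token)
RUBY

# Regex view: every textual occurrence matches, including comment and string.
REGEX_HITS = SOURCE.scan(/PaymentV1\.charge/).size

# AST view: recursively walk Ripper's S-expression and count only nodes that
# are method calls with a constant receiver and the expected method name.
def count_calls(node, const_name, method_name)
  return 0 unless node.is_a?(Array)

  receiver = node[1]
  meth = node[3]
  matched =
    %i[call command_call].include?(node[0]) &&
    receiver.is_a?(Array) && receiver[0] == :var_ref &&
    receiver[1].is_a?(Array) && receiver[1][0] == :@const &&
    receiver[1][1] == const_name &&
    meth.is_a?(Array) && meth[0] == :@ident && meth[1] == method_name

  (matched ? 1 : 0) + node.sum { |child| count_calls(child, const_name, method_name) }
end

AST_HITS = count_calls(Ripper.sexp(SOURCE), "PaymentV1", "charge")

puts "regex matches: #{REGEX_HITS}" # 3: comment, string, and call
puts "AST matches:   #{AST_HITS}"   # 1: only the real call
```

The regex would rewrite the comment and the log message; the tree walk ignores them because neither appears in the AST as a call node.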
Codemod Testing Pipeline
Before a codemod runs against the monorepo, it goes through rigorous validation:
flowchart LR
WRITE["Author writes<br/>codemod"]
UNIT["Unit tests<br/>(known inputs → expected outputs)"]
DRY["Dry-run on<br/>full monorepo"]
REVIEW["Code review by<br/>Minions team"]
APPROVE["Approved for<br/>production run"]
WRITE --> UNIT -->|Pass| DRY -->|No unexpected<br/>changes| REVIEW -->|Approved| APPROVE
UNIT -->|Fail| WRITE
DRY -->|Unexpected<br/>changes| WRITE
style APPROVE fill:#22c55e,stroke:#16a34a,color:#fff
The dry-run step is critical. It executes the codemod against the entire monorepo but doesn't commit any changes. Instead, it produces a report:
- Total files matched: How many files will be modified
- Total changes: How many individual transformations
- Team distribution: Which teams are affected and how much
- Confidence score: Based on how well the pattern matches (exact match vs heuristic)
- Edge case report: Any transformations the codemod wasn't 100% sure about
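A report like the one above can be sketched as a pure aggregation over the list of proposed changes. The `Change` record, its fields, and the confidence formula (share of exact matches) are assumptions made for illustration; the article does not describe how Minions computes the score.

```ruby
# Hypothetical record of one transformation found during a dry run.
Change = Struct.new(:file, :team, :exact_match, keyword_init: true)

# Summarize a dry run without writing anything to disk.
def dry_run_report(changes)
  exact = changes.count(&:exact_match)
  {
    files_matched: changes.map(&:file).uniq.size,
    total_changes: changes.size,
    team_distribution: changes.group_by(&:team).transform_values(&:size),
    # Assumed definition: share of changes that matched the pattern exactly.
    confidence: changes.empty? ? 1.0 : exact.fdiv(changes.size).round(3),
    # Files containing at least one heuristic (non-exact) match.
    edge_cases: changes.reject(&:exact_match).map(&:file).uniq
  }
end

CHANGES = [
  Change.new(file: "billing/invoice.rb", team: "billing", exact_match: true),
  Change.new(file: "billing/invoice.rb", team: "billing", exact_match: true),
  Change.new(file: "billing/refund.rb", team: "billing", exact_match: false),
  Change.new(file: "checkout/cart.rb", team: "checkout", exact_match: true)
].freeze

REPORT = dry_run_report(CHANGES)
puts REPORT
```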
Subsystem 2: Execution Engine
The execution engine is where the codemod runs at scale. It's designed to handle Stripe's entire monorepo (millions of lines) in under 30 minutes.
flowchart TB
subgraph Coordinator
SCHED["Scheduler"]
SPLIT["File Splitter<br/>(partition by directory)"]
MERGE["Result Merger"]
end
subgraph Workers["Worker Pool (N parallel workers)"]
W1["Worker 1<br/>parse + transform<br/>files A-F"]
W2["Worker 2<br/>parse + transform<br/>files G-M"]
W3["Worker 3<br/>parse + transform<br/>files N-S"]
W4["Worker N<br/>parse + transform<br/>files T-Z"]
end
SCHED --> SPLIT
SPLIT --> W1
SPLIT --> W2
SPLIT --> W3
SPLIT --> W4
W1 --> MERGE
W2 --> MERGE
W3 --> MERGE
W4 --> MERGE
subgraph Output
DIFF["Unified diff<br/>per file"]
GROUP["Group by<br/>CODEOWNERS"]
BATCH["Batch into<br/>PR-sized chunks"]
end
MERGE --> DIFF --> GROUP --> BATCH
Key design decisions:
- Parallel AST parsing: Files are distributed across workers. Each worker independently parses and transforms its assigned files. No shared state.
- CODEOWNERS-based batching: Changes are grouped by the CODEOWNERS file, so each team gets one PR containing all changes to their code. This respects ownership boundaries.
- Chunk size limits: If a team has 500+ files to change, the PR is split into smaller chunks (typically 50–100 files) to keep reviews manageable.
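The grouping and chunking steps can be sketched in a few lines. This is a simplified illustration: the `OWNERS` table uses plain path prefixes where a real CODEOWNERS file uses glob patterns, and the names `owner_for` and `batch_by_owner` are hypothetical.

```ruby
# Hypothetical ownership rules: path prefix => owning team.
OWNERS = {
  "billing/" => "billing-team",
  "checkout/" => "checkout-team"
}.freeze

MAX_FILES_PER_PR = 100 # upper end of the 50-100 range mentioned above

def owner_for(path)
  prefix = OWNERS.keys.find { |p| path.start_with?(p) }
  prefix ? OWNERS[prefix] : "unowned"
end

# Group changed files by owner, then split each group into PR-sized chunks.
def batch_by_owner(paths, max_files: MAX_FILES_PER_PR)
  paths.group_by { |path| owner_for(path) }.flat_map do |team, files|
    chunks = files.sort.each_slice(max_files).to_a
    chunks.each_with_index.map do |chunk, i|
      { team: team, batch: "#{i + 1}/#{chunks.size}", files: chunk }
    end
  end
end

CHANGED_PATHS = (1..230).map { |n| format("billing/file_%03d.rb", n) } + ["checkout/cart.rb"]
BATCHES = batch_by_owner(CHANGED_PATHS)
BATCHES.each { |b| puts "#{b[:team]} batch #{b[:batch]}: #{b[:files].size} files" }
```

With 230 billing files and one checkout file, this yields three billing batches (100, 100, and 30 files) and one checkout batch.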
Subsystem 3: Automated PR Pipeline
For each batch, Minions creates a pull request that's designed to be self-explanatory:
graph TD
subgraph PR["Pull Request"]
TITLE["Title: [Minions] Migrate PaymentV1 → PaymentV2<br/>(batch 3/7 — Billing Service)"]
DESC["Description:<br/>• What: Automated migration from PaymentV1.charge to PaymentV2.process<br/>• Why: PaymentV1 is deprecated, removal planned for Q3<br/>• Codemod: link to source + tests<br/>• Migration plan: link to RFC<br/>• Risk: Mechanical (pure rename + arg restructure)<br/>• Rollback: Auto-revert enabled"]
CHANGES["Changes:<br/>23 files modified<br/>47 method calls transformed"]
CI_STATUS["CI Status:<br/>✅ Unit tests (2,341 passed)<br/>✅ Integration tests (156 passed)<br/>✅ Shadow mode validation<br/>✅ Type checker<br/>✅ Linter"]
REVIEWERS["Reviewers:<br/>@billing-team (auto-assigned via CODEOWNERS)"]
end
TITLE --> DESC --> CHANGES --> CI_STATUS --> REVIEWERS
style PR fill:#f8fafc,stroke:#e2e8f0
style CI_STATUS fill:#dcfce7,stroke:#22c55e
The PR description template includes enough context that a reviewer can understand the change without leaving the PR page. This is intentional: reducing context-switching increases the probability of timely reviews.
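A template of this kind is straightforward to render from batch metadata. The sketch below is an assumption about shape only: the `PrBatch` struct and its field names are hypothetical, and the layout simply mirrors the fields shown in the diagram.

```ruby
# Hypothetical metadata for one batch; field names are illustrative.
PrBatch = Struct.new(:codemod_title, :team, :index, :total, :files, :calls, :risk,
                     keyword_init: true)

def pr_title(batch)
  "[Minions] #{batch.codemod_title} (batch #{batch.index}/#{batch.total}, #{batch.team})"
end

# Render the self-contained description a reviewer sees on the PR page.
def pr_description(batch, why:)
  <<~TEXT
    What: Automated change: #{batch.codemod_title}
    Why: #{why}
    Changes: #{batch.files} files modified, #{batch.calls} method calls transformed
    Risk: #{batch.risk}
    Rollback: Auto-revert enabled
  TEXT
end

SAMPLE_BATCH = PrBatch.new(
  codemod_title: "Migrate PaymentV1 to PaymentV2", team: "Billing Service",
  index: 3, total: 7, files: 23, calls: 47,
  risk: "Mechanical (pure rename + arg restructure)"
)

puts pr_title(SAMPLE_BATCH)
puts pr_description(SAMPLE_BATCH, why: "PaymentV1 is deprecated, removal planned for Q3")
```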
Subsystem 4: Risk Classification and Merge Strategy
Not all code changes carry the same risk. Minions classifies each PR into one of three tiers:
flowchart TD
START["Analyze PR changes"]
Q1{"Does the change<br/>modify behavior?"}
Q2{"Does it affect<br/>payment flows?"}
Q3{"All CI checks<br/>passing?"}
Q4{"Shadow test<br/>results match?"}
MECHANICAL["TIER 1: Mechanical<br/>Auto-merge enabled"]
GUARDED["TIER 2: Guarded<br/>Auto-merge + monitoring"]
MANUAL["TIER 3: Manual<br/>Requires human review"]
START --> Q1
Q1 -->|"No (pure rename,<br/>import change)"| Q3
Q1 -->|"Yes (logic change,<br/>new behavior)"| Q2
Q2 -->|Yes| MANUAL
Q2 -->|No| Q4
Q3 -->|Yes| MECHANICAL
Q3 -->|No| MANUAL
Q4 -->|Match| GUARDED
Q4 -->|Mismatch| MANUAL
style MECHANICAL fill:#22c55e,stroke:#16a34a,color:#fff
style GUARDED fill:#f59e0b,stroke:#d97706,color:#fff
style MANUAL fill:#ef4444,stroke:#dc2626,color:#fff
| Tier | Criteria | Merge Strategy | % of PRs |
|---|---|---|---|
| Mechanical | Pure renames, import changes, formatting. No behavior change. All CI green. | Auto-merged within 1 hour | ~70% |
| Guarded | Behavior-preserving but non-trivial (e.g., API arg restructure). Shadow tests pass. | Auto-merged with enhanced post-merge monitoring | ~20% |
| Manual | Behavior change, payment-critical path, or CI failures. | Assigned to team for human review | ~10% |
The 90% auto-merge rate comes from the fact that most large-scale migrations are mechanical — they change syntax, not semantics.
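The decision tree in the flowchart translates directly into a small function. This is a sketch of that logic only; the `PrFacts` struct and its boolean fields are assumptions, and how Minions actually derives each answer is not covered in the article.

```ruby
# Inputs to the decision tree; all field names are assumptions for illustration.
PrFacts = Struct.new(:modifies_behavior, :touches_payment_flow, :ci_green,
                     :shadow_match, keyword_init: true)

# Mirrors the flowchart: behavior change? -> payment flow? -> shadow match?
# and, on the other branch, no behavior change -> CI green?
def classify(pr)
  if pr.modifies_behavior
    return :manual if pr.touches_payment_flow

    pr.shadow_match ? :guarded : :manual
  else
    pr.ci_green ? :mechanical : :manual
  end
end

rename = PrFacts.new(modifies_behavior: false, touches_payment_flow: false,
                     ci_green: true, shadow_match: true)
puts classify(rename) # mechanical
```

Note that every path that fails a check ends at `:manual`, so the default outcome for anything uncertain is human review.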
Subsystem 5: Post-Merge Monitoring and Rollback
This is the safety net that makes the entire system possible. Every Minions PR carries rollback metadata — the exact commit to revert and the list of affected services.
sequenceDiagram
participant M as Minions
participant GH as GitHub
participant CI as CI/CD Pipeline
participant CANARY as Canary Deploy
participant MON as Monitoring
participant ALERT as Alert System
M->>GH: Merge PR
GH->>CI: Trigger deployment
CI->>CANARY: Deploy to canary (5% traffic)
loop Every 30 seconds for 15 minutes
MON->>CANARY: Check error rate, latency, success rate
alt Metrics healthy
MON->>MON: Continue monitoring
else Regression detected
MON->>ALERT: Fire alert
ALERT->>M: Trigger rollback
M->>GH: Create revert PR
GH->>CI: Deploy revert
Note over M,CI: Automatic rollback<br/>within 5 minutes
end
end
MON->>CI: Promote to 100% traffic
What triggers a rollback?
- Error rate increase: >0.1% increase in 5xx errors for any affected service
- Latency spike: P99 latency increases by >20% for any affected endpoint
- Business metric anomaly: Payment success rate drops below threshold
- Type/runtime errors: Any new exception types appearing in affected code paths
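These four triggers can be expressed as a comparison between baseline and canary metrics. The sketch below makes several assumptions that the article leaves open: the 0.1% error-rate threshold is read as an absolute increase of 0.1 percentage points, the payment success threshold is an invented value, and the `CanaryMetrics` struct is hypothetical.

```ruby
CanaryMetrics = Struct.new(:error_rate_5xx, :p99_latency_ms, :payment_success_rate,
                           :exception_types, keyword_init: true)

ERROR_RATE_DELTA = 0.001     # assumed reading: +0.1 percentage points of 5xx
LATENCY_RATIO = 1.20         # P99 more than 20% above baseline
MIN_PAYMENT_SUCCESS = 0.995  # assumed value; the article only says "threshold"

# Return the list of triggers that fired; an empty list means healthy.
def rollback_reasons(baseline, canary)
  reasons = []
  reasons << :error_rate if canary.error_rate_5xx - baseline.error_rate_5xx > ERROR_RATE_DELTA
  reasons << :latency if canary.p99_latency_ms > baseline.p99_latency_ms * LATENCY_RATIO
  reasons << :payment_success if canary.payment_success_rate < MIN_PAYMENT_SUCCESS
  reasons << :new_exceptions unless (canary.exception_types - baseline.exception_types).empty?
  reasons
end

BASELINE = CanaryMetrics.new(error_rate_5xx: 0.002, p99_latency_ms: 200,
                             payment_success_rate: 0.998, exception_types: ["Timeout"])
REGRESSED = CanaryMetrics.new(error_rate_5xx: 0.004, p99_latency_ms: 250,
                              payment_success_rate: 0.998,
                              exception_types: ["Timeout", "NoMethodError"])

puts rollback_reasons(BASELINE, REGRESSED).inspect
```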
Rollback Granularity
Unlike a blanket revert of the entire deployment, Minions can roll back individual PRs. If a batch of 20 PRs was merged and only one caused a regression, only that specific PR is reverted. This is possible because each PR's changes are tracked at the file and line level.
Blast Radius Control: Progressive Rollout
Minions doesn't ship all PRs simultaneously. Changes are rolled out in waves over 48 hours:
gantt
title Migration Rollout Timeline
dateFormat HH:mm
axisFormat %H:%M
section Wave 1 (Hour 0-4)
Internal test services :w1, 00:00, 4h
section Wave 2 (Hour 4-12)
Low-traffic services :w2, after w1, 8h
section Wave 3 (Hour 12-24)
Medium-traffic services :w3, after w2, 12h
section Wave 4 (Hour 24-48)
High-traffic + payment-critical :w4, after w3, 24h
| Wave | Services | Traffic | Monitoring Period | Auto-Rollback |
|---|---|---|---|---|
| Wave 1 | Internal tools, test environments | ~0% of production traffic | 4 hours | Yes |
| Wave 2 | Low-traffic APIs, admin dashboards | ~5% of production traffic | 8 hours | Yes |
| Wave 3 | Medium-traffic services | ~30% of production traffic | 12 hours | Yes |
| Wave 4 | Payment processing, high-traffic APIs | 100% of production traffic | 24 hours | Yes |
If any wave shows regressions, the entire rollout pauses. The codemod author is notified and must fix the issue before proceeding.
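The pause-on-regression rule can be sketched as a loop that stops at the first unhealthy wave. The monitoring periods come from the table above; the `Wave` struct and `run_rollout` are hypothetical, and the block passed in stands in for real monitoring.

```ruby
Wave = Struct.new(:name, :monitor_hours, keyword_init: true)

WAVES = [
  Wave.new(name: "Wave 1: internal tools", monitor_hours: 4),
  Wave.new(name: "Wave 2: low-traffic", monitor_hours: 8),
  Wave.new(name: "Wave 3: medium-traffic", monitor_hours: 12),
  Wave.new(name: "Wave 4: payment-critical", monitor_hours: 24)
].freeze

# Run waves in order and stop at the first one whose health check fails.
# The block returns true when the wave stayed healthy for its monitoring period.
def run_rollout(waves)
  completed = []
  waves.each do |wave|
    return { status: :paused, paused_at: wave.name, completed: completed } unless yield(wave)

    completed << wave.name
  end
  { status: :done, hours: waves.sum(&:monitor_hours), completed: completed }
end

puts run_rollout(WAVES) { |_wave| true }
puts run_rollout(WAVES) { |wave| !wave.name.start_with?("Wave 3") }
```

The first call completes all four waves in 48 monitored hours; the second pauses at Wave 3 with only the first two waves completed, so payment-critical services are never touched.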
Shadow Mode: Semantic Equivalence Testing
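For changes that could alter behavior (Tier 2 and Tier 3), Minions runs both the old and new code on the same production requests: the old path serves the response, while the new path's result is only compared against it.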
flowchart LR
REQ["Incoming<br/>request"]
subgraph Production["Production Path (serves response)"]
OLD["Old code path<br/>PaymentV1.charge"]
RESP["Response to user"]
end
subgraph Shadow["Shadow Path (compare only)"]
NEW["New code path<br/>PaymentV2.process"]
COMPARE["Compare results"]
end
REQ --> OLD --> RESP
REQ --> NEW --> COMPARE
COMPARE -->|Match| LOG_OK["✅ Log: equivalent"]
COMPARE -->|Mismatch| LOG_DIFF["❌ Log: divergence<br/>+ diff details"]
style Production fill:#dcfce7,stroke:#22c55e
style Shadow fill:#fef3c7,stroke:#f59e0b
style LOG_DIFF fill:#fee2e2,stroke:#ef4444
Shadow mode runs for a configurable period (typically 24–72 hours). The results are aggregated:
- 100% match rate: Safe to auto-merge.
- 99.9% or higher, but below 100%: Investigate the mismatches, which are usually edge cases. May still auto-merge.
- Below 99.9%: Flagged for manual review. The codemod likely has a bug.
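The comparison and the thresholds can be sketched together. In this illustration the two lambdas stand in for the old and new code paths, the names `shadow_compare` and `shadow_verdict` are hypothetical, and a rate of exactly 99.9% is assumed to fall in the "investigate" band.

```ruby
# Run both implementations on the same inputs and record where they diverge.
# In production only the old result would be served to the caller.
def shadow_compare(inputs, old_impl:, new_impl:)
  mismatches = inputs.reject { |input| old_impl.call(input) == new_impl.call(input) }
  matched = inputs.size - mismatches.size
  {
    total: inputs.size,
    mismatches: mismatches,
    match_rate: inputs.empty? ? 1.0 : matched.fdiv(inputs.size)
  }
end

# Map an aggregated match rate to the next step.
def shadow_verdict(match_rate)
  if match_rate >= 1.0
    :auto_merge
  elsif match_rate >= 0.999 # assumed: exactly 99.9% counts as "investigate"
    :investigate
  else
    :manual_review
  end
end

# The new path forgets to strip whitespace, so one input diverges.
OLD_NORMALIZE = ->(currency) { currency.strip.upcase }
NEW_NORMALIZE = ->(currency) { currency.upcase }

SHADOW = shadow_compare(["usd", "eur", " gbp"], old_impl: OLD_NORMALIZE, new_impl: NEW_NORMALIZE)
puts SHADOW[:mismatches].inspect
puts shadow_verdict(SHADOW[:match_rate])
```

Logging the mismatching inputs, not just the rate, is what makes the "investigate" step practical: the divergent case here points straight at the missing `strip`.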
How Stripe Handles the Remaining 10% — Human-Reviewed PRs
Even with automation, roughly 10% of PRs require human review. Minions optimizes this process too:
- Smart reviewer assignment: Uses CODEOWNERS plus historical review data. If Alice reviewed the last 5 Minions PRs for this service and approved quickly, she gets assigned again.
- Pre-answered review questions: The PR description anticipates common reviewer questions ("Is this backwards compatible?" "What happens to in-flight requests?") and answers them upfront.
- Review deadline escalation: If a PR isn't reviewed within 48 hours, it escalates to the team lead. At 72 hours, it escalates to the engineering manager. Unreviewed PRs block the migration timeline.
- One-click approval: For Tier 2 changes where shadow tests pass, the reviewer sees a "Shadow test results: 100% match over 48 hours" badge. Most approve in under 2 minutes.
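The escalation schedule in the list above amounts to a threshold lookup. A minimal sketch, with hypothetical role names:

```ruby
# Who should be notified about a PR that has waited this long for review.
# Thresholds follow the 48-hour and 72-hour steps described above.
def escalation_target(hours_waiting)
  if hours_waiting >= 72
    :engineering_manager
  elsif hours_waiting >= 48
    :team_lead
  else
    :assigned_reviewer
  end
end

[12, 48, 80].each { |h| puts "#{h}h -> #{escalation_target(h)}" }
```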
The Numbers
| Metric | Value |
|---|---|
| PRs generated per week | 1,300+ |
| Auto-merge rate | ~90% |
| Median time: codemod authored → all PRs merged | 4 hours (mechanical), 48 hours (guarded) |
| Active codemods running simultaneously | 100+ |
| Files modified per week | 10,000+ |
| Engineering hours saved per quarter | 12,000+ |
| Rollback rate | <0.5% |
| Post-merge incidents caused by Minions | ~2 per quarter (out of 15,000+ PRs) |
For Perspective
Annualized, 12,000 engineering hours per quarter (about 48,000 hours a year) is roughly $3–5 million in engineering salary at Bay Area rates. Minions likely more than pays for its dedicated team (estimated 8–12 engineers).
What Makes Minions Different from Open-Source Codemods?
You might wonder: can't you just use jscodeshift or RuboCop's --autocorrect? Yes, but those tools solve 20% of the problem. The remaining 80% is the orchestration:
graph LR
subgraph Tool["Open-Source Codemod Tool"]
T1["✅ Parse AST"]
T2["✅ Apply transformation"]
T3["✅ Write modified files"]
end
subgraph Platform["Stripe Minions Platform"]
P1["✅ Parse AST"]
P2["✅ Apply transformation"]
P3["✅ Write modified files"]
P4["✅ Batch by team ownership"]
P5["✅ Create PRs with full context"]
P6["✅ Run CI per PR"]
P7["✅ Classify risk"]
P8["✅ Auto-merge safe changes"]
P9["✅ Progressive rollout"]
P10["✅ Shadow testing"]
P11["✅ Post-merge monitoring"]
P12["✅ Automatic rollback"]
P13["✅ Reviewer assignment"]
P14["✅ Escalation workflows"]
P15["✅ Migration progress dashboard"]
end
style Tool fill:#fef3c7,stroke:#f59e0b
style Platform fill:#dcfce7,stroke:#22c55e
Key Takeaways for Engineers
1. Invest in Developer Tooling as a Product
Stripe treats developer productivity as a first-class product with dedicated teams, roadmaps, and KPIs. Minions isn't a hack someone built in a hackathon; it's a production platform maintained by an estimated 8–12 engineers. The ROI is clear from the numbers above: that team saves the rest of the company 12,000+ engineering hours per quarter.
How to apply this: Even at smaller companies, investing in a simple codemod framework (using tools like jscodeshift, JavaParser, or Rector) can save enormous time. Start with the most common migration pattern in your codebase.
2. AST-Based Transformations Beat Regex — Always
If you're doing code migrations with sed, grep, or regex, you'll hit edge cases immediately: strings, comments, similarly-named variables, multi-line expressions. AST-based tools understand code structure and produce correct transformations even for complex syntax.
How to apply this: Learn your language's AST tools:
- Java: JavaParser, OpenRewrite, Error Prone
- JavaScript/TypeScript: jscodeshift, ts-morph
- Python: LibCST, Bowler
- Ruby: RuboCop, parser gem
- Go: go/ast + analysis packages
3. Safety Mechanisms Make Automation Possible
The only reason Stripe can auto-merge about 90% of its 1,300 weekly Minions PRs into a payment processing system is the multi-layered safety net: shadow testing, progressive rollout, automatic rollback, and blast radius control. Without these, automated changes at this scale would be reckless.
How to apply this: Before automating any code change at scale, build the safety infrastructure first. At minimum:
- Feature flags to gradually roll out
- Monitoring that catches regressions within minutes
- One-click (or automatic) rollback capability
4. Batch by Ownership, Not by Change
Minions groups changes by CODEOWNERS, not by the type of change. Each team gets one PR with all changes to their code. This respects ownership boundaries and makes reviews coherent — a reviewer sees all changes in their domain at once, not scattered fragments.
5. Automate the Boring Parts to Unlock the Interesting Ones
The most impactful automation isn't the flashy AI-powered system — it's automating the tedious, repetitive maintenance work that nobody wants to do. Code migrations, dependency upgrades, style enforcement, import sorting — these are perfect candidates. When engineers don't spend time on maintenance, they ship features faster.
Start Small, Think Big
You don't need Stripe's scale to benefit from this approach. Start with one codemod that automates your most painful recurring migration. Once the team sees the value, investment in tooling becomes an easy sell.