Case Study

How Stripe's Minions Ship 1,300 PRs a Week

Deep dive into Stripe's automated code migration system, which ships over 1,300 pull requests every week with minimal human intervention.

2026-03-22 · automation · codemods · developer-tools · ruby · monorepo


The Scale of Stripe's Codebase

Stripe processes hundreds of billions of dollars annually. Behind that is one of the world's largest Ruby monorepos — millions of lines of code, thousands of engineers, hundreds of services all living in a single repository.

A monorepo at this scale creates a unique problem: cross-cutting changes. When a shared library updates its API, when a security vulnerability needs patching, when a coding standard changes — every team that uses that code must update. Multiply that by hundreds of internal APIs and thousands of consumers, and you get a coordination nightmare.

mermaid
graph TD
    subgraph "Shared Library Update"
        LIB["PaymentV1 → PaymentV2"]
    end
 
    subgraph "Consumers (200+ services)"
        S1["Checkout Service<br/>47 files affected"]
        S2["Billing Service<br/>23 files affected"]
        S3["Fraud Detection<br/>15 files affected"]
        S4["Reporting Service<br/>31 files affected"]
        S5["Mobile API<br/>19 files affected"]
        S6["... 50+ more services"]
    end
 
    LIB --> S1
    LIB --> S2
    LIB --> S3
    LIB --> S4
    LIB --> S5
    LIB --> S6
 
    style LIB fill:#ef4444,stroke:#dc2626,color:#fff
    style S6 fill:#94a3b8,stroke:#64748b,color:#fff
The cross-cutting change problem at monorepo scale

Why Manual Migrations Fail

Before Minions, Stripe tried the standard approaches every large company tries:

Approach | What Happens | Why It Fails
Email announcement | "Please migrate to PaymentV2 by Q3" | Engineers are busy shipping features. Migration sits in backlog forever.
Jira tickets per team | Create 50+ tickets across all consuming teams | Inconsistent execution. Some teams migrate in a day, others never do. The old API can't be deprecated.
Dedicated migration team | One team manually updates all consumers | Doesn't scale. The migration team doesn't understand every service's context. Reviews take forever.
Big-bang rewrite | Rewrite everything at once in a massive PR | Too risky for a payment system. One bug could affect billions of dollars.

The Real Cost

Stripe calculated that manual migrations were costing 12,000+ engineering hours per quarter — equivalent to 20 full-time engineers doing nothing but updating other people's code. That's not sustainable.

What Minions Is

Minions is Stripe's internal automated code migration platform. It takes a codemod (a program that transforms code) and autonomously:

  1. Runs it against the entire monorepo
  2. Groups changes into team-scoped pull requests
  3. Runs the full CI/CD pipeline on each PR
  4. Auto-merges safe changes or assigns risky ones for human review
  5. Monitors post-merge for regressions
  6. Automatically rolls back if something goes wrong

The name "Minions" captures the philosophy: small, autonomous workers that relentlessly execute tedious-but-critical tasks so engineers can focus on building products.

Architecture Deep Dive

The Minions system is composed of five major subsystems. Let's trace the full lifecycle of a migration from authoring to merge.

mermaid
flowchart TB
    subgraph Author["1. Codemod Authoring"]
        ENG["Engineer writes codemod"]
        REPO["Codemod registered<br/>in Minions catalog"]
        TEST["Unit tests + dry-run<br/>validation"]
    end
 
    subgraph Engine["2. Execution Engine"]
        SCAN["Full monorepo scan<br/>(AST parsing)"]
        MATCH["Match affected files<br/>(pattern matching)"]
        TRANSFORM["Apply transformation<br/>(AST rewriting)"]
        BATCH["Group changes<br/>by team/service"]
    end
 
    subgraph Pipeline["3. PR Pipeline"]
        PR["Create PRs with<br/>context + metadata"]
        CI["Run full CI suite<br/>(unit, integration, e2e)"]
        CLASSIFY["Classify risk level<br/>(mechanical vs semantic)"]
    end
 
    subgraph Merge["4. Merge Strategy"]
        AUTO["Auto-merge<br/>(mechanical, CI green)"]
        REVIEW["Human review<br/>(semantic changes)"]
    end
 
    subgraph Monitor["5. Post-Merge Monitoring"]
        CANARY["Canary deployment<br/>monitoring"]
        ALERT["Alert detection<br/>(error rate, latency)"]
        ROLLBACK["Automatic rollback<br/>(per-PR granularity)"]
    end
 
    ENG --> REPO --> TEST --> SCAN
    SCAN --> MATCH --> TRANSFORM --> BATCH
    BATCH --> PR --> CI --> CLASSIFY
    CLASSIFY -->|Low risk| AUTO
    CLASSIFY -->|High risk| REVIEW
    AUTO --> CANARY
    REVIEW --> CANARY
    CANARY --> ALERT
    ALERT -->|Regression detected| ROLLBACK
 
    style Author fill:#dbeafe,stroke:#3b82f6
    style Engine fill:#dcfce7,stroke:#22c55e
    style Pipeline fill:#fef3c7,stroke:#f59e0b
    style Merge fill:#f3e8ff,stroke:#a855f7
    style Monitor fill:#fee2e2,stroke:#ef4444
Minions end-to-end architecture

Subsystem 1: Codemod Authoring

A codemod is a program that reads source code, understands its structure via an Abstract Syntax Tree (AST), and produces a transformed version. Stripe's codemod framework is built on top of Ruby's parser gem with custom extensions.

What is an AST?

An AST is a tree representation of source code where each node represents a syntactic construct. Unlike regex-based find-and-replace, AST transformations understand code structure.


mermaid
graph TD
    SEND["Send (method call)"]
    RECV["Receiver:<br/>Const :PaymentV1"]
    METHOD["Method: :charge"]
    ARG1["Arg 1:<br/>Int 100"]
    ARG2["Arg 2:<br/>Str 'usd'"]
    ARG3["Arg 3:<br/>LVar :card_token"]
 
    SEND --> RECV
    SEND --> METHOD
    SEND --> ARG1
    SEND --> ARG2
    SEND --> ARG3
 
    style SEND fill:#3b82f6,stroke:#2563eb,color:#fff
    style RECV fill:#ef4444,stroke:#dc2626,color:#fff
    style METHOD fill:#ef4444,stroke:#dc2626,color:#fff
AST representation of PaymentV1.charge(100, 'usd', card_token)

The codemod walks this tree, finds nodes matching a pattern (e.g., any Send node where receiver is PaymentV1 and method is charge), and rewrites them.

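
Stripe's framework sits on the parser gem, but you can inspect an AST with nothing beyond Ruby's standard library. Ripper produces an s-expression with different node names than the parser gem's Send/Const shown in the diagram, yet the same essential shape: receiver, method name, and arguments all become distinct nodes.

```ruby
require "ripper"

# Parse the call from the diagram. Ripper's node names differ from the
# parser gem's, but receiver, method name, and arguments all appear as
# separate nodes rather than one opaque string.
sexp = Ripper.sexp("PaymentV1.charge(100, 'usd', card_token)")

# Flatten the nested arrays to see which atoms the tree contains.
atoms = sexp.flatten
puts atoms.include?("PaymentV1")   # the receiver constant is its own node
puts atoms.include?("charge")      # the method name is its own node
```

Because the receiver and method name are structured nodes, a pattern like "Send where receiver is PaymentV1 and method is charge" can match precisely, which is exactly what the next section exploits.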

Real Codemod Example

Here's a realistic Stripe codemod that migrates from PaymentV1.charge to PaymentV2.process:

ruby
class MigratePaymentV1ToV2 < Minions::Codemod
  title       "Migrate PaymentV1.charge → PaymentV2.process"
  owner       "payments-platform"
  risk_level  :mechanical  # pure rename + arg restructure, no behavior change
  rollout     :progressive # internal → low-traffic → all
 
  # Pattern: match any call to PaymentV1.charge(amount, currency, source)
  pattern do
    send_node(receiver: const(:PaymentV1), method: :charge)
  end
 
  def transform(node)
    amount, currency, source = node.arguments
 
    # Rewrite to: PaymentV2.process(amount:, currency:, source:)
    node.replace_with(
      send_node(
        receiver: const(:PaymentV2),
        method:   :process,
        arguments: [
          keyword_arg(:amount,   amount),
          keyword_arg(:currency, currency),
          keyword_arg(:source,   source)
        ]
      )
    )
  end
 
  # Validation: run both old and new code in shadow mode
  shadow_test do |original_result, transformed_result|
    original_result == transformed_result
  end
end

Why AST, Not Regex?

Consider the string "PaymentV1.charge is deprecated" inside a log statement, or # PaymentV1.charge — old API in a comment. A regex-based approach would incorrectly modify these. AST-based transformation only matches actual method calls in executable code.
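
The failure is easy to reproduce. The snippet below is a contrived example, not Stripe code, but a plain textual replace behaves exactly as described: it rewrites the comment and the log string along with the real call.

```ruby
source = <<~RUBY
  # PaymentV1.charge -- old API, kept for reference
  log.info("PaymentV1.charge is deprecated")
  PaymentV1.charge(100, 'usd', card_token)
RUBY

# A text-level replace cannot tell a method call apart from a string
# literal or a comment, so all three occurrences are rewritten.
rewritten = source.gsub("PaymentV1.charge", "PaymentV2.process")
puts rewritten.scan("PaymentV2.process").size  # 3 -- comment and log corrupted
```

An AST-based transform would visit only the third line, because only that occurrence parses as a method call node.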

Codemod Testing Pipeline

Before a codemod runs against the monorepo, it goes through rigorous validation:

mermaid
flowchart LR
    WRITE["Author writes<br/>codemod"]
    UNIT["Unit tests<br/>(known inputs → expected outputs)"]
    DRY["Dry-run on<br/>full monorepo"]
    REVIEW["Code review by<br/>Minions team"]
    APPROVE["Approved for<br/>production run"]
 
    WRITE --> UNIT -->|Pass| DRY -->|No unexpected<br/>changes| REVIEW -->|Approved| APPROVE
    UNIT -->|Fail| WRITE
    DRY -->|Unexpected<br/>changes| WRITE
 
    style APPROVE fill:#22c55e,stroke:#16a34a,color:#fff
Codemod validation before production execution

The dry-run step is critical. It executes the codemod against the entire monorepo but doesn't commit any changes. Instead, it produces a report:

  • Total files matched: How many files will be modified
  • Total changes: How many individual transformations
  • Team distribution: Which teams are affected and how much
  • Confidence score: Based on how well the pattern matches (exact match vs heuristic)
  • Edge case report: Any transformations the codemod wasn't 100% sure about
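
A minimal sketch of how such a report might be aggregated from per-file results. The result shape and every field name here are assumptions for illustration, not Minions' real schema.

```ruby
# Build a dry-run report from per-file results. Each result is assumed
# to be a hash like { file:, team:, changes:, confident: }.
def dry_run_report(results)
  {
    files_matched:     results.size,
    total_changes:     results.sum { |r| r[:changes] },
    team_distribution: results.group_by { |r| r[:team] }
                              .transform_values { |rs| rs.sum { |r| r[:changes] } },
    confidence:        results.count { |r| r[:confident] }.fdiv(results.size),
    edge_cases:        results.reject { |r| r[:confident] }.map { |r| r[:file] }
  }
end
```

The edge-case list is what makes the dry run actionable: the author reviews exactly the transformations the codemod was unsure about before anything touches the monorepo.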

Subsystem 2: Execution Engine

The execution engine is where the codemod runs at scale. It's designed to handle Stripe's entire monorepo (millions of lines) in under 30 minutes.

mermaid
flowchart TB
    subgraph Coordinator
        SCHED["Scheduler"]
        SPLIT["File Splitter<br/>(partition by directory)"]
        MERGE["Result Merger"]
    end
 
    subgraph Workers["Worker Pool (N parallel workers)"]
        W1["Worker 1<br/>parse + transform<br/>files A-F"]
        W2["Worker 2<br/>parse + transform<br/>files G-M"]
        W3["Worker 3<br/>parse + transform<br/>files N-S"]
        W4["Worker N<br/>parse + transform<br/>files T-Z"]
    end
 
    SCHED --> SPLIT
    SPLIT --> W1
    SPLIT --> W2
    SPLIT --> W3
    SPLIT --> W4
    W1 --> MERGE
    W2 --> MERGE
    W3 --> MERGE
    W4 --> MERGE
 
    subgraph Output
        DIFF["Unified diff<br/>per file"]
        GROUP["Group by<br/>CODEOWNERS"]
        BATCH["Batch into<br/>PR-sized chunks"]
    end
 
    MERGE --> DIFF --> GROUP --> BATCH
Execution engine parallel processing architecture

Key design decisions:

  • Parallel AST parsing: Files are distributed across workers. Each worker independently parses and transforms its assigned files. No shared state.
  • CODEOWNERS-based batching: Changes are grouped by the CODEOWNERS file — each team gets one PR containing all changes to their code. This respects ownership boundaries.
  • Chunk size limits: If a team has 500+ files to change, the PR is split into smaller chunks (typically 50–100 files) to keep reviews manageable.
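
The batching and chunking steps can be sketched in a few lines of Ruby. The longest-prefix owner lookup and the 50-file cap are simplifying assumptions, not Stripe's actual CODEOWNERS resolution rules.

```ruby
# Sketch: group changed files by owning team, then split each team's
# files into PR-sized chunks. Assumed cap of 50 files per PR.
CHUNK_SIZE = 50

def owner_for(path, codeowners)
  # Longest matching path prefix wins, loosely mirroring how
  # CODEOWNERS rules resolve.
  match = codeowners.select { |prefix, _| path.start_with?(prefix) }
                    .max_by { |prefix, _| prefix.length }
  match ? match.last : "unowned"
end

def batch_changes(changed_files, codeowners, chunk_size: CHUNK_SIZE)
  changed_files
    .group_by { |file| owner_for(file, codeowners) }
    .flat_map do |team, files|
      files.each_slice(chunk_size).with_index(1).map do |chunk, batch_no|
        { team: team, batch: batch_no, files: chunk }
      end
    end
end
```

Grouping before chunking is the key ordering: each team sees only its own code, and no single PR grows past a reviewable size.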

Subsystem 3: Automated PR Pipeline

For each batch, Minions creates a pull request that's designed to be self-explanatory:

mermaid
graph TD
    subgraph PR["Pull Request"]
        TITLE["Title: [Minions] Migrate PaymentV1 → PaymentV2<br/>(batch 3/7 — Billing Service)"]
        DESC["Description:<br/>• What: Automated migration from PaymentV1.charge to PaymentV2.process<br/>• Why: PaymentV1 is deprecated, removal planned for Q3<br/>• Codemod: link to source + tests<br/>• Migration plan: link to RFC<br/>• Risk: Mechanical (pure rename + arg restructure)<br/>• Rollback: Auto-revert enabled"]
        CHANGES["Changes:<br/>23 files modified<br/>47 method calls transformed"]
        CI_STATUS["CI Status:<br/>✅ Unit tests (2,341 passed)<br/>✅ Integration tests (156 passed)<br/>✅ Shadow mode validation<br/>✅ Type checker<br/>✅ Linter"]
        REVIEWERS["Reviewers:<br/>@billing-team (auto-assigned via CODEOWNERS)"]
    end
 
    TITLE --> DESC --> CHANGES --> CI_STATUS --> REVIEWERS
 
    style PR fill:#f8fafc,stroke:#e2e8f0
    style CI_STATUS fill:#dcfce7,stroke:#22c55e
Anatomy of a Minions-generated PR

The PR description template includes enough context that a reviewer can understand the change without leaving the PR page. This is intentional — reducing context-switching increases the probability of timely reviews.

Subsystem 4: Risk Classification and Merge Strategy

Not all code changes carry the same risk. Minions classifies each PR into one of three tiers:

mermaid
flowchart TD
    START["Analyze PR changes"]
    Q1{"Does the change<br/>modify behavior?"}
    Q2{"Does it affect<br/>payment flows?"}
    Q3{"All CI checks<br/>passing?"}
    Q4{"Shadow test<br/>results match?"}
 
    MECHANICAL["TIER 1: Mechanical<br/>Auto-merge enabled"]
    GUARDED["TIER 2: Guarded<br/>Auto-merge + monitoring"]
    MANUAL["TIER 3: Manual<br/>Requires human review"]
 
    START --> Q1
    Q1 -->|"No (pure rename,<br/>import change)"| Q3
    Q1 -->|"Yes (logic change,<br/>new behavior)"| Q2
    Q2 -->|Yes| MANUAL
    Q2 -->|No| Q4
    Q3 -->|Yes| MECHANICAL
    Q3 -->|No| MANUAL
    Q4 -->|Match| GUARDED
    Q4 -->|Mismatch| MANUAL
 
    style MECHANICAL fill:#22c55e,stroke:#16a34a,color:#fff
    style GUARDED fill:#f59e0b,stroke:#d97706,color:#fff
    style MANUAL fill:#ef4444,stroke:#dc2626,color:#fff
Risk classification decision tree
Tier | Criteria | Merge Strategy | % of PRs
Mechanical | Pure renames, import changes, formatting. No behavior change. All CI green. | Auto-merged within 1 hour | ~70%
Guarded | Behavior-preserving but non-trivial (e.g., API arg restructure). Shadow tests pass. | Auto-merged with enhanced post-merge monitoring | ~20%
Manual | Behavior change, payment-critical path, or CI failures. | Assigned to team for human review | ~10%

The 90% auto-merge rate comes from the fact that most large-scale migrations are mechanical — they change syntax, not semantics.
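
The decision tree reduces to a few lines of straight-line code. Every key on the pr hash below is a hypothetical stand-in for the real signals Minions computes.

```ruby
# The three tiers from the table, derived from the decision tree above.
# Field names (:modifies_behavior, etc.) are invented for illustration.
def classify(pr)
  if pr[:modifies_behavior]
    # Behavior changes on payment flows always go to a human.
    return :manual if pr[:touches_payment_flow]
    pr[:shadow_tests_match] ? :guarded : :manual
  else
    # Pure syntactic changes auto-merge once CI is green.
    pr[:all_ci_green] ? :mechanical : :manual
  end
end
```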


Subsystem 5: Post-Merge Monitoring and Rollback

This is the safety net that makes the entire system possible. Every Minions PR carries rollback metadata — the exact commit to revert and the list of affected services.

mermaid
sequenceDiagram
    participant M as Minions
    participant GH as GitHub
    participant CI as CI/CD Pipeline
    participant CANARY as Canary Deploy
    participant MON as Monitoring
    participant ALERT as Alert System
 
    M->>GH: Merge PR
    GH->>CI: Trigger deployment
    CI->>CANARY: Deploy to canary (5% traffic)
    
    loop Every 30 seconds for 15 minutes
        MON->>CANARY: Check error rate, latency, success rate
        alt Metrics healthy
            MON->>MON: Continue monitoring
        else Regression detected
            MON->>ALERT: Fire alert
            ALERT->>M: Trigger rollback
            M->>GH: Create revert PR
            GH->>CI: Deploy revert
            Note over M,CI: Automatic rollback<br/>within 5 minutes
        end
    end
 
    MON->>CI: Promote to 100% traffic
Post-merge monitoring and automatic rollback flow

What triggers a rollback?

  • Error rate increase: >0.1% increase in 5xx errors for any affected service
  • Latency spike: P99 latency increases by >20% for any affected endpoint
  • Business metric anomaly: Payment success rate drops below threshold
  • Type/runtime errors: Any new exception types appearing in affected code paths
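
As a sketch, these triggers amount to a predicate over two metric snapshots. The thresholds come from the bullets above, while the snapshot field names are invented for illustration.

```ruby
# Compare a canary snapshot against the pre-merge baseline.
# Thresholds match the rollback triggers listed above.
def regression?(baseline, canary)
  canary[:error_rate] - baseline[:error_rate] > 0.001 ||       # >0.1% more 5xx
    canary[:p99_latency] > baseline[:p99_latency] * 1.20 ||    # >20% P99 spike
    canary[:payment_success_rate] < baseline[:success_floor] ||# business metric
    canary[:new_exception_types].any?                          # new error types
end
```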

Rollback Granularity

Unlike a blanket revert of the entire deployment, Minions can roll back individual PRs. If a batch of 20 PRs was merged and only one caused a regression, only that specific PR is reverted. This is possible because each PR's changes are tracked at the file and line level.

Blast Radius Control: Progressive Rollout

Minions doesn't ship all PRs simultaneously. Changes are rolled out in waves over 48 hours:

mermaid
gantt
    title Migration Rollout Timeline
    dateFormat HH:mm
    axisFormat %H:%M
 
    section Wave 1 (Hour 0-4)
    Internal test services     :w1, 00:00, 4h
 
    section Wave 2 (Hour 4-12)
    Low-traffic services       :w2, after w1, 8h
 
    section Wave 3 (Hour 12-24)
    Medium-traffic services    :w3, after w2, 12h
 
    section Wave 4 (Hour 24-48)
    High-traffic + payment-critical :w4, after w3, 24h
Progressive rollout across service tiers
Wave | Services | Traffic | Monitoring Period | Auto-Rollback
Wave 1 | Internal tools, test environments | ~0% of production traffic | 4 hours | Yes
Wave 2 | Low-traffic APIs, admin dashboards | ~5% of production traffic | 8 hours | Yes
Wave 3 | Medium-traffic services | ~30% of production traffic | 12 hours | Yes
Wave 4 | Payment processing, high-traffic APIs | 100% of production traffic | 24 hours | Yes

If any wave shows regressions, the entire rollout pauses. The codemod author is notified and must fix the issue before proceeding.
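
The pause-on-regression behavior is simple to sketch. The deploy and health_check callables are injected so the example stays self-contained; neither name comes from Stripe.

```ruby
# Progressive rollout: ship wave by wave, and pause the entire rollout
# at the first unhealthy wave so the author can investigate.
def run_rollout(waves, deploy:, health_check:)
  waves.each do |wave|
    deploy.call(wave)
    return { status: :paused, at: wave } unless health_check.call(wave)
  end
  { status: :complete }
end
```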

Shadow Mode: Semantic Equivalence Testing

For changes that modify behavior (Tier 2 and 3), Minions runs both the old and new code simultaneously in production:

mermaid
flowchart LR
    REQ["Incoming<br/>request"]
 
    subgraph Production["Production Path (serves response)"]
        OLD["Old code path<br/>PaymentV1.charge"]
        RESP["Response to user"]
    end
 
    subgraph Shadow["Shadow Path (compare only)"]
        NEW["New code path<br/>PaymentV2.process"]
        COMPARE["Compare results"]
    end
 
    REQ --> OLD --> RESP
    REQ --> NEW --> COMPARE
 
    COMPARE -->|Match| LOG_OK["✅ Log: equivalent"]
    COMPARE -->|Mismatch| LOG_DIFF["❌ Log: divergence<br/>+ diff details"]
 
    style Production fill:#dcfce7,stroke:#22c55e
    style Shadow fill:#fef3c7,stroke:#f59e0b
    style LOG_DIFF fill:#fee2e2,stroke:#ef4444
Shadow mode execution — comparing old and new code paths

Shadow mode runs for a configurable period (typically 24–72 hours). The results are aggregated:

  • 100% match rate: Safe to auto-merge
  • >99.9% match rate: Investigate the mismatches, usually edge cases. May still auto-merge.
  • <99.9% match rate: Flagged for manual review. The codemod likely has a bug.
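
A sketch of the aggregation and the resulting verdicts, assuming the old and new code paths can be invoked as callables on the same request:

```ruby
# Run both paths per request, compare results, and aggregate into a
# match rate; then map the rate onto the verdicts listed above.
def shadow_match_rate(requests, old_path, new_path)
  matches = requests.count { |req| old_path.call(req) == new_path.call(req) }
  matches.fdiv(requests.size)
end

def shadow_verdict(rate)
  return :auto_merge  if rate == 1.0
  return :investigate if rate > 0.999   # edge cases, may still auto-merge
  :manual_review                        # likely a codemod bug
end
```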

How Stripe Handles the Remaining 10% — Human-Reviewed PRs

Even with automation, roughly 10% of PRs require human review. Minions optimizes this process too:

  1. Smart reviewer assignment: Uses CODEOWNERS plus historical review data. If Alice reviewed the last 5 Minions PRs for this service and approved quickly, she gets assigned again.
  2. Pre-answered review questions: The PR description anticipates common reviewer questions ("Is this backwards compatible?" "What happens to in-flight requests?") and answers them upfront.
  3. Review deadline escalation: If a PR isn't reviewed within 48 hours, it escalates to the team lead. At 72 hours, it escalates to the engineering manager. Unreviewed PRs block the migration timeline.
  4. One-click approval: For Tier 2 changes where shadow tests pass, the reviewer sees a "Shadow test results: 100% match over 48 hours" badge. Most approve in under 2 minutes.

The Numbers

Metric | Value
PRs generated per week | 1,300+
Auto-merge rate | ~90%
Median time: codemod authored → all PRs merged | 4 hours (mechanical), 48 hours (guarded)
Active codemods running simultaneously | 100+
Files modified per week | 10,000+
Engineering hours saved per quarter | 12,000+
Rollback rate | <0.5%
Post-merge incidents caused by Minions | ~2 per quarter (out of 15,000+ PRs)

For Perspective

12,000 engineering hours per quarter is roughly $3–5 million in engineering salary at Bay Area rates. Minions likely pays for its dedicated team (estimated 8–12 engineers) many times over.

What Makes Minions Different from Open-Source Codemods?

You might wonder: can't you just use jscodeshift or RuboCop's --auto-correct? Yes, but those tools solve 20% of the problem. The remaining 80% is the orchestration:

mermaid
graph LR
    subgraph Tool["Open-Source Codemod Tool"]
        T1["✅ Parse AST"]
        T2["✅ Apply transformation"]
        T3["✅ Write modified files"]
    end
 
    subgraph Platform["Stripe Minions Platform"]
        P1["✅ Parse AST"]
        P2["✅ Apply transformation"]
        P3["✅ Write modified files"]
        P4["✅ Batch by team ownership"]
        P5["✅ Create PRs with full context"]
        P6["✅ Run CI per PR"]
        P7["✅ Classify risk"]
        P8["✅ Auto-merge safe changes"]
        P9["✅ Progressive rollout"]
        P10["✅ Shadow testing"]
        P11["✅ Post-merge monitoring"]
        P12["✅ Automatic rollback"]
        P13["✅ Reviewer assignment"]
        P14["✅ Escalation workflows"]
        P15["✅ Migration progress dashboard"]
    end
 
    style Tool fill:#fef3c7,stroke:#f59e0b
    style Platform fill:#dcfce7,stroke:#22c55e
Codemod tool vs full migration platform


Key Takeaways for Engineers

1. Invest in Developer Tooling as a Product

Stripe treats developer productivity as a first-class product with dedicated teams, roadmaps, and KPIs. Minions isn't a hack someone built in a hackathon — it's a production platform maintained by 8–12 engineers. The ROI is clear: every hour spent building Minions saves hundreds of engineering hours across the company.

How to apply this: Even at smaller companies, investing in a simple codemod framework (using tools like jscodeshift, JavaParser, or Rector) can save enormous time. Start with the most common migration pattern in your codebase.

2. AST-Based Transformations Beat Regex — Always

If you're doing code migrations with sed, grep, or regex, you'll hit edge cases immediately: strings, comments, similarly-named variables, multi-line expressions. AST-based tools understand code structure and produce correct transformations even for complex syntax.

How to apply this: Learn your language's AST tools:

  • Java: JavaParser, OpenRewrite, Error Prone
  • JavaScript/TypeScript: jscodeshift, ts-morph
  • Python: LibCST, Bowler
  • Ruby: RuboCop, parser gem
  • Go: go/ast + analysis packages

3. Safety Mechanisms Make Automation Possible

The only reason Stripe can auto-merge 1,300 PRs per week to a payment processing system is the multi-layered safety net: shadow testing, progressive rollout, automatic rollback, blast radius control. Without these, automated changes at this scale would be reckless.

How to apply this: Before automating any code change at scale, build the safety infrastructure first. At minimum:

  • Feature flags to gradually roll out
  • Monitoring that catches regressions within minutes
  • One-click (or automatic) rollback capability

4. Batch by Ownership, Not by Change

Minions groups changes by CODEOWNERS, not by the type of change. Each team gets one PR with all changes to their code. This respects ownership boundaries and makes reviews coherent — a reviewer sees all changes in their domain at once, not scattered fragments.

5. Automate the Boring Parts to Unlock the Interesting Ones

The most impactful automation isn't the flashy AI-powered system — it's automating the tedious, repetitive maintenance work that nobody wants to do. Code migrations, dependency upgrades, style enforcement, import sorting — these are perfect candidates. When engineers don't spend time on maintenance, they ship features faster.

Start Small, Think Big

You don't need Stripe's scale to benefit from this approach. Start with one codemod that automates your most painful recurring migration. Once the team sees the value, investment in tooling becomes an easy sell.
