What is High-Level Design?
2026-03-22 · aws · hld · distributed-systems · fundamentals
High-Level Design is the practice of choosing the right components, services, and infrastructure for a system and defining how they communicate. It's the architectural blueprint that determines whether your system will scale to millions of users or collapse under load.
When someone says "Design Twitter" or "Design a URL shortener" in an interview, they're asking for HLD — the bird's-eye view of how the entire system fits together.
graph TB
subgraph "High-Level Design Scope"
CLIENT["Client<br/>(Web/Mobile)"]
CDN["CDN<br/>(CloudFront)"]
LB["Load Balancer<br/>(ALB)"]
API1["API Server 1"]
API2["API Server 2"]
API3["API Server N"]
CACHE["Cache Layer<br/>(ElastiCache/Redis)"]
DB_PRIMARY["Primary DB<br/>(RDS PostgreSQL)"]
DB_REPLICA["Read Replica"]
QUEUE["Message Queue<br/>(SQS)"]
WORKER["Worker<br/>(Lambda)"]
STORAGE["Object Storage<br/>(S3)"]
SEARCH["Search<br/>(OpenSearch)"]
CLIENT --> CDN
CDN --> LB
LB --> API1
LB --> API2
LB --> API3
API1 --> CACHE
API2 --> CACHE
CACHE --> DB_PRIMARY
DB_PRIMARY --> DB_REPLICA
API1 --> QUEUE
QUEUE --> WORKER
WORKER --> STORAGE
API3 --> SEARCH
SEARCH --> DB_REPLICA
end
style CLIENT fill:#f1f5f9,stroke:#94a3b8
style CDN fill:#dbeafe,stroke:#3b82f6
style LB fill:#dbeafe,stroke:#3b82f6
style CACHE fill:#fef3c7,stroke:#f59e0b
style DB_PRIMARY fill:#dcfce7,stroke:#22c55e
style DB_REPLICA fill:#dcfce7,stroke:#22c55e
style QUEUE fill:#f3e8ff,stroke:#a855f7
style WORKER fill:#f3e8ff,stroke:#a855f7
style STORAGE fill:#fce7f3,stroke:#ec4899
style SEARCH fill:#fce7f3,stroke:#ec4899

Every box in this diagram is an HLD decision. Why ALB instead of NLB? Why Redis instead of Memcached? Why SQS instead of Kafka? HLD is about understanding these trade-offs and making the right choice for your specific requirements.
HLD vs LLD — The Full Picture
graph TB
subgraph SYSTEM["The System"]
subgraph HLD_SCOPE["HLD Scope"]
direction LR
C["Client"]
S1["Service A"]
S2["Service B"]
DB["Database"]
Q["Queue"]
C --> S1
S1 --> S2
S1 --> DB
S2 --> Q
end
subgraph LLD_SCOPE["LLD Scope (inside Service A)"]
CTRL["Controller"]
SVC["Service Layer"]
REPO["Repository"]
MODEL["Domain Models"]
PATTERN["Design Patterns"]
CTRL --> SVC
SVC --> REPO
SVC --> MODEL
SVC --> PATTERN
end
end
S1 -.->|"Zoom in"| LLD_SCOPE
style HLD_SCOPE fill:#dbeafe,stroke:#3b82f6
style LLD_SCOPE fill:#dcfce7,stroke:#22c55e

| Aspect | HLD | LLD |
|---|---|---|
| Question | What components do we need? How do they talk? | How do we implement one component internally? |
| Decisions | SQL vs NoSQL, REST vs gRPC, sync vs async | Which design pattern? Interface or abstract class? |
| Failure mode | "The system can't handle 10K requests/sec" | "Adding a new payment method requires changing 15 files" |
| Diagram type | Architecture diagram, data flow diagram | Class diagram, sequence diagram |
| AWS mapping | EC2, RDS, SQS, ElastiCache, CloudFront | Spring Boot, JPA, Design Patterns |
| Interview | "Design Instagram" (whiteboard) | "Design a parking lot" (code) |
The Building Blocks of Every Distributed System
Every system, from a startup's MVP to Netflix's global infrastructure, is assembled from these building blocks. In this series, we map every concept to AWS services.
1. Compute — Where Does Your Code Run?
graph LR
subgraph "More Control ←→ Less Management"
EC2["EC2<br/>Full server<br/>You manage everything"]
ECS["ECS/Fargate<br/>Containers<br/>AWS manages servers"]
LAMBDA["Lambda<br/>Functions<br/>AWS manages everything"]
end
EC2 -->|"Need OS-level<br/>access, GPUs"| EC2
ECS -->|"Microservices,<br/>long-running tasks"| ECS
LAMBDA -->|"Event-driven,<br/>short tasks"| LAMBDA
style EC2 fill:#fee2e2,stroke:#ef4444
style ECS fill:#fef3c7,stroke:#f59e0b
style LAMBDA fill:#dcfce7,stroke:#22c55e

| Service | When to Use | Cost Model | Max Execution |
|---|---|---|---|
| EC2 | Full control needed, GPU workloads, legacy apps | Per hour (running) | Unlimited |
| ECS/Fargate | Microservices, Docker containers, consistent workloads | Per vCPU + memory (running) | Unlimited |
| Lambda | Event handlers, API endpoints, async processing | Per request + duration | 15 minutes |
Rule of Thumb
Start with Lambda for new services. Move to Fargate when you need long-running processes or consistent throughput. Use EC2 only when you need OS-level access or specific hardware.
2. Storage — How Do You Persist Data?
Choosing the right database is the most impactful HLD decision you'll make. There is no "best" database — only the best database for your access pattern.
flowchart TD
START["What's your data like?"]
Q1{"Need complex<br/>joins and<br/>transactions?"}
Q2{"Need flexible<br/>schema?"}
Q3{"Need sub-ms<br/>reads?"}
Q4{"Need full-text<br/>search?"}
Q5{"Need to store<br/>files/images?"}
Q6{"Need time-series<br/>data?"}
RDS["RDS (PostgreSQL/MySQL)<br/>Relational, ACID,<br/>strong consistency"]
DYNAMO["DynamoDB<br/>Key-value/document,<br/>single-digit ms at any scale"]
REDIS["ElastiCache (Redis)<br/>In-memory, sub-ms,<br/>TTL-based expiry"]
SEARCH["OpenSearch<br/>Full-text search,<br/>fuzzy matching, analytics"]
S3["S3<br/>Object storage,<br/>unlimited scale, $0.023/GB"]
TIMESTREAM["Timestream<br/>Purpose-built for<br/>IoT/metrics/logs"]
START --> Q1
Q1 -->|Yes| RDS
Q1 -->|No| Q2
Q2 -->|Yes| DYNAMO
Q2 -->|No| Q3
Q3 -->|Yes| REDIS
Q3 -->|No| Q4
Q4 -->|Yes| SEARCH
Q4 -->|No| Q5
Q5 -->|Yes| S3
Q5 -->|No| Q6
Q6 -->|Yes| TIMESTREAM
style RDS fill:#dcfce7,stroke:#22c55e
style DYNAMO fill:#dbeafe,stroke:#3b82f6
style REDIS fill:#fef3c7,stroke:#f59e0b
style SEARCH fill:#f3e8ff,stroke:#a855f7
style S3 fill:#fce7f3,stroke:#ec4899
style TIMESTREAM fill:#e0e7ff,stroke:#6366f1

| Database | Type | Consistency | Latency | Scale | Cost |
|---|---|---|---|---|---|
| RDS (PostgreSQL) | Relational | Strong (ACID) | ~5ms | Vertical + read replicas | $$$ |
| DynamoDB | Key-value/Document | Eventual (configurable strong) | ~5ms | Horizontal (unlimited) | $ per request |
| ElastiCache (Redis) | In-memory | Eventual | <1ms | Clustered | $$ per node |
| S3 | Object store | Strong (as of 2020) | ~100ms | Unlimited | $ per GB |
| OpenSearch | Search engine | Near real-time | ~50ms | Horizontal | $$$ per node |
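The decision tree above can be encoded as a simple first-match rule chain. This is only an illustrative sketch of the question order — the class name, method, and flags are made up for this example, not a real selection tool:

```java
// Illustrative encoding of the database decision tree above.
// The flags mirror the order of questions in the flowchart.
public class DatastorePicker {

    public static String pick(boolean needsJoinsAndTransactions,
                              boolean needsFlexibleSchema,
                              boolean needsSubMillisecondReads,
                              boolean needsFullTextSearch,
                              boolean storesFiles,
                              boolean timeSeries) {
        if (needsJoinsAndTransactions) return "RDS (PostgreSQL/MySQL)";
        if (needsFlexibleSchema)       return "DynamoDB";
        if (needsSubMillisecondReads)  return "ElastiCache (Redis)";
        if (needsFullTextSearch)       return "OpenSearch";
        if (storesFiles)               return "S3";
        if (timeSeries)                return "Timestream";
        return "Re-examine the access pattern";
    }

    public static void main(String[] args) {
        // Complex joins and transactions -> relational
        System.out.println(pick(true, false, false, false, false, false));
        // User-uploaded images -> object storage
        System.out.println(pick(false, false, false, false, true, false));
    }
}
```

Note that the order of the questions matters: ask about joins and transactions first, because a relational need overrides everything else.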
3. Networking — How Do Components Communicate?
graph TB
subgraph Sync["Synchronous (request-response)"]
C1["Client"]
API_GW["API Gateway"]
SVC1["Service"]
C1 -->|"HTTP request"| API_GW
API_GW -->|"Forward"| SVC1
SVC1 -->|"HTTP response"| API_GW
API_GW -->|"Response"| C1
end
subgraph Async["Asynchronous (fire-and-forget)"]
SVC2["Producer<br/>Service"]
SQS["SQS Queue"]
CONSUMER["Consumer<br/>Service"]
SVC2 -->|"Send message"| SQS
SQS -->|"Poll message"| CONSUMER
Note1["Producer doesn't wait<br/>for consumer to finish"]
end
subgraph Event["Event-Driven (pub-sub)"]
PUB["Publisher"]
SNS["SNS Topic"]
SUB1["Subscriber 1<br/>(Email)"]
SUB2["Subscriber 2<br/>(Analytics)"]
SUB3["Subscriber 3<br/>(Notification)"]
PUB -->|"Publish event"| SNS
SNS -->|"Fan out"| SUB1
SNS -->|"Fan out"| SUB2
SNS -->|"Fan out"| SUB3
end
style Sync fill:#dbeafe,stroke:#3b82f6
style Async fill:#dcfce7,stroke:#22c55e
style Event fill:#fef3c7,stroke:#f59e0b

| Pattern | AWS Service | When to Use | Trade-off |
|---|---|---|---|
| Synchronous | API Gateway, ALB | User-facing APIs, need immediate response | Tight coupling, cascading failures |
| Async (queue) | SQS | Background jobs, email sending, order processing | Eventual consistency, harder to debug |
| Pub-Sub (events) | SNS, EventBridge | Fan-out notifications, event-driven architectures | Message ordering not guaranteed, at-least-once delivery |
| Streaming | Kinesis, MSK (Kafka) | Real-time analytics, log aggregation, CDC | Complex to operate, expensive at scale |
The Golden Rule of Communication
Use synchronous communication only when the caller needs the result immediately. Everything else should be async. This single decision prevents more outages than any other architectural choice.
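A minimal in-process sketch of the fire-and-forget idea, using a `BlockingQueue` as a stand-in for SQS (an assumption for illustration — a real system would call the AWS SDK): the producer returns immediately while a background worker drains the queue.

```java
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicInteger;

// In-process stand-in for the queue pattern above: publish() returns
// as soon as the message is enqueued; a worker processes it later.
public class FireAndForget {
    private final BlockingQueue<String> queue = new LinkedBlockingQueue<>();
    private final ExecutorService worker = Executors.newSingleThreadExecutor();
    final AtomicInteger processed = new AtomicInteger();

    public FireAndForget() {
        worker.submit((Callable<Void>) () -> {
            while (true) {
                String msg = queue.take();   // blocks until a message arrives
                processed.incrementAndGet(); // stand-in for sending the email, etc.
            }
        });
    }

    // Producer side: returns immediately, never waits for the consumer.
    public void publish(String msg) {
        queue.offer(msg);
    }

    public void shutdown() {
        worker.shutdownNow();
    }

    public static void main(String[] args) throws Exception {
        FireAndForget bus = new FireAndForget();
        bus.publish("order-confirmation"); // caller does not wait for processing
        bus.publish("invoice-pdf");
        Thread.sleep(300);                 // give the worker time to drain
        System.out.println("processed: " + bus.processed.get());
        bus.shutdown();
    }
}
```

The key property is on the producer side: `publish` cannot fail because a downstream consumer is slow or down, which is exactly what breaks the cascading-failure chain of synchronous calls.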
4. Caching — The Single Biggest Performance Lever
Caching is the most effective way to improve system performance. A well-placed cache can reduce database load by 90% and cut response times from 100ms to 1ms.
flowchart LR
USER["User Request"]
subgraph L1["Layer 1: Edge Cache"]
CF["CloudFront CDN<br/>Static assets, API responses<br/>Global, ~10ms"]
end
subgraph L2["Layer 2: Application Cache"]
REDIS2["ElastiCache Redis<br/>Session data, hot queries<br/>Regional, ~1ms"]
end
subgraph L3["Layer 3: Database Cache"]
DAX["DAX<br/>(DynamoDB Accelerator)<br/>Table-level, ~μs"]
end
subgraph L4["Layer 4: Database"]
DB2["RDS / DynamoDB<br/>Source of truth<br/>~5-50ms"]
end
USER --> CF
CF -->|"Cache MISS"| REDIS2
REDIS2 -->|"Cache MISS"| DAX
DAX -->|"Cache MISS"| DB2
style L1 fill:#dbeafe,stroke:#3b82f6
style L2 fill:#fef3c7,stroke:#f59e0b
style L3 fill:#dcfce7,stroke:#22c55e
style L4 fill:#f1f5f9,stroke:#94a3b8

Cache Invalidation Strategies
| Strategy | How It Works | Best For |
|---|---|---|
| TTL (Time-to-Live) | Cache entry expires after N seconds | Data that's acceptable to be slightly stale (product catalog, user profiles) |
| Write-Through | Every write goes to cache AND database simultaneously | Data that must always be fresh (account balance, inventory count) |
| Write-Behind | Write to cache immediately, flush to database asynchronously | High-write workloads where slight lag is acceptable (analytics, metrics) |
| Cache-Aside (Lazy Loading) | Application checks cache first, loads from DB on miss, stores in cache | General purpose — most common pattern |
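As a contrast to cache-aside, here is a minimal write-through sketch. The two `HashMap`s are in-memory stand-ins for the cache and the database, purely for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Write-through sketch: every write updates the database AND the cache
// in the same operation, so reads can always trust the cache.
public class WriteThroughStore {
    private final Map<String, String> cache = new HashMap<>();
    private final Map<String, String> database = new HashMap<>();

    public void put(String key, String value) {
        database.put(key, value); // source of truth first
        cache.put(key, value);    // then keep the cache fresh
    }

    public String get(String key) {
        // Under write-through the cache is never stale for keys written
        // through this store; the database lookup is only a safety net.
        return cache.getOrDefault(key, database.get(key));
    }
}
```

This is why write-through suits account balances and inventory counts: the cache can never serve a value older than the last committed write, at the cost of slower writes.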
import java.time.Duration;
import org.springframework.data.redis.core.RedisTemplate;

// Cache-Aside pattern in Java (most common)
public class UserService {
    private final RedisTemplate<String, User> cache;
    private final UserRepository repository;

    public UserService(RedisTemplate<String, User> cache, UserRepository repository) {
        this.cache = cache;
        this.repository = repository;
    }

    public User getUser(String userId) {
        String key = "user:" + userId;
        User cached = cache.opsForValue().get(key);
        if (cached != null) {
            return cached; // cache HIT — sub-millisecond
        }
        User user = repository.findById(userId); // cache MISS — hit the database
        cache.opsForValue().set(key, user, Duration.ofMinutes(15)); // TTL bounds staleness
        return user;
    }

    public void updateUser(User user) {
        repository.save(user);
        cache.delete("user:" + user.getId()); // invalidate the stale entry
    }
}

A Complete Example: Scaling a Java Web App on AWS
Let's walk through how a real application evolves from a single server to a production-grade distributed system. Each step addresses a specific bottleneck.
Stage 1: The Monolith
Everything on one EC2 instance. Works for ~100 concurrent users.
graph LR
USER["Users<br/>(~100 concurrent)"]
EC2["EC2 Instance<br/>Java App + PostgreSQL"]
USER --> EC2
style EC2 fill:#fee2e2,stroke:#ef4444

Bottleneck: The app and database compete for CPU and memory on the same machine. A traffic spike kills both.
Stage 2: Separate the Database
Decouple compute from storage. Now they scale independently.
graph LR
USER["Users"]
EC2_2["EC2<br/>Java App"]
RDS["RDS PostgreSQL<br/>Multi-AZ, automated<br/>backups, failover"]
USER --> EC2_2 --> RDS
style RDS fill:#dcfce7,stroke:#22c55e

Gain: The database gets automated backups, failover, and independently scalable storage. The app server can be resized without affecting the database.
Bottleneck: Single app server. If it dies, the entire system is down.
Stage 3: Load Balancing + Auto Scaling
Multiple app servers behind a load balancer. No single point of failure.
graph LR
USER2["Users<br/>(~10K concurrent)"]
ALB["ALB<br/>Application<br/>Load Balancer"]
ASG["Auto Scaling Group"]
EC2A["EC2 (App)"]
EC2B["EC2 (App)"]
EC2C["EC2 (App)"]
RDS2["RDS PostgreSQL<br/>+ Read Replica"]
USER2 --> ALB
ALB --> ASG
ASG --> EC2A
ASG --> EC2B
ASG --> EC2C
EC2A --> RDS2
EC2B --> RDS2
EC2C --> RDS2
style ALB fill:#dbeafe,stroke:#3b82f6
style ASG fill:#fef3c7,stroke:#f59e0b

Key decisions:
- ALB (not NLB) because we need HTTP-level routing (path-based, host-based)
- Auto Scaling Group scales EC2 instances based on CPU or request count
- Read Replica handles read-heavy queries (product listings, search results)
Bottleneck: Every request hits the database. At 10K req/sec, the database becomes the bottleneck.
Stage 4: Add Caching
Redis absorbs 80–90% of read traffic. Database only handles writes and cache misses.
graph TB
USER3["Users<br/>(~100K concurrent)"]
CF["CloudFront CDN<br/>Static assets cached<br/>at 400+ edge locations"]
ALB2["ALB"]
APP["EC2 Fleet<br/>(Auto Scaling)"]
REDIS3["ElastiCache Redis<br/>Cluster mode,<br/>6 nodes, 3 shards"]
RDS3["RDS PostgreSQL<br/>Multi-AZ + 2 Read Replicas"]
USER3 --> CF
CF --> ALB2
ALB2 --> APP
APP -->|"80% cache HIT"| REDIS3
APP -->|"20% cache MISS"| RDS3
REDIS3 -.->|"Load on miss"| RDS3
style CF fill:#dbeafe,stroke:#3b82f6
style REDIS3 fill:#fef3c7,stroke:#f59e0b
style RDS3 fill:#dcfce7,stroke:#22c55e

Impact: Database queries drop from 100K/sec to 10K/sec. P99 latency drops from 200ms to 15ms.
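These impact figures follow from simple hit-rate arithmetic. The latencies below are the illustrative numbers used in this article, not measurements:

```java
// Effective read latency under a cache layer:
//   effective = hitRate * cacheLatency + (1 - hitRate) * dbLatency
public class CacheMath {
    public static double effectiveLatencyMs(double hitRate, double cacheMs, double dbMs) {
        return hitRate * cacheMs + (1 - hitRate) * dbMs;
    }

    public static long dbQueriesPerSec(long readsPerSec, double hitRate) {
        return Math.round(readsPerSec * (1 - hitRate)); // only misses reach the DB
    }

    public static void main(String[] args) {
        // 90% hit rate: 100K reads/sec at the app, only 10K reach the database
        System.out.println("DB load: " + dbQueriesPerSec(100_000, 0.90) + " queries/sec");
        System.out.println("Effective read latency: "
                + effectiveLatencyMs(0.90, 1.0, 50.0) + " ms");
    }
}
```

Notice the lever: going from a 90% to a 99% hit rate cuts database load by another 10x, which is why cache hit rate is usually the first metric to tune.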
Bottleneck: Synchronous processing. Sending a confirmation email during checkout adds 2 seconds to the response.
Stage 5: Async Processing
Move non-critical work to background queues. The user gets an instant response.
graph TB
USER4["Users"]
ALB3["ALB"]
APP2["EC2 Fleet"]
REDIS4["Redis Cache"]
RDS4["RDS"]
subgraph Async["Async Pipeline"]
SQS2["SQS Queue"]
LAMBDA2["Lambda Workers"]
SES["SES (Email)"]
S3_2["S3 (Files)"]
ANALYTICS["Analytics<br/>Pipeline"]
end
USER4 --> ALB3 --> APP2
APP2 --> REDIS4 --> RDS4
APP2 -->|"Fire-and-forget"| SQS2
SQS2 --> LAMBDA2
LAMBDA2 --> SES
LAMBDA2 --> S3_2
LAMBDA2 --> ANALYTICS
style Async fill:#f3e8ff,stroke:#a855f7

What goes async: Confirmation emails, invoice generation, image processing, analytics events, search index updates, notification pushes.
Result: Checkout API responds in 200ms instead of 2.5 seconds. Background work completes within minutes.
Architecture Summary
graph LR
S1["Stage 1<br/>Single Server<br/>~100 users"]
S2["Stage 2<br/>Managed DB<br/>~1K users"]
S3["Stage 3<br/>Load Balanced<br/>~10K users"]
S4["Stage 4<br/>Cached<br/>~100K users"]
S5["Stage 5<br/>Async<br/>~1M users"]
S1 -->|"Separate DB"| S2
S2 -->|"Add LB + ASG"| S3
S3 -->|"Add Redis + CDN"| S4
S4 -->|"Add SQS + Lambda"| S5
style S1 fill:#fee2e2,stroke:#ef4444
style S5 fill:#dcfce7,stroke:#22c55e

Key Insight
You don't start with Stage 5. Each architectural decision adds complexity and cost. You evolve the architecture when a specific bottleneck appears — not before. Premature optimization is the root of all evil in HLD.
Core Concepts You Must Know
Every HLD interview will test your understanding of these fundamental concepts:
CAP Theorem
In a distributed system, you can only guarantee two out of three:
graph TD
C["Consistency<br/>Every read gets the<br/>latest write"]
A["Availability<br/>Every request gets<br/>a response"]
P["Partition Tolerance<br/>System works despite<br/>network failures"]
C --- A
A --- P
P --- C
CP["CP Systems<br/>MongoDB, HBase, Redis<br/>(sacrifice availability)"]
AP["AP Systems<br/>DynamoDB, Cassandra<br/>(sacrifice consistency)"]
CA["CA Systems<br/>Traditional RDBMS<br/>(can't handle partitions)"]
C -.-> CP
P -.-> CP
A -.-> AP
P -.-> AP
C -.-> CA
A -.-> CA
style C fill:#dbeafe,stroke:#3b82f6
style A fill:#dcfce7,stroke:#22c55e
style P fill:#fef3c7,stroke:#f59e0b

In practice, network partitions will happen in any distributed system. So the real choice is between CP (consistent but sometimes unavailable) and AP (always available but sometimes stale).
| Use Case | Choose | Why |
|---|---|---|
| Bank transactions | CP (RDS) | A stale balance could cause overdrafts |
| Social media feed | AP (DynamoDB) | Seeing a post 2 seconds late is fine |
| Shopping cart | AP (DynamoDB) | Availability > consistency for user experience |
| Inventory count | CP (RDS) | Overselling is worse than temporary unavailability |
Horizontal vs Vertical Scaling
graph TB
subgraph Vertical["Vertical Scaling (Scale Up)"]
V1["4 CPU, 16GB RAM<br/>$100/month"]
V2["16 CPU, 64GB RAM<br/>$400/month"]
V3["64 CPU, 256GB RAM<br/>$1,600/month"]
V1 -->|"Upgrade"| V2 -->|"Upgrade"| V3
V4["⚠️ Hardware limit<br/>~448 vCPU max on AWS"]
end
subgraph Horizontal["Horizontal Scaling (Scale Out)"]
H1["Instance 1<br/>4 CPU"]
H2["Instance 2<br/>4 CPU"]
H3["Instance 3<br/>4 CPU"]
H4["Instance N<br/>4 CPU"]
HN["✅ No limit<br/>Add more instances"]
end
style Vertical fill:#fef3c7,stroke:#f59e0b
style Horizontal fill:#dcfce7,stroke:#22c55e

| Aspect | Vertical | Horizontal |
|---|---|---|
| Approach | Bigger machine | More machines |
| Limit | Hardware ceiling | Theoretically unlimited |
| Downtime | Yes (restart to resize) | No (add instances live) |
| Complexity | Low (same architecture) | High (need load balancer, stateless design) |
| Cost curve | Steep at the high end (the largest instances command a premium) | Linear (2x instances ≈ 2x cost) |
| Best for | Databases, legacy apps | Stateless web/API servers |
What You'll Learn in This Series
This series covers HLD from the ground up, with every concept mapped to AWS services.
graph LR
C0["Class 0<br/>What is HLD<br/>(you are here)"]
C1["Class 1<br/>Scalability<br/>Fundamentals"]
C2["Class 2<br/>Load Balancing<br/>& API Design"]
C3["Class 3-5<br/>Databases,<br/>Caching, Queues"]
C4["Class 6-8<br/>Architecture Patterns<br/>(Microservices, CQRS,<br/>Event-Driven)"]
C5["Class 9+<br/>HLD Problems<br/>(URL Shortener,<br/>Chat, News Feed)"]
C0 --> C1 --> C2 --> C3 --> C4 --> C5
style C0 fill:#3b82f6,stroke:#2563eb,color:#fff
style C5 fill:#22c55e,stroke:#16a34a,color:#fff

Scalability Fundamentals (Class 1)
Vertical vs horizontal scaling, stateless services, session management, AWS Auto Scaling Groups, and how to design services that can scale to millions of requests.
Load Balancing & API Design (Class 2)
ALB vs NLB vs API Gateway, routing strategies, rate limiting, API versioning, and REST vs gRPC vs GraphQL trade-offs.
Databases, Caching & Queues (Classes 3–5)
SQL vs NoSQL deep dive, sharding strategies, replication, consistent hashing, Redis patterns, SQS vs Kafka, and when to use each.
Architecture Patterns (Classes 6–8)
Microservices vs monolith, event-driven architecture, CQRS, saga pattern, circuit breaker, and service mesh — each with AWS implementation.
HLD Interview Problems (Classes 9+)
URL Shortener, Chat Application, News Feed, Notification Service, Video Streaming, Ride-Sharing — complete system designs with architecture diagrams, AWS service mappings, capacity estimation, and trade-off analysis.
How to Approach an HLD Interview
graph TD
subgraph Phase1["Minutes 0-5: Requirements"]
A1["Functional: What does the system DO?"]
A2["Non-functional: Scale, latency, availability"]
A3["Constraints: Budget, team size, timeline"]
end
subgraph Phase2["Minutes 5-10: Estimation"]
B1["Users: DAU, peak concurrent"]
B2["Traffic: Reads/sec, writes/sec"]
B3["Storage: Data size, growth rate"]
B4["Bandwidth: Upload/download"]
end
subgraph Phase3["Minutes 10-25: Core Design"]
C1["Draw high-level architecture"]
C2["Define API contracts"]
C3["Choose database + schema"]
C4["Define data flow"]
end
subgraph Phase4["Minutes 25-40: Deep Dive"]
D1["Address bottlenecks"]
D2["Add caching strategy"]
D3["Design for failure"]
D4["Scale specific components"]
end
subgraph Phase5["Minutes 40-45: Trade-offs"]
E1["What are the compromises?"]
E2["What breaks at 100x scale?"]
E3["What would you change with more time?"]
end
Phase1 --> Phase2 --> Phase3 --> Phase4 --> Phase5
style Phase1 fill:#dbeafe,stroke:#3b82f6
style Phase2 fill:#dcfce7,stroke:#22c55e
style Phase3 fill:#fef3c7,stroke:#f59e0b
style Phase4 fill:#f3e8ff,stroke:#a855f7
style Phase5 fill:#fce7f3,stroke:#ec4899

The Interviewer's Secret
HLD interviewers don't care about the "right" answer — there isn't one. They care about your thought process: Do you ask clarifying questions? Do you consider trade-offs? Can you identify bottlenecks? Can you evolve a design as requirements change?
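The estimation phase (minutes 5-10) boils down to back-of-envelope arithmetic. A sketch with made-up example inputs — none of these numbers describe a real system:

```java
// Back-of-envelope capacity estimation for the interview's estimation phase.
// Every input in main() is an assumed example value, not real data.
public class CapacityEstimate {
    static final long SECONDS_PER_DAY = 86_400;

    public static long avgRps(long dau, long requestsPerUserPerDay) {
        return dau * requestsPerUserPerDay / SECONDS_PER_DAY;
    }

    public static double storageGbPerDay(long dau, long requestsPerUserPerDay,
                                         double writeRatio, long bytesPerWrite) {
        return dau * requestsPerUserPerDay * writeRatio * bytesPerWrite / 1e9;
    }

    public static void main(String[] args) {
        long dau = 10_000_000; // assumed daily active users
        long reqPerUser = 20;  // assumed requests per user per day

        long avg = avgRps(dau, reqPerUser);
        long peak = avg * 3;   // common rule of thumb: peak is ~2-3x average

        System.out.println("avg req/sec:  " + avg);
        System.out.println("peak req/sec: " + peak);
        // Assume 10% of requests write ~1 KB each
        System.out.printf("new data/day: %.0f GB%n",
                storageGbPerDay(dau, reqPerUser, 0.10, 1_000));
    }
}
```

The point is not precision — it is deriving orders of magnitude (thousands of req/sec, tens of GB/day) that justify the architecture you draw in the next phase.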
What's Next
In the next class, we'll cover Scalability Fundamentals — vertical vs horizontal scaling, stateless services, session management, and how AWS Auto Scaling Groups work under the hood.