
System Design Interview Questions 2026: Top 40 Questions with Answers

31 min read
Interview Questions
Last Updated: 1 May 2026
Reviewed by PapersAdda Editorial

Every FAANG company now asks at least 2 system design questions. Senior engineers report that system design is the #1 reason for offer/no-offer decisions at the L5+ level. In 2026, the bar has risen further: interviewers expect you to reason about AI/ML infrastructure, real-time streaming, global scale, and observability, not just databases and load balancers. This guide covers the exact 40 designs top companies rotate through, with step-by-step solutions, capacity math, and the specific talking points that get you hired.

The difference between a "hire" and "strong hire" in system design? Depth. Anyone can draw boxes. This guide teaches you to go deep on the algorithms, tradeoffs, and failure modes that impress interviewers.

Related articles: AI/ML Interview Questions 2026 | Data Engineering Interview Questions 2026 | Prompt Engineering Interview Questions 2026 | AWS Interview Questions 2026


The Proven Framework to Ace System Design Interviews

Every top candidate uses a structured framework. Winging it is the fastest way to fail. Here's the exact approach used by engineers who've passed system design at Google, Meta, and Amazon:

1. Clarify Requirements (5 min)
   - Functional requirements (what the system does)
   - Non-functional requirements (scale, latency, availability)
   - Out-of-scope (what you won't design today)

2. Capacity Estimation (5 min)
   - DAU, QPS, storage, bandwidth
   - Back-of-envelope math

3. High-Level Design (10 min)
   - Components: clients, API servers, databases, caches, queues
   - Data flow for key user journeys

4. Deep Dive (15 min)
   - Database schema
   - API design
   - Critical algorithms
   - Bottlenecks and solutions

5. Identify Failure Points (5 min)
   - What fails under load?
   - How do you recover?
   - Monitoring and alerts

EASY: Foundational Designs (Questions 1-10)

These are the "warm-up" designs that companies use to calibrate your level. Nail them quickly and you earn more time for the harder deep-dive questions.

Q1. Design a URL Shortener (like bit.ly)

Which companies ask this: Google, Amazon, Twilio, Bitly, Snap

Clarify requirements:

  • Shorten a URL → 7-char short code
  • Redirect short URL → original URL
  • Analytics: click count, geographic data
  • 100M URLs created/day, 10B redirects/day

Capacity estimation:

Write QPS: 100M / 86,400 = 1,157 writes/sec ≈ 1.2K/sec
Read QPS: 10B / 86,400 = 115,740 reads/sec ≈ 116K/sec  (read:write = 100:1)
Storage: 100M * 365 * 5 years = 182.5B URLs
         Each URL record ~500 bytes → 182.5B * 500 = 91TB total

High-level design:

[Client] → [API Gateway / Load Balancer]
                    ↓
         [URL Service (stateless, horizontally scaled)]
                    ↓
         [Cache (Redis)] ← hot URLs cached here
                    ↓ cache miss
         [Database (PostgreSQL + Read Replicas)]
                    ↓ async
         [Analytics Service (Kafka → ClickHouse)]

Database schema:

CREATE TABLE url_mappings (
    short_code   CHAR(7)       PRIMARY KEY,
    long_url     TEXT          NOT NULL,
    user_id      BIGINT,
    created_at   TIMESTAMP     DEFAULT NOW(),
    expires_at   TIMESTAMP,
    click_count  BIGINT        DEFAULT 0
);
-- PostgreSQL has no inline INDEX clause; create indexes separately
CREATE INDEX idx_url_user ON url_mappings (user_id);
CREATE INDEX idx_url_created ON url_mappings (created_at);

Short code generation (key algorithm):

Approach                        | Pros                          | Cons
MD5/SHA256 + take 7 chars       | Simple                        | Collisions possible
Base62 encode auto-increment ID | Guaranteed unique             | Predictable URLs
Nanoid / random                 | Unpredictable                 | Collision check needed
Range-based ID (ticket server)  | Globally unique, distributed  | Single point of failure if not replicated

Best approach: Base62-encode a unique ID:

import string
CHARS = string.ascii_letters + string.digits  # 62 chars

def encode_base62(num):
    result = []
    while num:
        result.append(CHARS[num % 62])
        num //= 62
    # Pad with CHARS[0], not '0': '0' sits at index 52 in this alphabet,
    # so zfill-style padding would change the decoded value
    return ''.join(reversed(result)).rjust(7, CHARS[0])

def decode_base62(short):
    return sum(CHARS.index(c) * (62 ** i) for i, c in enumerate(reversed(short)))

# ID 1000000 → "aaaemjc" with this alphabet (7 chars supports up to 62^7 ≈ 3.5 trillion URLs)

Redirect flow:

GET /abc1234
1. Check Redis cache (TTL = 24h): if hit → 301/302 redirect → done
2. Cache miss → query PostgreSQL
3. Log click event → Kafka topic "clicks"
4. Warm Redis cache
5. Return 302 (temporary, forces re-check) or 301 (permanent, client caches)

301 vs 302: 301 reduces server load (client caches). 302 lets you update destinations and track analytics accurately.


Q2. Design a Rate Limiter

Which companies ask this: Stripe, Twilio, Cloudflare, all API platform companies

Requirements:

  • Limit API calls per user/IP
  • Multiple tiers: 1000 req/min for free, 10K for pro
  • Distributed (multiple API servers)
  • Fast: rate limit check must add < 5ms latency

Algorithms:

Algorithm              | Description                                     | Pros                    | Cons
Token Bucket           | Bucket fills at rate r, each req takes 1 token  | Smooth, allows bursts   | Distributed state
Leaky Bucket           | Fixed output rate, queue excess                 | Consistent output rate  | Drops bursts
Fixed Window Counter   | Count per minute window                         | Simple                  | Boundary issue (2x burst at window edge)
Sliding Window Log     | Track timestamps of all requests                | Accurate                | Memory intensive
Sliding Window Counter | Interpolate between fixed windows               | Accurate, low memory    | Approximate

Implementation (Redis Sliding Window Log, using a sorted set of timestamps):

import redis
import time

r = redis.Redis()

def is_allowed(user_id: str, limit: int, window_seconds: int = 60) -> bool:
    now = time.time()
    window_start = now - window_seconds
    key = f"ratelimit:{user_id}"

    pipe = r.pipeline()
    # Remove entries that have slid outside the window
    pipe.zremrangebyscore(key, 0, window_start)
    # Count entries still inside the window
    pipe.zcard(key)
    current_count = pipe.execute()[1]

    if current_count >= limit:
        return False  # Rate limited; rejected requests are not recorded

    # Record this request and refresh the key's expiry
    pipe = r.pipeline()
    pipe.zadd(key, {str(now): now})
    pipe.expire(key, window_seconds * 2)
    pipe.execute()
    return True

Token bucket with Redis:

def token_bucket_allow(user_id, rate=100, capacity=200):
    """rate: tokens/sec, capacity: max burst"""
    key = f"tokens:{user_id}"
    now = time.time()
    lua = """
    local key = KEYS[1]
    local rate = tonumber(ARGV[1])
    local capacity = tonumber(ARGV[2])
    local now = tonumber(ARGV[3])
    local data = redis.call('HMGET', key, 'tokens', 'last_refill')
    local tokens = tonumber(data[1]) or capacity
    local last_refill = tonumber(data[2]) or now
    local elapsed = now - last_refill
    tokens = math.min(capacity, tokens + elapsed * rate)
    if tokens >= 1 then
        tokens = tokens - 1
        redis.call('HMSET', key, 'tokens', tokens, 'last_refill', now)
        return 1
    end
    return 0
    """
    return r.eval(lua, 1, key, rate, capacity, now) == 1

Distributed design:

[Request] → [API Gateway]
                ↓
         Check [Redis Cluster] (centralized rate state)
                ↓ allowed
         [Backend Service]

Return headers: X-RateLimit-Limit: 1000, X-RateLimit-Remaining: 42, X-RateLimit-Reset: 1710000000


Q3. Design a Key-Value Store (like Redis)

Requirements:

  • GET, SET, DELETE operations
  • TTL support
  • < 1ms latency
  • High availability

Core data structures:

In-memory hash table: O(1) GET/SET
+ Eviction policy: LRU (Least Recently Used)
+ Persistence: AOF (Append-Only File) or RDB (snapshotting)
+ Replication: Leader-Follower async replication
+ Clustering: Hash slots (Redis Cluster: 16,384 slots)

Memory layout:

key → [value | type | TTL | last_access_time]
Types: String, List (doubly-linked), Hash (ziplist or hashtable), Set, Sorted Set (skiplist)

Eviction (when memory full):

  • volatile-lru: LRU among keys with TTL set
  • allkeys-lru: LRU among all keys (most common in caching)
  • allkeys-lfu: LFU (Least Frequently Used), better for Zipf distributions
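
The hash-table-plus-LRU core is worth being able to sketch. A minimal single-node version, assuming an OrderedDict-backed table with lazy TTL expiry (a real store adds persistence, replication, and clustering):

import time
from collections import OrderedDict

class KVStore:
    def __init__(self, max_entries=1_000_000):
        self.data = OrderedDict()        # key -> (value, expires_at or None)
        self.max_entries = max_entries

    def set(self, key, value, ttl=None):
        expires_at = time.time() + ttl if ttl else None
        self.data[key] = (value, expires_at)
        self.data.move_to_end(key)               # mark as most recently used
        if len(self.data) > self.max_entries:
            self.data.popitem(last=False)        # evict the LRU entry

    def get(self, key):
        item = self.data.get(key)
        if item is None:
            return None
        value, expires_at = item
        if expires_at and time.time() > expires_at:
            del self.data[key]                   # lazy TTL expiry, like Redis
            return None
        self.data.move_to_end(key)               # refresh recency on read
        return value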

Q4. Design a Pastebin / Document Sharing Service

Requirements: Store text/code snippets; shareable via unique link; optional expiry; 100K writes/day, 10M reads/day

Key design choices:

  • Content stored in object storage (S3), metadata in DB
  • Short link generation: same as URL shortener
  • Read path: CDN cache → S3 → return
Schema:
pastes(paste_id CHAR(8) PK, user_id BIGINT, title VARCHAR(200),
       s3_key VARCHAR(500), language VARCHAR(50), size_bytes INT,
       created_at TIMESTAMP, expires_at TIMESTAMP, views BIGINT)

Q5. Design a Leaderboard System

Requirements: Real-time rankings for 10M users; top-100 leaderboard; user's own rank; updates from game events

Solution: Redis Sorted Sets

import redis
r = redis.Redis()

# Add/update score
r.zadd("leaderboard:global", {f"user:{user_id}": score})

# Get top 100
top100 = r.zrevrange("leaderboard:global", 0, 99, withscores=True)

# Get user's rank (0-indexed from top; None if the user isn't in the set)
rank = r.zrevrank("leaderboard:global", f"user:{user_id}")

# Get users around a rank (for "users near me" feature)
if rank is not None:
    start = max(0, rank - 5)
    neighbors = r.zrevrange("leaderboard:global", start, rank + 5, withscores=True)

For 10M users: Sorted Set inserts and rank lookups are O(log n); range reads add O(m) for m returned entries. 10M entries × ~50 bytes = 500MB, which fits in Redis memory. Scale with sharding by game/region.


Q6. Design a Distributed Message Queue (like Kafka)

Requirements: Publish-subscribe; at-least-once delivery; replay messages; 1M messages/sec throughput

Core concepts:

Topics: logical message categories
Partitions: parallelism unit within a topic
Offsets: sequential position of message in partition
Consumer Groups: multiple consumers sharing partitions
Retention: keep messages for N days (not deleted on consume)

Architecture:

[Producers] → [Brokers (Kafka cluster, 3+ nodes)]
                         ↓
              [Topic: orders (3 partitions)]
              Partition 0: [msg0, msg1, msg2, ...]  → Consumer Group A: Consumer 1
              Partition 1: [msg0, msg1, ...]         → Consumer Group A: Consumer 2
              Partition 2: [msg0, msg1, ...]         → Consumer Group A: Consumer 3
                         ↓
              [ZooKeeper / KRaft] (cluster metadata)

Why Kafka is fast:

  1. Sequential disk writes (OS page cache, 100x faster than random)
  2. Zero-copy data transfer (sendfile() system call)
  3. Batch compression
  4. Consumer pull model (brokers don't track consumer state)
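
To make topics, offsets, and consumer groups concrete, here is a minimal sketch using the kafka-python client; the broker address, topic name, and group name are illustrative:

from kafka import KafkaProducer, KafkaConsumer

producer = KafkaProducer(bootstrap_servers="broker1:9092")
producer.send("orders", key=b"order-42", value=b'{"amount": 999}')
producer.flush()  # batch is appended sequentially to the partition log

consumer = KafkaConsumer(
    "orders",
    bootstrap_servers="broker1:9092",
    group_id="billing",            # consumers in one group share partitions
    auto_offset_reset="earliest",  # replay from the retained log on first start
)
for msg in consumer:
    print(msg.partition, msg.offset, msg.value)  # offset = position in partition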

Q7. Design a Web Crawler

Requirements: Crawl the entire web; 15B pages; politeness (respect robots.txt, rate limits); deduplicate

Architecture:

[Seed URLs] → [URL Frontier (priority queue)]
                        ↓
             [Fetcher Workers (100s of machines)]
                        ↓
             [HTML Parser → extract links]
                        ↓
             [URL Seen Filter (Bloom Filter)]
                        ↓ new URLs
             [URL Frontier] (cycle)
                        ↓ parsed content
             [Content Store (S3)]
             [Document Store (Elasticsearch)]

Key algorithm: Bloom Filter for deduplication:

from bitarray import bitarray
import hashlib

class BloomFilter:
    def __init__(self, size=10**9, num_hashes=7):
        self.bits = bitarray(size)
        self.bits.setall(0)
        self.size = size
        self.k = num_hashes

    def add(self, url):
        for seed in range(self.k):
            h = int(hashlib.sha256(f"{seed}:{url}".encode()).hexdigest(), 16)
            self.bits[h % self.size] = 1

    def might_contain(self, url):
        return all(
            self.bits[int(hashlib.sha256(f"{seed}:{url}".encode()).hexdigest(), 16) % self.size]
            for seed in range(self.k)
        )
# False positive rate depends on fill: roughly 1% after ~100M URLs with these settings

Politeness: Per-domain rate limiter. One queue per domain. Parse robots.txt and respect Crawl-delay.
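
A minimal sketch of per-domain politeness using Python's stdlib robots.txt parser; the frontier/queue wiring and network error handling are omitted:

import time
import urllib.robotparser
from urllib.parse import urlparse

parsers, next_allowed = {}, {}   # per-domain parser and earliest next-fetch time

def polite_can_fetch(url, agent="MyCrawler", default_delay=1.0):
    domain = urlparse(url).netloc
    if domain not in parsers:
        rp = urllib.robotparser.RobotFileParser()
        rp.set_url(f"https://{domain}/robots.txt")
        rp.read()
        parsers[domain] = rp
    rp = parsers[domain]
    if not rp.can_fetch(agent, url):
        return False                       # disallowed by robots.txt
    delay = rp.crawl_delay(agent) or default_delay
    if time.time() < next_allowed.get(domain, 0):
        return False                       # too soon for this domain; requeue
    next_allowed[domain] = time.time() + delay
    return True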


Q8. Design a Type-Ahead / Autocomplete System

Requirements: Suggest completions as user types; top-10 suggestions; < 100ms; 10K queries/second

Two approaches:

Approach 1: Trie (full customization):

Trie stores: prefix → (list of top suggestions with scores)
Build offline, update daily from query logs
Serve from in-memory trie on each search server
No DB lookups at query time
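
A minimal sketch of Approach 1, where each trie node caches its own top-k phrases so a query never walks the subtree; scoring and rebuilds are assumed to happen offline:

class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []              # [(score, phrase)], capped at k

class AutocompleteTrie:
    def __init__(self, k=10):
        self.root, self.k = TrieNode(), k

    def add(self, phrase, score):
        node = self.root
        for ch in phrase:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((score, phrase))
            node.top.sort(reverse=True)      # tiny list, cheap to sort
            node.top = node.top[:self.k]

    def suggest(self, prefix):
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [phrase for _, phrase in node.top]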

Approach 2: Redis Sorted Sets (production simplicity):

import redis
r = redis.Redis()

# Precompute all prefixes in Redis
def add_suggestion(phrase, score):
    for i in range(1, len(phrase) + 1):
        prefix = phrase[:i]
        r.zadd(f"autocomplete:{prefix}", {phrase: score})
        r.zremrangebyrank(f"autocomplete:{prefix}", 0, -11)  # keep only top 10

def suggest(prefix, limit=10):
    return r.zrevrange(f"autocomplete:{prefix}", 0, limit - 1, withscores=True)

# For 1M common searches × avg 8 chars per search = 8M Redis keys
# Each key has ≤10 items → manageable

Scale: Cache completions for common prefixes in memory. Background job recomputes from query logs daily.


Q9. Design a Distributed ID Generator (like Twitter Snowflake)

Requirements: Globally unique IDs; sortable by time; no single point of failure; ~100K IDs/sec

Snowflake ID structure (64-bit integer):

| 1 bit sign | 41 bits timestamp (ms) | 10 bits machine ID | 12 bits sequence |
                ~69 years from epoch      1024 machines        4096 IDs/ms/machine
import time
import threading

class Snowflake:
    EPOCH = 1288834974657  # Twitter epoch (Nov 4, 2010)
    MACHINE_ID_BITS = 10
    SEQUENCE_BITS = 12
    MAX_SEQUENCE = (1 << SEQUENCE_BITS) - 1  # 4095

    def __init__(self, machine_id):
        self.machine_id = machine_id & ((1 << self.MACHINE_ID_BITS) - 1)
        self.sequence = 0
        self.last_timestamp = -1
        self.lock = threading.Lock()

    def next_id(self):
        with self.lock:
            ts = int(time.time() * 1000) - self.EPOCH
            if ts == self.last_timestamp:
                self.sequence = (self.sequence + 1) & self.MAX_SEQUENCE
                if self.sequence == 0:
                    while ts <= self.last_timestamp:
                        ts = int(time.time() * 1000) - self.EPOCH
            else:
                self.sequence = 0
            self.last_timestamp = ts
            return (ts << 22) | (self.machine_id << 12) | self.sequence

Alternatives in 2026: ULID (128-bit, URL-safe, time-ordered) and UUID v7 (time-ordered UUID standard).


Q10. Design a Search Engine

Brief overview here; expect a deep dive at the senior level:

[Web Crawler] → [Document Store (S3)] → [Indexing Pipeline]
                                              ↓
                    [Inverted Index (Elasticsearch cluster)]
                    term → [(doc_id, tf, positions), ...]
                                              ↓
[User Query] → [Query Parser] → [Retrieval (BM25)] → [Reranking (BERT)] → [Results]
                                              ↓
                              [Feature extraction] → [Learning to Rank (LambdaMART)]

Key algorithms: TF-IDF/BM25 for retrieval, PageRank for authority, LTR for final ranking.


MEDIUM: Complex Systems (Questions 11-17)

This is where senior offers are won or lost. These designs require reasoning about distributed systems, consistency models, and real-world scale. Don't just memorize; understand the tradeoffs deeply enough to adapt on the fly.

Q11. Design YouTube

Which companies ask this: Google, Netflix, Meta, TikTok, Amazon (Prime Video)

Requirements:

  • Upload videos (max 10GB), process to multiple resolutions
  • Stream to 2B users, peak 10M concurrent streams
  • Search, recommendations, comments, likes

Capacity estimation:

Uploads: 500 hours of video uploaded per minute (YouTube stat)
         = 500 * 60 = 30,000 min of video per minute
         At ~500 MB per minute of raw video → ~15 TB/min ≈ 250 GB/sec ingest
Storage: 30,000 min/min = 43.2M min/day = 720K hours/day
         Encode to 5 quality levels, keep 360p/720p/1080p/4K → ~20 GB/hour stored
         → ~14 PB of new storage per day; aggressive compression and cold tiers are mandatory
Views: 1B hours watched/day → 1B * 3600 / 86,400 = 41.7M concurrent viewers (average)
       Each viewer: ~4 Mbps → 41.7M * 4 Mbps ≈ 167 Tbps bandwidth
       → Must use CDN extensively

High-level architecture:

[Upload Client]
    ↓ multipart upload
[Upload Service] → [Raw Storage (GCS/S3)]
    ↓ message
[Transcoding Job Queue (Kafka)]
    ↓
[Video Processing Workers] (GPU workers)
    → Extract metadata, thumbnail
    → Transcode: 360p, 720p, 1080p, 4K, HDR
    → Store segments in [Chunked Storage (CDN Origins)]
    → Update [Video Metadata DB (Spanner/CockroachDB)]
    → Trigger [Search Index Update (Elasticsearch)]

[Playback]
[Client] → [CDN Edge] (90% of views served from CDN)
         → [Load Balancer] → [Video Service]
         → [Streaming Server (HLS/DASH)] → [CDN Origin] → chunks

Video processing pipeline detail:

Raw video → FFmpeg transcoding workers (each GPU handles 5-10 streams)
Output: HLS (HTTP Live Streaming) format
  master.m3u8 → links to:
    360p/playlist.m3u8 → segment_000.ts, segment_001.ts, ...
    720p/playlist.m3u8 → ...
    1080p/playlist.m3u8 → ...

Segment duration: 6-10 seconds
Client adaptively chooses quality (ABR - Adaptive Bitrate)

Database schema:

-- Video metadata
CREATE TABLE videos (
    video_id     VARCHAR(11) PRIMARY KEY,   -- YouTube-style ID
    uploader_id  BIGINT NOT NULL,
    title        VARCHAR(200),
    description  TEXT,
    duration_sec INT,
    status       ENUM('processing','active','removed'),
    view_count   BIGINT DEFAULT 0,
    like_count   BIGINT DEFAULT 0,
    created_at   TIMESTAMP DEFAULT NOW(),
    INDEX(uploader_id), INDEX(created_at)
);

-- View counts (high write volume — use Redis counter + periodic flush)
-- Likes (use separate table for user-video like state)
CREATE TABLE video_likes (
    user_id    BIGINT,
    video_id   VARCHAR(11),
    liked_at   TIMESTAMP DEFAULT NOW(),
    PRIMARY KEY (user_id, video_id)
);

Recommendation system (crucial detail):

User signals → [Feature Store] → [Two-Tower Model]
                                       ↓
                              [Candidate Generation]
                              (embed user, retrieve ~1000 similar videos)
                                       ↓
                              [Ranking Model]
                              (score each candidate: predicted watch time, CTR)
                                       ↓
                              [Re-ranking] (diversity, freshness)
                                       ↓
                              [Homepage feed]

Q12. Design WhatsApp

Which companies ask this: Meta, Signal, Telegram, LINE, WeChat (all messaging platforms)

Requirements:

  • 1-to-1 and group messaging (up to 1024 members)
  • End-to-end encryption
  • Online presence (online/offline/last seen)
  • Message delivery receipts (sent/delivered/read)
  • 65B messages/day (WhatsApp scale), 2B users

Capacity estimation:

Messages: 65B/day ÷ 86,400 ≈ 750K messages/sec on average (peaks run higher)
Each message: avg 100 bytes → 750K * 100 B = 75 MB/sec write
Storage: 65B * 100 bytes = 6.5 TB/day → ~2.4 PB/year
Connections: 2B users, ~20% active simultaneously = 400M concurrent WebSocket connections
             40 WebSocket servers × 10M connections each (high-end)

High-level architecture:

[Client A] ←→ [WebSocket/XMPP Gateway] ←→ [Message Router]
                                                  ↓
                                         [Message Service]
                                                  ↓
                                    [Message Queue (Kafka)]
                                          ↓         ↓
                               [Persistence]   [Push Notification]
                               [Cassandra]      [APNS/FCM]

WebSocket connection management:

Client maintains persistent WebSocket connection to a gateway server
Gateway server assignment: Consistent hashing on user_id → always same gateway
If gateway crashes: client reconnects, session transferred

Protocol: XMPP over WebSocket OR WhatsApp proprietary protocol (Noise Protocol)
Heartbeat: client pings every 30s to keep connection alive through NAT

Message flow:

A sends to B:
1. A → Gateway A → Message Service
2. Message Service writes to Cassandra (persistence)
3. If B is online: route through B's gateway (pub/sub via Redis)
4. If B offline: push notification via APNS/FCM + queue message
5. On receipt: delivery receipts flow back the same path

Message storage (Cassandra schema):

-- Optimized for "give me messages in conversation, ordered by time"
CREATE TABLE messages (
    conversation_id UUID,
    message_id      TIMEUUID,    -- Cassandra TIMEUUID: UUID v1, sortable by time
    sender_id       BIGINT,
    content         BLOB,        -- encrypted
    content_type    TINYINT,     -- 0=text, 1=image, 2=video
    status          TINYINT,     -- 0=sent, 1=delivered, 2=read
    PRIMARY KEY (conversation_id, message_id)
) WITH CLUSTERING ORDER BY (message_id DESC)
  AND default_time_to_live = 7776000;  -- 90 days retention

End-to-end encryption (Signal Protocol):

1. Each user generates: Identity key, Signed prekey, One-time prekeys
2. Published to key server
3. Sender fetches recipient's keys → performs X3DH key agreement → shared secret
4. Session established using Double Ratchet algorithm
5. Each message encrypted with unique key
6. Server never has plaintext

Online presence:

User connects → set Redis key "online:{user_id}" = 1 with TTL 60s
Heartbeat every 30s → refresh TTL
User disconnects / TTL expires → presence = offline
Store "last_seen" in PostgreSQL when TTL expires (tombstone event from Redis keyspace notification)

Q13. Design a Notification System

Which companies ask this: Airbnb, Uber, LinkedIn, Meta, any product company

Requirements:

  • Push notifications (iOS/Android), email, SMS
  • 10M notifications/day; some time-sensitive, some batch
  • User preferences (opt-out per notification type)
  • Delivery receipts
  • Template system

Architecture:

[Event Sources: order placed, new follower, etc.]
        ↓ events
[Message Queue (Kafka)] ← guarantees durability
        ↓
[Notification Service]
    - Fetch user preferences
    - Apply templates
    - Route to correct channel
        ↓
[Channel Workers (separate for each)]
    ├── Push Worker → [APNS (iOS)] / [FCM (Android)]
    ├── Email Worker → [SendGrid / SES]
    └── SMS Worker → [Twilio / Vonage]
        ↓
[Delivery Tracking DB]
[Analytics Pipeline]

User preferences:

CREATE TABLE notification_preferences (
    user_id       BIGINT,
    notification_type VARCHAR(50),  -- 'new_follower', 'order_update', etc.
    channel       ENUM('push','email','sms','in-app'),
    enabled       BOOLEAN DEFAULT true,
    PRIMARY KEY (user_id, notification_type, channel)
);

Batching strategy:

  • Marketing/promotional: batch by user, send at optimal send time (user's local morning)
  • Transactional (order confirmed): immediate
  • Social (new like): digest every 2h if many likes

Retry logic:

MAX_RETRIES = 3
BACKOFF = [30, 300, 3600]  # seconds

def send_with_retry(notification, attempt=0):
    try:
        result = send_push(notification)
        if result.success:
            mark_delivered(notification.id)
        elif attempt < MAX_RETRIES:
            schedule_retry(notification, delay=BACKOFF[attempt])  # soft failure
        else:
            mark_failed(notification.id)
    except (RateLimitError, TemporaryError):
        if attempt < MAX_RETRIES:
            schedule_retry(notification, delay=BACKOFF[attempt])
        else:
            mark_failed(notification.id)
    except PermanentError:  # e.g., invalid device token
        deregister_device(notification.device_token)

Q14. Design a Ride-Sharing Service (like Uber)

Requirements: Match riders and drivers; real-time location tracking; dynamic pricing; ETA calculation; 1M rides/day

Architecture:

[Driver App] → [Location Service] → [Driver Location Store (Redis GEO)]
[Rider App] → [Booking Service] → [Matching Engine]
                                         ↓
                               Query nearby drivers (Redis GEORADIUS)
                               Find best match (cost function: distance + rating + car type)
                                         ↓
                               [Trip Service] → [Database (PostgreSQL)]
                               [Pricing Service] (surge pricing: demand/supply ratio)
                               [ETA Service] (map API + traffic data)

Real-time location updates:

# Driver updates location every 5 seconds
import redis
r = redis.Redis()

def update_driver_location(driver_id, lat, lng):
    r.geoadd("drivers:available", [lng, lat, f"driver:{driver_id}"])
    r.setex(f"driver:last_seen:{driver_id}", 30, f"{lat},{lng}")

def find_nearby_drivers(rider_lat, rider_lng, radius_km=5):
    return r.georadius("drivers:available", rider_lng, rider_lat,
                        radius_km, unit='km',
                        withdist=True, withcoord=True, count=20, sort='ASC')

Surge pricing algorithm:

def calculate_surge(area_id):
    demand = get_ride_requests(area_id, window_minutes=15)
    supply = get_available_drivers(area_id)
    ratio = demand / max(supply, 1)
    if ratio < 1.2: return 1.0
    elif ratio < 1.5: return 1.5
    elif ratio < 2.0: return 2.0
    else: return min(ratio, 5.0)  # cap at 5x

Q15. Design a Hotel/Flight Booking System (like Airbnb / MakeMyTrip)

Requirements: Search availability; reserve rooms/seats; prevent double-booking; handle concurrent reservations; payment integration

Key challenge: Preventing double-booking under concurrent requests

-- Optimistic locking approach
UPDATE inventory SET available_count = available_count - 1, version = version + 1
WHERE room_id = ? AND check_in = ? AND available_count > 0 AND version = ?;
-- If rows_affected = 0: someone else booked it → retry or fail

-- Two-phase booking:
-- 1. Hold (reduce available_count, status='held', TTL=15min)
-- 2. Confirm (payment success → status='confirmed')
-- 3. Release (payment timeout → restore available_count, delete hold)
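
A minimal sketch of the hold step with optimistic locking, assuming psycopg2 as the driver; table and column names follow the SQL above:

import psycopg2

def try_hold_room(conn, room_id, check_in, version):
    with conn.cursor() as cur:
        cur.execute(
            """UPDATE inventory
               SET available_count = available_count - 1, version = version + 1
               WHERE room_id = %s AND check_in = %s
                 AND available_count > 0 AND version = %s""",
            (room_id, check_in, version),
        )
        held = cur.rowcount == 1   # 0 rows → someone else won the race
    conn.commit()
    return held                    # False → re-read inventory and retry or fail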

Search architecture:

Search requests → Elasticsearch (inverted index on amenities, location, price)
Availability → Redis bitmap or dedicated Availability Service (hot reads)
Booking → PostgreSQL with distributed transactions

Q16. Design a Search System for E-commerce (like Amazon Search)

Requirements: Full-text search, faceted filtering (price range, brand, rating), relevance ranking, < 100ms

Architecture:

[Product Catalog Service] → [Indexing Pipeline] → [Elasticsearch Cluster]
[User Query] → [Query Service]
    → Spell correction (edit distance / n-gram index)
    → Query expansion (synonyms)
    → Elasticsearch query (BM25 + filters)
    → Re-ranking (Learning to Rank with click-through features)
    → Facet computation
    → Cache result (Redis, TTL=5min for popular queries)

Elasticsearch query structure:

{
  "query": {
    "bool": {
      "must": [{"multi_match": {"query": "wireless headphones",
                "fields": ["title^3", "description", "brand^2"]}}],
      "filter": [
        {"range": {"price": {"gte": 1000, "lte": 5000}}},
        {"term": {"in_stock": true}},
        {"term": {"category": "electronics"}}
      ]
    }
  },
  "aggs": {
    "brands": {"terms": {"field": "brand.keyword", "size": 20}},
    "price_ranges": {"range": {"field": "price",
                     "ranges": [{"to": 1000}, {"from": 1000, "to": 5000}, {"from": 5000}]}}
  }
}

Q17. Design a Real-time Analytics System

Requirements: Track page views, clicks, conversions in real-time; aggregate metrics (1-min, 1-hour, 1-day windows); query from dashboard

Lambda architecture:

Events → [Kafka] → [Stream Processor (Flink/Spark Streaming)]
                         ↓ real-time aggregates (1-min windows)
                   [Time-series DB (InfluxDB / ClickHouse)]
                         ↓
                   [Dashboard (Grafana)]

               → [Batch Processor (Spark, nightly)]
                         ↓ historical aggregates
                   [Data Warehouse (BigQuery / Snowflake)]

Kappa architecture (simpler, 2026 preferred):

Events → [Kafka] → [Flink] → [ClickHouse]
All queries (real-time + historical) → ClickHouse

Why ClickHouse for analytics:

  • Columnar storage: only read columns you need
  • Vectorized execution: SIMD, process 1024 values per CPU instruction
  • 100B row queries in seconds
  • Native Kafka integration
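
Conceptually, the stream processor is just bucketing events into windows. A toy version of the 1-minute tumbling-window aggregation (Flink adds watermarks, state backends, and exactly-once sinks on top of this idea):

from collections import defaultdict

windows = defaultdict(int)   # (metric, window_start_epoch) -> count

def on_event(metric: str, event_time: float):
    window_start = int(event_time // 60) * 60   # floor to the minute
    windows[(metric, window_start)] += 1

def flush(window_start):
    # In production this write goes to ClickHouse/InfluxDB once the
    # watermark passes window_start + 60s (late events need extra care)
    return {k: v for k, v in windows.items() if k[1] == window_start}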

HARD: Expert-Level Designs (Questions 18-40)

Staff engineer and principal engineer territory. These designs involve distributed consensus, multi-region architectures, and AI-native systems. If you ace these, you're competing for the highest compensation bands in the industry.

Q18. Design Google Drive (File Storage + Sync)

Requirements: Upload/download files up to 100GB; sync across devices; folder structure; sharing; versioning

Key challenge: Large file uploads

# Chunked upload protocol ("server" stands in for the upload API client)
import os
import hashlib

CHUNK_SIZE = 5 * 1024 * 1024  # 5MB

def upload_file(file_path, file_id):
    total_size = os.path.getsize(file_path)
    num_chunks = (total_size + CHUNK_SIZE - 1) // CHUNK_SIZE
    with open(file_path, 'rb') as f:
        for chunk_num in range(num_chunks):
            chunk = f.read(CHUNK_SIZE)
            chunk_hash = hashlib.sha256(chunk).hexdigest()
            # Check if chunk already exists (deduplication)
            if server.chunk_exists(chunk_hash):
                server.reference_chunk(file_id, chunk_num, chunk_hash)
            else:
                server.upload_chunk(file_id, chunk_num, chunk)

Architecture:

[Client] → [Upload Service] → [Chunk Store (S3)]
                ↓ metadata
         [File Metadata Service]
                ↓
         [PostgreSQL: file/folder tree, chunk references]
                ↓ sync events
         [Notification Service (WebSocket / long poll)]
         → Push delta to all devices of same user

File metadata schema:

CREATE TABLE files (
    file_id        UUID PRIMARY KEY,
    parent_id      UUID REFERENCES files(file_id),
    owner_id       BIGINT,
    name           VARCHAR(500),
    is_folder      BOOLEAN,
    size_bytes     BIGINT,
    content_hash   CHAR(64),
    version        INT DEFAULT 1,
    created_at     TIMESTAMP,
    modified_at    TIMESTAMP
);

CREATE TABLE file_chunks (
    file_id     UUID,
    chunk_num   INT,
    chunk_hash  CHAR(64),   -- SHA256, enables deduplication
    chunk_size  INT,
    PRIMARY KEY (file_id, chunk_num)
);

CREATE TABLE file_shares (
    file_id     UUID,
    shared_with BIGINT,     -- user_id or NULL for public
    permission  ENUM('view','edit'),
    share_link  VARCHAR(100) UNIQUE
);

Sync conflict resolution:

  • Last-write-wins (simple, used by Dropbox)
  • Operational Transform (Google Docs, complex but preserves all edits)
  • CRDT (Conflict-free Replicated Data Types), emerging standard in 2026

Q19. Design a Twitter/X Feed

Requirements: Post tweets (280 chars); follow users; home timeline (tweets from followed users, newest first); fan-out for celebrities; 500M daily tweets

The fan-out problem:

Approach 1 — Push (Fan-out on write):
  When A posts → immediately write to timelines of all A's followers
  Pros: Timeline read is O(1) from pre-built Redis list
  Cons: Celebrity with 100M followers → 100M writes per tweet

Approach 2 — Pull (Fan-out on read):
  When B requests timeline → merge tweets from all B's followings
  Pros: No write amplification
  Cons: If B follows 1000 people → 1000 DB reads merged on every timeline load

Approach 3 — Hybrid (Twitter's actual approach):
  Regular users (< 10K followers): Push model
  Celebrities (> 10K followers): Pull model
  Timeline = pre-built feed + on-read merge of celebrity tweets

Architecture:

[Tweet POST] → [Tweet Service]
    → [Tweet DB (MySQL sharded by tweet_id)]
    → [Fanout Service] → For each follower: LPUSH timeline:{user_id}
                      → Ignore followers > 10K (celebrity)

[Timeline GET] → [Timeline Service]
    → Read pre-built Redis list (most followers)
    → Merge with recent celebrity tweets (from PostgreSQL, sorted by time)
    → Return top 200

Timeline storage:

# Redis list: timeline per user, stores tweet IDs
# LPUSH (left push) on new tweet, LTRIM to keep 800 most recent
import redis
r = redis.Redis()

def fan_out(tweet_id, author_id, follower_ids):
    # Called only for non-celebrity authors (see hybrid approach above)
    pipe = r.pipeline()
    for fid in follower_ids:
        pipe.lpush(f"timeline:{fid}", tweet_id)
        pipe.ltrim(f"timeline:{fid}", 0, 799)  # keep last 800
    pipe.execute()

def get_timeline(user_id, page=0, limit=20):
    tweet_ids = r.lrange(f"timeline:{user_id}", page * limit, (page + 1) * limit - 1)
    celebrity_tweets = get_celebrity_tweets(user_id, limit=50)  # from DB
    return merge_by_time(fetch_tweets(tweet_ids), celebrity_tweets)[:limit]

Q20. Design a Distributed Cache (like Memcached Cluster)

Concepts:

Consistent hashing: map cache keys to nodes
  - Hash ring over a large keyspace (e.g., 0 to 2^32 - 1)
  - Each physical node gets 100-200 virtual positions on the ring
  - Adding/removing a node moves only ~K/N keys on average (see the sketch below)

Cache eviction: LRU using doubly-linked list + hash map
  - O(1) access (hash map) + O(1) eviction (doubly-linked list)

Cache aside (most common pattern):
  1. Read: check cache → miss → read DB → write to cache → return
  2. Write: write to DB → delete/update cache (invalidation)

Write-through: write to cache + DB simultaneously
Write-back: write to cache only, flush to DB asynchronously (risk of loss)
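
A minimal consistent-hash ring with virtual nodes; MD5 is used only to spread keys evenly (not for security), and the vnode count is illustrative:

import bisect
import hashlib

class HashRing:
    def __init__(self, nodes, vnodes=150):
        self.ring = []                       # sorted list of (hash, node)
        for node in nodes:
            for i in range(vnodes):
                self.ring.append((self._hash(f"{node}#{i}"), node))
        self.ring.sort()

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def get_node(self, key):
        h = self._hash(key)
        idx = bisect.bisect(self.ring, (h,))   # first vnode clockwise of key
        return self.ring[idx % len(self.ring)][1]

# Removing a node only reassigns the keys that hashed to its vnodes;
# everything else stays put.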

Cache coherence in distributed systems:

  • Use TTL as a safety net
  • Cache-aside + invalidation on write is the safest
  • Cache stampede prevention: probabilistic early expiration or distributed lock (SETNX) on cache miss

Q21. Design an Ad Targeting System

Requirements: Match user to relevant ads in < 100ms; support targeting by demographics, interests, location, retargeting

Architecture:

[Ad Inventory] → [Indexing Service] → [Ads Index (Elasticsearch)]
                 (targeting criteria indexed)

[Ad Request: user_id, page_context, device_type]
        ↓
[User Profile Service] → fetch user segments (age, interests, lookalike segments)
        ↓
[Targeting Engine] → Elasticsearch query with user segments + page context
        ↓
[Auction Engine] → Second-price auction among matching ads
  bid = advertiser_bid * predicted_CTR * predicted_CVR  (eCPM)
        ↓
[Served Ad] + [Impression logging → Kafka → ClickHouse]

ML for CTR prediction:

# DeepFM / DCN (Deep Cross Network) — industry standard for ad CTR
# Features: user_id, ad_id, page_context, time_of_day, device, ...
# Embedding all sparse features, DNN for high-order interactions
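
A minimal sketch of the second-price auction described above; pCVR is folded into pCTR for brevity, and the 0.01 increment/floor is an assumption:

def run_auction(candidates):
    # candidates: [(ad_id, bid, pctr)] from the targeting engine
    scored = sorted(candidates, key=lambda c: c[1] * c[2], reverse=True)
    if not scored:
        return None
    winner_id, winner_bid, winner_pctr = scored[0]
    if len(scored) > 1:
        runner_ecpm = scored[1][1] * scored[1][2]
        price = runner_ecpm / winner_pctr + 0.01   # pay just above #2's eCPM
    else:
        price = 0.01                               # floor price (assumed)
    return winner_id, min(price, winner_bid)       # never exceeds own bid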

Q22. Design a Distributed Transaction System (like SAGA Pattern)

Real scenario: E-commerce order: Inventory reservation + Payment processing + Shipping

Problem: Traditional 2PC (two-phase commit) doesn't work at microservice scale because locks are held too long.

SAGA Pattern:

Choreography-based SAGA:
OrderService.create → event "OrderCreated"
  → InventoryService.reserve → event "InventoryReserved" or "ReservationFailed"
  → PaymentService.charge → event "PaymentSucceeded" or "PaymentFailed"
  → ShippingService.schedule → event "ShippingScheduled"

On failure → compensating transactions:
  PaymentFailed → InventoryService.release (compensation)
  ShippingFailed → PaymentService.refund + InventoryService.release

Outbox pattern (ensure event delivery):

-- In same transaction as business logic:
BEGIN;
INSERT INTO orders(id, ...) VALUES (...);
INSERT INTO outbox(event_type, payload) VALUES ('OrderCreated', '{"order_id":...}');
COMMIT;
-- Separate process: poll outbox → publish to Kafka → delete from outbox
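
A minimal sketch of that relay process, assuming psycopg2 and kafka-python; the outbox id column, topic-per-event-type routing, and connection settings are assumptions:

import psycopg2
from kafka import KafkaProducer

conn = psycopg2.connect("dbname=shop")                 # DSN is illustrative
producer = KafkaProducer(bootstrap_servers="broker1:9092")

def relay_outbox():
    with conn.cursor() as cur:
        cur.execute("SELECT id, event_type, payload FROM outbox ORDER BY id LIMIT 100")
        for row_id, event_type, payload in cur.fetchall():
            producer.send(event_type, payload.encode())     # topic per event type
            cur.execute("DELETE FROM outbox WHERE id = %s", (row_id,))
    producer.flush()   # ensure Kafka has the events...
    conn.commit()      # ...before the rows are deleted (at-least-once delivery)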

Q23. Design a Video Recommendation System (like YouTube Recommendations)

Requirements: Personalized recommendations; real-time feedback incorporation; cold-start for new users

Two-Tower Neural Network:

User Tower:                          Item Tower:
  user_id embedding                   video_id embedding
  + watch history                     + video features
  + search history                    + engagement signals
  + demographics                      + freshness
        ↓                                   ↓
  User embedding (256-d)          Video embedding (256-d)
                   ↓
        Cosine similarity → top-k candidates

Full pipeline:

Offline:
  Train Two-Tower model on (user, video, watch_fraction) triples
  Generate video embeddings → store in Faiss/ScaNN index

Online:
  1. Generate user embedding in real-time (last 50 watched videos)
  2. ANN search in ScaNN index → 1000 candidates
  3. Ranking model: DNN scoring each candidate
  4. Post-processing: diversity, freshness boost, de-duplicate watched
  5. Return top 20
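
As a stand-in for step 2, exact top-k retrieval with NumPy shows what the ANN index approximates; Faiss/ScaNN trade a little recall for orders-of-magnitude speed at billion scale:

import numpy as np

def top_k_candidates(user_emb, video_embs, k=1000):
    # user_emb: (256,), video_embs: (N, 256) with N > k; rows assumed L2-normalized
    scores = video_embs @ user_emb          # cosine similarity via dot product
    idx = np.argpartition(-scores, k)[:k]   # O(N) selection of the top k
    return idx[np.argsort(-scores[idx])]    # sort only the k winners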

Q24. Design a Log Aggregation System (like Splunk / ELK Stack)

Architecture:

[Application Servers] → [Fluentd/Filebeat (log shipping agents)]
                              ↓
                        [Kafka (buffer + stream)]
                              ↓
                     [Logstash (parse, transform, enrich)]
                              ↓
                     [Elasticsearch (storage + search)]
                              ↓
                        [Kibana (dashboards + alerts)]

Schema design:

{
  "@timestamp": "2026-03-30T10:00:00Z",
  "service": "payment-service",
  "level": "ERROR",
  "trace_id": "abc123",
  "span_id": "def456",
  "user_id": 12345,
  "message": "Payment declined: insufficient funds",
  "duration_ms": 142,
  "host": "payment-svc-pod-7",
  "environment": "production"
}

Scale: At 1M events/sec, Elasticsearch needs 20+ nodes, index-per-day rotation, hot-warm-cold architecture.


Q25. Design a Multi-Region Active-Active Database System

Requirements: Global users; low-latency reads and writes from any region; conflict resolution

Options:

Approach                 | Latency                                        | Consistency | Complexity
Single region            | Low locally, high globally                     | Strong      | Simple
Active-passive           | Low reads globally via CDN, high write latency | Strong      | Medium
Active-active (eventual) | Low everywhere                                 | Eventual    | High
Active-active (strong)   | Low locally, sync overhead                     | Strong      | Very high

CockroachDB / Spanner approach:

  • Distributed SQL with Raft consensus
  • Transactions span regions with configurable locality
  • Leaseholder (Raft leader) placed near highest-traffic region

Conflict resolution for active-active:

  • Last-Write-Wins (LWW): simplest, may lose data
  • Vector clocks: detect conflicts, application resolves
  • CRDTs: mathematically merge-able data structures (counters, sets)
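
A minimal G-Counter, the simplest CRDT: each region increments only its own slot, and merge takes an element-wise max, so replicas converge regardless of delivery order:

class GCounter:
    def __init__(self, region):
        self.region = region
        self.counts = {}                 # region -> local count

    def increment(self, n=1):
        self.counts[self.region] = self.counts.get(self.region, 0) + n

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # which is exactly what makes the merge conflict-free
        for region, count in other.counts.items():
            self.counts[region] = max(self.counts.get(region, 0), count)

    def value(self):
        return sum(self.counts.values())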


System Design FAQs: Insider Answers

Q: How long should my answer be in a 45-minute system design interview? A: Here's the exact time split that works: 5 min requirements clarification, 5 min estimation, 10 min high-level design, 15-20 min deep dive, 5 min failure modes. Practice with a timer; most candidates run out of time on the deep dive, which is exactly where senior candidates differentiate themselves.

Q: Do I need to draw diagrams? A: Yes, use a whiteboard or online drawing tool. Even rough boxes-and-arrows diagrams show structured thinking. Practice drawing clean architectures quickly.

Q: How precise should capacity estimates be? A: Ballpark only, within 10x is fine. The goal is to inform design decisions (e.g., "we need a CDN" or "we need to shard the database"). Don't spend more than 5 minutes on math.

Q: What databases should I know? A: PostgreSQL/MySQL (relational), Cassandra (time-series, high write), Redis (cache, sessions, leaderboards), Elasticsearch (search, analytics), MongoDB (flexible schema), Kafka (streaming), ClickHouse (analytics), S3 (object storage), DynamoDB (key-value at scale).

Q: How do I handle the "it depends" nature of system design? A: State your assumptions explicitly and proceed. "I'll assume read-heavy traffic (10:1 read/write ratio) because..." Interviewers respect opinionated tradeoff reasoning over vague hedging.

Q: What are the most important concepts to know in 2026? A: Distributed transactions (SAGA), event-driven architecture (Kafka), cache consistency patterns, ANN search for AI systems, streaming analytics, observability (tracing with OpenTelemetry).

Q: How do I practice system design? A: The proven formula: sketch one design every day for 30 days (yes, every day). Study real engineering blogs (Uber, Netflix, Discord, Figma; these are gold mines). Do mock interviews on Pramp or with a peer (you learn 3x faster when you have to explain out loud). And read "Designing Data-Intensive Applications" by Kleppmann; it's the single best book for system design interviews, bar none.

Q: What's new in system design interviews in 2026? A: AI-augmented systems are the biggest shift in a decade. Common new questions: design a RAG system, design an LLM inference cluster, design an AI feature store. You must know vector databases, embedding pipelines, model serving (vLLM, Triton), and inference scaling. If you studied system design in 2023, your prep is already outdated; update it with these AI-native patterns.


Level up your full-stack interview prep:

For a related deep-dive, see Microsoft Interview Pattern Bank 2026: LRU Cache, OneDrive & AA Round.

Related articles: AI/ML Interview Questions 2026 | Data Engineering Interview Questions 2026 | Generative AI Interview Questions 2026

Frequently Asked Questions

What is the typical salary range for candidates selected through system design interview preparation (2026)?

In India, system design-focused roles commonly map to mid-to-senior engineering bands, with offers often ranging from ~₹15 LPA to ₹40+ LPA depending on company tier and prior experience. For top FAANG-like companies, compensation can be significantly higher, but the exact number varies by location, leveling, and negotiation. Your preparation should target not just “passing,” but demonstrating strong trade-offs, scalability, and API clarity.

What eligibility is required to attempt these system design interview questions for 2026 placements?

Most candidates are expected to have a basic grasp of data structures, networking fundamentals (HTTP, TCP), and at least one backend language. Typically, students in their final year or early experience (0–3 years) should focus on fundamentals and simpler designs first, while 3+ years candidates can go deeper into distributed systems, consistency, and capacity planning. If you can explain CRUD flows, caching, and database indexing clearly, you’re already on the right track.

How difficult are system design interviews compared to coding rounds?

System design is usually more difficult than a single coding problem because it requires structured thinking, trade-off analysis, and the ability to communicate decisions clearly. Instead of one correct answer, interviewers evaluate your reasoning: scalability, reliability, latency, throughput, and failure handling. Candidates often struggle with capacity estimation and “what to do when things go wrong,” so practice those areas early.

What preparation tips work best for cracking the top 40 system design questions (FAANG-style rotation)?

Prepare using a repeatable template: requirements → APIs → data model → high-level architecture → detailed components (DB/cache/queues) → scaling strategies → failure modes → monitoring. For each question (e.g., YouTube, WhatsApp, URL shortener, rate limiter, notification system), write down assumptions and do quick capacity estimates (QPS, storage growth, bandwidth). Finally, practice explaining your design in 15–20 minutes, then refine based on interviewer feedback.

What are the typical interview rounds for system design-focused hiring in 2026?

A common flow is: recruiter screen → coding round(s) → system design round(s) → behavioral/leadership round. Some companies also include a “technical deep dive” where you discuss specific components like caching strategy, database sharding, or message delivery guarantees. If system design is emphasized, you may see multiple design questions or one complex design with follow-up probes.

What common topics are repeatedly asked in system design interviews like YouTube, WhatsApp, and URL shortener?

Expect recurring themes such as designing REST/gRPC APIs, choosing between SQL vs NoSQL, caching (LRU/TTL), CDN usage, pagination, and search indexing. For distributed systems, interviewers often ask about consistency models, replication, partitioning/sharding, idempotency, and handling retries/timeouts. Notification and messaging systems frequently test queueing, fan-out strategies, and delivery guarantees (at-most-once vs at-least-once).

How do I apply or use these system design interview question resources for placements?

Use the “Top 40 Questions with Answers” list as a structured syllabus: pick 1–2 designs per week, attempt a fresh design from scratch, and then compare with the provided step-by-step solutions. Track your weak areas (e.g., rate limiting math, database schema, or event-driven architecture) and revisit them with targeted practice. If your site supports it, save/bookmark questions and maintain a revision schedule aligned with your interview dates.

What selection rate can I expect after practicing these system design questions, does it guarantee results?

There’s no universal selection rate because outcomes depend on company level, overall profile, and how well you perform across rounds, not only system design. However, consistent practice of FAANG-style designs typically improves your interview readiness significantly, especially your ability to handle follow-up questions and trade-offs. Treat the “40 exact designs” approach as a high-signal practice method: it boosts coverage, reduces surprise, and strengthens communication, which are key drivers of selection.
