HLD fundamentals

DNS (Domain Name System) is the internet’s phone book. It translates human-friendly domain names into IP addresses, allowing clients to find servers on the internet.
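
This resolution step is visible from any program. A minimal Python sketch (example.com is just a placeholder domain):

import socket

# Resolve a domain name to an IP address, just as a browser does
# before opening a connection
ip = socket.gethostbyname("example.com")
print(ip)  # e.g., 93.184.216.34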

An API (Application Programming Interface) is a contract or set of rules that defines how one piece of software can request services from another.

REST (Representational State Transfer) treats everything as a resource (e.g., a user profile), with each resource identified by a unique URL. REST is stateless—each request contains all the information the server needs to fulfill it, which enhances scalability. Commonly used for public web APIs.
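
As a rough sketch, a REST client simply issues standard HTTP verbs against resource URLs (the api.example.com endpoint and fields here are hypothetical):

import requests

# Fetch a resource by its URL
resp = requests.get("https://api.example.com/users/123")
user = resp.json()

# Update the same resource with PUT; the request carries everything
# the server needs, so no session state is stored between calls
requests.put("https://api.example.com/users/123", json={"name": "Alice"})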

gRPC enables calling functions on a remote server as if they were local. It focuses on actions and operations rather than resources, and is typically used for internal service communication where tight coupling is acceptable or desirable.

How it works: Client uses generated code (stub) that serializes the function call and parameters into binary format (Protocol Buffers), sends it over HTTP/2, and the server deserializes it to execute the function.
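
A sketch of what the client side might look like, assuming user_pb2 / user_pb2_grpc modules generated from a hypothetical UserService .proto definition:

import grpc
import user_pb2        # generated message classes (hypothetical proto)
import user_pb2_grpc   # generated client stub (hypothetical proto)

# Open an HTTP/2 channel and create the generated stub
channel = grpc.insecure_channel("localhost:50051")
stub = user_pb2_grpc.UserServiceStub(channel)

# The stub serializes the request with Protocol Buffers, sends it over
# HTTP/2, and deserializes the binary response — it reads like a local call
response = stub.GetUser(user_pb2.GetUserRequest(user_id=123))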

GraphQL gives clients fine-grained control over data retrieval. Instead of fixed REST endpoints, clients specify exactly what data they need in a single request. It uses a strong type system, making it ideal for mobile apps where minimizing data transfer is crucial.
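
A minimal sketch of a GraphQL call over HTTP (the endpoint and schema are hypothetical):

import requests

# Ask for exactly the fields needed — nothing more — in a single request
query = """
{
  user(id: 123) {
    name
    email
  }
}
"""
resp = requests.post("https://api.example.com/graphql", json={"query": query})
print(resp.json()["data"]["user"])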

Databases store the data that APIs use and serve. There are two main categories: SQL (relational) and NoSQL (non-relational).

SQL databases are like spreadsheets on steroids. They store data in tables with rows and columns using a predefined schema. They guarantee ACID properties:

  • Atomicity: Ensures all operations in a transaction succeed or none do
  • Consistency: Ensures data remains valid before and after transactions by enforcing constraints
  • Isolation: Prevents concurrent transactions from interfering with each other
  • Durability: Guarantees committed transactions persist even after crashes

Examples: MySQL, PostgreSQL, Oracle DB, MS SQL Server (MSSQL)
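
A small sketch of atomicity in action, using Python's built-in sqlite3 and a toy transfer between two hypothetical accounts:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

# Atomicity: both updates commit together, or neither does
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure, the rollback leaves both balances untouched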

NoSQL databases offer flexibility beyond rigid table structures. They handle semi-structured or unstructured data efficiently.

Document Databases (MongoDB, CouchDB)

  • Store data in JSON-like documents
  • Flexible schema for evolving data models

Key-Value Stores (Redis, DynamoDB, Memcached)

  • Simple key-value pair storage
  • Ideal for caching, session management, and high-speed operations

Column-Family Stores (Cassandra, HBase)

  • Store data in columns rather than rows
  • Optimized for massive write/read workloads
  • Use cases: activity feeds, time-series data, big data analytics

Graph Databases (Neo4j, ArangoDB)

  • Store data as nodes (entities) and edges (relationships)
  • Perfect when relationships are as important as the data itself
  • Use cases: social networks, recommendation systems, fraud detection

Key-value databases like Redis and Memcached operate primarily in main memory (RAM), enabling extremely fast read and write operations.

Primary Use Cases:

  • Caching: Keep frequently accessed data in memory
  • Session Management: Store user session data for quick retrieval
  • Real-time Analytics: Process high-velocity data streams
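
A minimal caching sketch with the redis-py client (the key name and TTL are illustrative):

import redis

r = redis.Redis(host="localhost", port=6379)

# Cache a computed value with a 60-second TTL
r.set("user:123:profile", '{"name": "Alice"}', ex=60)

# Later reads hit RAM instead of the database
profile = r.get("user:123:profile")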

As applications grow in popularity, they need to scale to handle increased load. There are two fundamental approaches:

Vertical Scaling (Scale Up): Upgrade existing hardware by adding more CPU, RAM, or faster storage.

Pros:

  • Simpler to implement
  • No application changes needed

Cons:

  • Physical hardware limits
  • Single point of failure
  • Limited high availability

Horizontal Scaling (Scale Out): Add more machines to distribute the load across multiple servers.

Pros:

  • Nearly unlimited scaling potential
  • Better fault tolerance
  • High availability (if one server fails, others continue)

Cons:

  • Increased complexity
  • Data consistency challenges
  • Requires load balancing and coordination

Load balancers use various algorithms to distribute traffic:

Round Robin

  • Distributes requests in circular sequence: Server 1 → Server 2 → Server 3 → Server 1…
  • Simple and fair distribution

Least Connections

  • Routes to the server with fewest active connections
  • Keeps all servers equally busy

IP Hash

  • Uses client IP address to consistently route to the same server
  • Enables session stickiness for stateful applications
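
Sketches of the round-robin and IP-hash ideas in a few lines of Python (the server pool is hypothetical):

from itertools import cycle
from zlib import crc32

servers = ["server1:8080", "server2:8080", "server3:8080"]

# Round Robin: endlessly cycle through servers in order
rr = cycle(servers)
def round_robin():
    return next(rr)

# IP Hash: the same client IP always maps to the same server
def ip_hash(client_ip):
    return servers[crc32(client_ip.encode()) % len(servers)]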

Self-Managed:

  • HAProxy
  • NGINX

Cloud-Managed:

  • AWS Elastic Load Balancing (ELB)
  • Google Cloud Load Balancing
  • Azure Load Balancer

Caching creates a high-speed storage layer for frequently accessed data, reducing latency and improving performance—like keeping your most-used tools within arm’s reach.

Caching occurs at multiple layers:

Browser Cache

  • Stores static assets (images, CSS, JavaScript)
  • Reduces load times on repeat visits

DNS Cache

  • Caches IP address mappings for domain names
  • Speeds up DNS resolution

Application Cache

  • In-memory storage for frequently accessed data
  • Reduces database queries

Database Cache

  • Caches query results
  • Speeds up repeated queries

CDN (Content Delivery Network)

  • Caches static assets at edge locations globally
  • Delivers content from geographically closest servers

Cache-Aside (Lazy Loading): The application checks the cache first. On a miss, it fetches from the database and populates the cache.

Pros: Simple; only caches what's needed
Cons: Cache-miss penalty; potential stale data
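
A minimal sketch of the cache-aside pattern; the cache client and db.query helper are hypothetical stand-ins:

def get_user(user_id, cache, db):
    key = f"user:{user_id}"
    user = cache.get(key)             # 1. check the cache first
    if user is None:                  # 2. cache miss ...
        user = db.query(user_id)      #    ... fetch from the database
        cache.set(key, user, ex=300)  # 3. populate the cache for next time
    return user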

Write-Through: Writes go to both cache and database simultaneously.

Pros: Strong consistency between cache and database
Cons: Slower writes (waiting for both operations)

Write-Back (Write-Behind): Writes go to the cache first, then asynchronously to the database.

Pros: Fast writes, reduced database load
Cons: Risk of data loss if the cache fails before the database sync

Write-Around: Writes bypass the cache and go directly to the database. The cache is populated only on reads.

Pros: Prevents cache pollution with infrequently accessed data
Cons: Cache misses on recently written data

When cache is full, eviction policies determine what gets removed:

LRU (Least Recently Used)

  • Removes least recently accessed items
  • Good general-purpose strategy
  • Assumes recent access predicts future access (see the sketch after this list)

FIFO (First In First Out)

  • Removes oldest items regardless of access frequency
  • Simple but not always optimal
  • Best for sequential access patterns (video streaming, live sports)

LFU (Least Frequently Used)

  • Removes items with lowest access count
  • Keeps “hottest” data in cache
  • Good for workloads with clear access patterns
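
As a sketch of the LRU policy above, a small cache built on Python's OrderedDict:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used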

A CDN is a geographically distributed network of servers that caches static content (images, videos, web assets) at edge locations worldwide.

Benefits:

  • Reduced latency by serving from geographically closest server
  • Improved load times
  • Reduced load on origin server
  • Better global user experience

When databases grow to terabytes, a single database can’t handle it all. Storage limits are reached and query performance degrades.

Solution: Divide data into smaller independent chunks (shards) distributed across multiple servers.

Horizontal Partitioning (Sharding)

  • Splits data by rows
  • Example: Users A-M in Shard 1, N-Z in Shard 2
  • Uses partition key (user ID, username)
  • Most common scaling approach (see the sketch after this list)

Vertical Partitioning

  • Splits data by columns
  • Example: User profiles in Shard 1, user activity logs in Shard 2
  • Separates frequently vs infrequently accessed data
  • Optimizes for different access patterns
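
A minimal sketch of the hash-based routing behind horizontal sharding (the shard names are hypothetical):

from zlib import crc32

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    # Hash the partition key so users spread evenly across shards
    return SHARDS[crc32(user_id.encode()) % len(SHARDS)]

Note that this simple modulo scheme remaps most keys when a shard is added—the problem consistent hashing (covered later) addresses.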

Sharding handles large datasets, but replication ensures availability and reliability by maintaining multiple identical copies of data. If one server fails, others take over without data loss or downtime.

Benefits:

  • Improved system reliability and availability
  • Better read performance (distribute reads across replicas)
  • Fault tolerance

Single-Leader (Primary-Replica) Architecture:

  • One primary (master) handles all writes
  • Multiple replicas (slaves) handle reads
  • Changes replicated asynchronously to replicas

Pros: Simple; writes are centralized
Cons: Write bottleneck at the primary

Multi-Leader Architecture:

  • Writes accepted by any replica
  • Changes replicated to other replicas

Pros: Higher write availability
Cons: Complex conflict resolution for concurrent writes

Replication Modes:

Synchronous

  • Primary waits for all replicas to confirm
  • Very safe, but slower

Asynchronous

  • Primary doesn’t wait for confirmation
  • Fast, but small window of potential inconsistency

Semi-Synchronous

  • Wait for at least one replica
  • Balances safety and performance

A reverse proxy acts as an intermediary gateway between clients and backend servers.

  • Load Balancing: Distribute requests across servers
  • SSL/TLS Termination: Handle encryption/decryption on behalf of backend servers
  • Caching: Cache static content
  • Security: Filter malicious requests, hide infrastructure
  • Compression: Compress responses before sending to clients
Popular reverse proxies:

  • NGINX
  • HAProxy
  • Apache HTTP Server
  • Envoy

Message queues enable asynchronous communication between services in microservices architectures when immediate responses aren't required.

Example: Sending order confirmation emails

Services publish messages (e.g., order 123 confirmed) to queues/topics without knowing which service consumes them. This decouples services, allowing independent scaling.

Resilience

  • If consumer service is down, producer continues publishing
  • Messages processed when consumer recovers

Asynchronous Processing

  • Producer doesn’t wait for consumer
  • Handles different processing speeds

Scalability

  • Add more consumers to handle load
  • Services scale independently
Popular technologies:

  • Apache Kafka: High-throughput, distributed streaming
  • RabbitMQ: Feature-rich message broker
  • AWS SQS: Managed cloud queue service
  • Redis Streams: Lightweight pub/sub
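
A toy sketch of the decoupling, using Python's in-process queue as a stand-in for Kafka/RabbitMQ/SQS:

import queue
import threading

orders = queue.Queue()  # stands in for a real message broker

def send_confirmation_email(msg):
    # hypothetical consumer-side work
    print(f"Emailing confirmation for order {msg['order_id']}")

def producer():
    # Publish and move on — no waiting for the consumer
    orders.put({"order_id": 123, "status": "confirmed"})

def consumer():
    while True:
        msg = orders.get()             # blocks until a message arrives
        send_confirmation_email(msg)
        orders.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
orders.join()  # wait until the consumer has processed everything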

Microservices are an architectural style that breaks large applications into smaller, independent services, each focused on a specific business capability.

Examples: Inventory service, user management service, payment service

Communication styles:

Synchronous: REST APIs, gRPC
Asynchronous: Message queues (Kafka, RabbitMQ)

Modularity

  • Develop, deploy, and scale services independently
  • Changes to one service don’t require redeploying others

Fault Isolation

  • Bugs in one service don’t crash the entire application
  • Graceful degradation of functionality

Technology Diversity

  • Each service can use different tech stacks
  • Choose best tool for each job

Related Building Blocks and Operational Concepts

Notification systems send alerts to users or other systems.

Types: Push notifications, emails, SMS
Implementation: Often built using message queues internally
Tools: Firebase Cloud Messaging, AWS SNS, Twilio

Full-text search systems are specialized search engines for querying large text datasets.

Purpose: Fast, flexible search through product descriptions, articles, logs
Advantage: Much faster than SQL LIKE queries
Tools: Elasticsearch, Apache Solr, Algolia

Coordination services are critical for managing state and agreement in distributed systems.

Tools: Apache ZooKeeper, etcd, Consul

Service Discovery

  • Track which services are running and where

Distributed Locking

  • Ensure only one service performs critical operations
  • Prevent race conditions

Leader Election

  • Determine which server is primary/coordinator

Configuration Management

  • Centralized, reliable source of truth
  • Distribute config changes to all services

Heartbeats

  • Periodic health checks
  • Detect service failures
  • Trigger alerts or failover

Checksums

  • Digital fingerprints for data integrity
  • Verify data hasn’t been corrupted during transmission
  • Compare checksums before and after transfer
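
A minimal checksum sketch using SHA-256 from Python's hashlib:

import hashlib

def checksum(data: bytes) -> str:
    # The SHA-256 digest acts as a fingerprint of the payload
    return hashlib.sha256(data).hexdigest()

payload = b"order 123 confirmed"
sent = checksum(payload)

# ... after transmission, the receiver recomputes and compares
received = checksum(payload)
assert sent == received  # any corruption would change the digest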

The CAP theorem states that distributed systems can guarantee only two of three properties when network partitions occur:

C — Consistency

  • Every read gets the most recent write
  • All nodes see the same data simultaneously

A — Availability

  • Every request receives a response (success or failure)
  • System remains operational even with stale data

P — Partition Tolerance

  • System continues operating despite network failures
  • Some servers can’t communicate, but system functions

CP Systems (Consistency + Partition Tolerance)

Prioritize: Strong consistency

Trade-off: May become unavailable during partitions

Examples: Traditional RDBMS, MongoDB (with strong consistency settings), HBase

Use case: Financial transactions, inventory management

AP Systems (Availability + Partition Tolerance)

Prioritize: High availability

Trade-off: May serve stale data during partitions

Examples: Cassandra, DynamoDB, Couchbase

Concept: Eventual Consistency — all replicas converge to the same state once updates stop

Use case: Social media feeds, product catalogs, user profiles

PACELC is an extension of the CAP theorem that addresses what happens during normal operation (no partition). CAP only explains behavior during partitions; PACELC provides a more complete picture.

PACELC stands for:

  • P (Partition): If there is a partition…
  • A (Availability) vs C (Consistency): choose between Availability or Consistency
  • E (Else): Otherwise (no partition)…
  • L (Latency) vs C (Consistency): choose between Latency or Consistency

During a Partition (PA or PC):

  • Same as CAP theorem
  • Choose between Availability (serve possibly stale data) or Consistency (refuse requests)

During Normal Operation (EL or EC):

  • Choose between Lower Latency (faster responses with relaxed consistency) or stronger Consistency (slower responses, wait for all replicas)

PA/EL Systems (Availability + Latency)

  • Partition: Prioritize Availability (AP)
  • Normal: Prioritize Latency (fast responses)
  • Trade-off: Eventual consistency
  • Examples: Cassandra, DynamoDB, Riak
  • Use case: Social media, content delivery, shopping carts

PA/EC Systems (Availability + Consistency)

  • Partition: Prioritize Availability (AP)
  • Normal: Prioritize Consistency (slower but consistent)
  • Examples: MongoDB (with eventual consistency mode)
  • Use case: Less common, but useful for systems that need consistency when stable

PC/EL Systems (Consistency + Latency)

  • Partition: Prioritize Consistency (CP)
  • Normal: Prioritize Latency (caching, read replicas)
  • Examples: Memcached, some caching systems
  • Use case: Systems that can tolerate unavailability during partitions but need speed normally

PC/EC Systems (Consistency + Consistency)

  • Partition: Prioritize Consistency (CP)
  • Normal: Prioritize Consistency (strong guarantees)
  • Trade-off: Higher latency
  • Examples: Traditional RDBMS (MySQL, PostgreSQL), HBase, VoltDB
  • Use case: Banking, financial transactions, inventory systems

Cassandra (PA/EL):

  • Partition: Remains available, accepts writes
  • Normal: Fast reads/writes with tunable consistency (eventual by default)
  • Result: High performance, eventually consistent

PostgreSQL (PC/EC):

  • Partition: May become unavailable to maintain consistency
  • Normal: Synchronous replication for strong consistency
  • Result: Strong guarantees, higher latency

DynamoDB (PA/EL by default):

  • Partition: Stays available
  • Normal: Low latency reads (eventually consistent reads by default)
  • Optional: Can request strongly consistent reads (PA/EC behavior) with higher latency

Failover is the automatic switching to redundant systems when primary components fail.

Examples:

  • Switch to standby database replica
  • Redirect traffic to healthy servers via load balancer
  • Promote backup to primary role

Goal: Minimize downtime and maintain service availability

Types:

  • Active-Passive: Standby servers sit idle until needed, taking over only when the primary fails.
  • Active-Active: Multiple systems handle load simultaneously. All servers serve requests; if one goes down, the load balancer redirects its traffic to the others. This requires more complex consistency strategies.

The Circuit Breaker is a fault tolerance mechanism that prevents applications from repeatedly attempting operations likely to fail, protecting systems from cascading failures.

Analogy: Like an electrical circuit breaker in your home—when something goes wrong, it “trips” to prevent further damage.

The Circuit Breaker operates in three states:

1. Closed (Normal Operation)

  • All requests flow through normally
  • System monitors for failures (error rates, timeouts, response times)
  • Tracks failure metrics against threshold

2. Open (Failure Mode)

  • Threshold exceeded (e.g., 50% failure rate or 5 consecutive failures)
  • Circuit “trips open”
  • Requests fail fast without calling the downstream service
  • Prevents resource exhaustion (no blocked threads or connections)
  • Waits for timeout period (e.g., 30 seconds) before testing recovery

3. Half-Open (Testing Recovery)

  • After timeout expires, circuit enters half-open state
  • Allows limited test requests through (e.g., 3 requests)
  • If successful → return to Closed state
  • If still failing → return to Open state

Scenario: E-commerce application calling Payment Service

Without Circuit Breaker:

User → Service A → Payment Service (slow/failing)
→ Threads blocked
→ Resource exhaustion
→ Service A crashes too

With Circuit Breaker:

User → Service A → Circuit Breaker → Payment Service
→ After 5 failures, circuit OPENS
→ Service A stays healthy
→ Returns fallback response

Prevents Cascading Failures

  • Isolates failing services from the rest of the system
  • Stops the domino effect across microservices

Fail Fast

  • Immediate error response instead of waiting for timeouts
  • Better user experience (quick feedback vs hanging requests)

Resource Protection

  • Frees up threads, connections, and memory
  • Prevents thread pool exhaustion

Automatic Recovery Testing

  • Periodically checks if downstream service recovered
  • No manual intervention needed

Graceful Degradation

  • System continues operating with reduced functionality
  • Can return cached data or default responses

Failure Threshold: How many failures trigger opening?

  • Example: 5 consecutive failures or 50% error rate within a time window

Timeout Period: How long to wait before testing recovery?

  • Example: 30 seconds, 1 minute
  • Gives downstream service time to recover

Success Threshold: How many successes in half-open to close?

  • Example: 3 consecutive successful requests

Request Volume Threshold: Minimum requests before calculating error rate

  • Example: Need at least 10 requests in window before opening circuit

External API Calls

  • Third-party payment gateways
  • Shipping APIs
  • Authentication services
  • Weather APIs

Database Connections

  • Protecting against database outages
  • Preventing connection pool exhaustion

Microservice Communication

  • Service-to-service calls
  • Preventing cascading failures across distributed systems

When circuit is open, implement graceful degradation:

Cached Data: Return last known good value

return circuitBreaker.execute(
    () -> paymentService.getBalance(),   // primary call
    () -> cache.getLastBalance()         // fallback while the circuit is open
);

Default Response: Return sensible default

return DEFAULT_SHIPPING_ESTIMATE; // "3-5 business days"

Queue Request: Store for later processing

messageQueue.enqueue(paymentRequest);
return "Your payment is being processed";

Alternative Service: Route to backup service

if (primaryCircuit.isOpen()) {
    return secondaryPaymentService.process();
}

User Message: Clear error explaining temporary unavailability

return "Payment service temporarily unavailable. Please try again in a moment.";

Popular libraries:

Java: Resilience4j, Hystrix (maintenance mode)

.NET: Polly

Node.js: Opossum

Python: pybreaker

# payment_service.py
import pybreaker

# Trip after 5 consecutive failures; test recovery after 30 seconds
breaker = pybreaker.CircuitBreaker(
    fail_max=5,
    reset_timeout=30,
)

@breaker
def call_payment_service():
    return payment_service.process()

Consistent hashing is a technique for distributing data across servers that minimizes redistribution when servers are added or removed.

Problem with Simple Hashing:

  • Adding/removing one server → rehash all keys → massive data movement

Consistent Hashing Solution:

  • Only a small fraction of keys need rebalancing
  • Smoother scaling operations

Use Cases:

  • Distributed caches (Memcached, Redis clusters)
  • Distributed databases (Cassandra, DynamoDB)
  • Load balancing
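
A compact sketch of a hash ring with virtual nodes (the node names are hypothetical):

import bisect
from zlib import crc32

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Place several virtual points per node on the ring to even out load
        self.ring = sorted(
            (crc32(f"{node}#{i}".encode()), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash
        idx = bisect.bisect(self.keys, crc32(key.encode())) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:123"))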

Idempotency ensures that performing an operation multiple times has the same effect as performing it once.

Why It Matters: Network failures may cause automatic request retries. Without idempotency, duplicate requests could cause unintended effects.

Example: Payment Processing

  • User clicks “Pay”
  • Network hiccup causes retry
  • Without idempotency: double charge ❌
  • With idempotency: same result ✅

Implementation:

  • Use unique identifiers (transaction ID, idempotency key)
  • Server checks if request already processed
  • Return original response for duplicate requests
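
A minimal sketch of idempotency-key handling; the processed-request store and charge_card call are hypothetical stand-ins:

processed = {}  # stand-in for a durable store (e.g., Redis or a DB table)

def charge_card(request):
    # hypothetical payment-gateway call
    return {"status": "charged", "amount": request["amount"]}

def handle_payment(idempotency_key, request):
    # A retry with the same key returns the original response —
    # the duplicate has no additional effect
    if idempotency_key in processed:
        return processed[idempotency_key]
    response = charge_card(request)
    processed[idempotency_key] = response
    return response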

HTTP Method Idempotency:

  • Naturally Idempotent: GET, PUT, DELETE
  • Needs Design: POST (use idempotency keys)

Rate limiting controls incoming request volume to prevent abuse and overload.

Benefits:

  • Prevents DDoS attacks
  • Ensures fair usage
  • Protects backend resources
  • Maintains service quality

Implementation Level: API Gateway or Load Balancer

Token Bucket

  • Bucket holds tokens (request capacity)
  • Tokens refill at constant rate
  • Request consumes token; rejected if bucket empty
  • Allows bursts up to bucket capacity
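
A minimal token-bucket sketch in Python (the capacity and refill rate are illustrative):

import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # max burst size
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # consume one token per request
            return True
        return False          # bucket empty: reject the request

bucket = TokenBucket(capacity=10, refill_rate=5)  # 5 req/s, bursts of 10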

Leaky Bucket

  • Requests added to queue (bucket)
  • Processed at constant rate (leak)
  • Overflow requests dropped
  • Smooths traffic spikes

Fixed Window

  • Count requests in fixed time windows (e.g., per minute)
  • Reset counter at window boundary
  • Simple but can allow traffic spikes at boundaries

Sliding Window

  • Tracks requests over sliding time window
  • More accurate than fixed window
  • Prevents boundary spike issues

Monitoring collects metrics about system health and performance.

Key Metrics:

  • CPU and memory usage
  • Request latency (P50, P95, P99)
  • Error rates and types
  • Throughput (requests/sec)
  • Database connection pool stats

Popular Stack:

  • Prometheus: Metrics collection and storage
  • Grafana: Visualization and dashboards
  • AlertManager: Alert routing and notification

Logging records events, errors, and system activities for debugging and auditing.

Log Levels: DEBUG, INFO, WARN, ERROR, FATAL

Structured Logging: Use JSON format for machine-readable logs

Popular Stack:

  • ELK Stack: Elasticsearch, Logstash, Kibana
  • EFK Stack: Elasticsearch, Fluentd, Kibana
  • OpenSearch: Open-source alternative to Elasticsearch

Key principles that guide distributed system design:

Scalability

  • Can the system grow to handle increased load?
  • Achieved through horizontal scaling, load balancing, partitioning

Maintainability

  • How easy to understand, modify, and operate over time?
  • Clear code, documentation, modular architecture

Efficiency

  • Optimal use of resources (CPU, memory, network, storage)
  • Cost-effectiveness at scale

Resilience

  • Gracefully handle failures
  • Design for failures, not against them

Key metrics to evaluate system design:

Availability

  • Percentage of time system is operational
  • Measured in “nines”: 99.9% (“three nines”), 99.99% (“four nines”)
  • 99.9% = ~8.76 hours downtime/year
  • 99.99% = ~52.56 minutes downtime/year

Reliability

  • System consistently performs as expected without failure
  • Mean Time Between Failures (MTBF)

Fault Tolerance

  • System’s ability to continue operating despite component failures

Redundancy

  • Availability of backup components when needed

Throughput

  • Amount of work processed per unit time (requests/sec, transactions/sec)

Latency

  • Time to process a single request
  • Measured as percentiles: P50 (median), P95, P99

Operation Clarity

  • Use clear, intent-revealing names over generic CRUD
  • Example: POST /orders/123/cancel vs DELETE /orders/123

Protocol Selection

  • HTTP/REST: Browser-based, public APIs
  • gRPC: High-performance internal services
  • GraphQL: Flexible client-driven queries

Data Formats

  • JSON: Human-readable, universal support
  • Protocol Buffers: Compact, efficient (gRPC)
  • XML: Legacy systems

Pagination

  • Limit result sets for performance
  • Patterns: offset/limit, cursor-based, page numbers

Filtering and Sorting

  • Allow clients to query specific data
  • Example: /users?status=active&sort=created_at

Idempotency

  • Ensure safe request retries
  • GET, PUT, DELETE are naturally idempotent
  • Add idempotency keys for POST

Versioning

  • Maintain backward compatibility
  • Strategies: URL versioning (/v1/users), header versioning

Security

  • Authentication (OAuth 2.0, JWT)
  • Rate limiting
  • Input validation
  • HTTPS only

Documentation

  • OpenAPI/Swagger specifications
  • Interactive API explorers
  • Code examples in multiple languages

System design involves balancing competing concerns:

CAP Theorem: Choose 2 of 3 (Consistency, Availability, Partition Tolerance)

  • CP Systems: Strong consistency, possible downtime
  • AP Systems: High availability, eventual consistency

SQL vs NoSQL:

  SQL                | NoSQL
  -------------------|------------------------
  Strong consistency | Flexibility
  ACID guarantees    | Horizontal scalability
  Complex joins      | Simpler data models
  Structured schema  | Schema-less

Caching strategy comparison:

  Strategy      | Performance      | Consistency | Complexity | Risk
  --------------|------------------|-------------|------------|----------
  Cache Aside   | Medium           | Medium      | Low        | Low
  Write Through | Low (writes)     | High        | Medium     | Low
  Write Back    | High             | Low         | High       | Data loss
  Write Around  | Low (first read) | Medium      | Low        | Low

Microservices:

  • ✅ Modularity, independent scaling
  • ❌ Operational complexity, distributed debugging

Monolith:

  • ✅ Simple deployment, easier debugging
  • ❌ Tight coupling, harder to scale

More Security → more friction for users
Less Security → better UX but higher risk

Balance: Multi-factor authentication, rate limiting, clear error messages

Higher Performance → more expensive infrastructure
Cost Optimization → potential performance trade-offs

Balance: Caching, efficient algorithms, right-sized resources

The “right” system design depends entirely on your specific context:

  • Requirements: What must the system do?
  • Constraints: Budget, team size, timeline, regulations
  • Priorities: Performance, reliability, cost, time-to-market

Remember: Context is everything. A design that works for a startup with 1,000 users will differ vastly from one serving 100 million users.

Start simple, evolve as needed. Premature optimization is the root of much wasted effort.