HLD fundamentals

DNS (Domain Name System) is the internet’s phone book. It translates human-friendly domain names into IP addresses, allowing clients to find servers on the internet.
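
This resolution step is visible from any program. A minimal Python sketch (example.com is just a placeholder domain):

import socket

# Resolve a domain name to an IP address, just as a browser does
# before opening a connection
ip = socket.gethostbyname("example.com")
print(ip)  # e.g., 93.184.216.34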

An API (Application Programming Interface) is a contract or set of rules that defines how one piece of software can request services from another.

REST (Representational State Transfer) treats everything as a resource (e.g., a user profile), with each resource identified by a unique URL. REST is stateless—each request contains all the information the server needs to fulfill it, which enhances scalability. Commonly used for public web APIs.
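
As a rough sketch, a REST client simply issues standard HTTP verbs against resource URLs (the api.example.com endpoint and fields here are hypothetical):

import requests

# Fetch a resource by its URL
resp = requests.get("https://api.example.com/users/123")
user = resp.json()

# Update the same resource with PUT; the request carries everything
# the server needs, so no session state is stored between calls
requests.put("https://api.example.com/users/123", json={"name": "Alice"})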

gRPC enables calling functions on a remote server as if they were local. It focuses on actions and operations rather than resources, and is typically used for internal service communication where tight coupling is acceptable or desirable.

How it works: Client uses generated code (stub) that serializes the function call and parameters into binary format (Protocol Buffers), sends it over HTTP/2, and the server deserializes it to execute the function.
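
A sketch of what the client side might look like, assuming user_pb2 / user_pb2_grpc modules generated from a hypothetical UserService .proto definition:

import grpc
import user_pb2        # generated message classes (hypothetical proto)
import user_pb2_grpc   # generated client stub (hypothetical proto)

# Open an HTTP/2 channel and create the generated stub
channel = grpc.insecure_channel("localhost:50051")
stub = user_pb2_grpc.UserServiceStub(channel)

# The stub serializes the request with Protocol Buffers, sends it over
# HTTP/2, and deserializes the binary response — it reads like a local call
response = stub.GetUser(user_pb2.GetUserRequest(user_id=123))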

GraphQL gives clients fine-grained control over data retrieval. Instead of fixed REST endpoints, clients specify exactly what data they need in a single request. It uses a strong type system, making it ideal for mobile apps where minimizing data transfer is crucial.
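
A minimal sketch of a GraphQL call over HTTP (the endpoint and schema are hypothetical):

import requests

# Ask for exactly the fields needed — nothing more — in a single request
query = """
{
  user(id: 123) {
    name
    email
  }
}
"""
resp = requests.post("https://api.example.com/graphql", json={"query": query})
print(resp.json()["data"]["user"])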

Databases store the data that APIs use and serve. There are two main categories: SQL (relational) and NoSQL (non-relational).

SQL databases are like spreadsheets on steroids. They store data in tables with rows and columns using a predefined schema. They guarantee ACID properties:

  • Atomicity: Ensures all operations in a transaction succeed or none do
  • Consistency: Ensures data remains valid before and after transactions by enforcing constraints
  • Isolation: Prevents concurrent transactions from interfering with each other
  • Durability: Guarantees committed transactions persist even after crashes

Examples: MySQL, PostgreSQL, Oracle DB, MS SQL Server (MSSQL)
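
A small sketch of atomicity in action, using Python's built-in sqlite3 and a toy transfer between two hypothetical accounts:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (id INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 0)")

# Atomicity: both updates commit together, or neither does
try:
    with conn:  # opens a transaction; commits on success, rolls back on error
        conn.execute("UPDATE accounts SET balance = balance - 50 WHERE id = 1")
        conn.execute("UPDATE accounts SET balance = balance + 50 WHERE id = 2")
except sqlite3.Error:
    pass  # on failure, the rollback leaves both balances untouched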

NoSQL databases offer flexibility beyond rigid table structures. They handle semi-structured or unstructured data efficiently.

Document Databases (MongoDB, CouchDB)

  • Store data in JSON-like documents
  • Flexible schema for evolving data models

Key-Value Stores (Redis, DynamoDB, Memcached)

  • Simple key-value pair storage
  • Ideal for caching, session management, and high-speed operations

Column-Family Stores (Cassandra, HBase)

  • Store data in columns rather than rows
  • Optimized for massive write/read workloads
  • Use cases: activity feeds, time-series data, big data analytics

Graph Databases (Neo4j, ArangoDB)

  • Store data as nodes (entities) and edges (relationships)
  • Perfect when relationships are as important as the data itself
  • Use cases: social networks, recommendation systems, fraud detection

Key-value databases like Redis and Memcached operate primarily in main memory (RAM), enabling extremely fast read and write operations.

Primary Use Cases:

  • Caching: Keep frequently accessed data in memory
  • Session Management: Store user session data for quick retrieval
  • Real-time Analytics: Process high-velocity data streams
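
A minimal caching sketch with the redis-py client (the key name and TTL are illustrative):

import redis

r = redis.Redis(host="localhost", port=6379)

# Cache a computed value with a 60-second TTL
r.set("user:123:profile", '{"name": "Alice"}', ex=60)

# Later reads hit RAM instead of the database
profile = r.get("user:123:profile")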

As applications grow in popularity, they need to scale to handle increased load. There are two fundamental approaches:

Vertical Scaling (Scale Up): Upgrade existing hardware by adding more CPU, RAM, or faster storage.

Pros:

  • Simpler to implement
  • No application changes needed

Cons:

  • Physical hardware limits
  • Single point of failure
  • Limited high availability

Horizontal Scaling (Scale Out): Add more machines to distribute the load across multiple servers.

Pros:

  • Nearly unlimited scaling potential
  • Better fault tolerance
  • High availability (if one server fails, others continue)

Cons:

  • Increased complexity
  • Data consistency challenges
  • Requires load balancing and coordination

Load balancers use various algorithms to distribute traffic:

Round Robin

  • Distributes requests in circular sequence: Server 1 → Server 2 → Server 3 → Server 1…
  • Simple and fair distribution

Least Connections

  • Routes to the server with fewest active connections
  • Keeps all servers equally busy

IP Hash

  • Uses client IP address to consistently route to the same server
  • Enables session stickiness for stateful applications
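
Sketches of the round-robin and IP-hash ideas in a few lines of Python (the server pool is hypothetical):

from itertools import cycle
from zlib import crc32

servers = ["server1:8080", "server2:8080", "server3:8080"]

# Round Robin: endlessly cycle through servers in order
rr = cycle(servers)
def round_robin():
    return next(rr)

# IP Hash: the same client IP always maps to the same server
def ip_hash(client_ip):
    return servers[crc32(client_ip.encode()) % len(servers)]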

Self-Managed:

  • HAProxy
  • NGINX

Cloud-Managed:

  • AWS Elastic Load Balancing (ELB)
  • Google Cloud Load Balancing
  • Azure Load Balancer

Caching creates a high-speed storage layer for frequently accessed data, reducing latency and improving performance—like keeping your most-used tools within arm’s reach.

Caching occurs at multiple layers:

Browser Cache

  • Stores static assets (images, CSS, JavaScript)
  • Reduces load times on repeat visits

DNS Cache

  • Caches IP address mappings for domain names
  • Speeds up DNS resolution

Application Cache

  • In-memory storage for frequently accessed data
  • Reduces database queries

Database Cache

  • Caches query results
  • Speeds up repeated queries

CDN (Content Delivery Network)

  • Caches static assets at edge locations globally
  • Delivers content from geographically closest servers

Cache-Aside (Lazy Loading): The application checks the cache first. On a miss, it fetches from the database and populates the cache.

Pros: Simple; only caches what's needed
Cons: Cache-miss penalty; potential stale data
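
A minimal sketch of the cache-aside pattern; the cache client and db.query helper are hypothetical stand-ins:

def get_user(user_id, cache, db):
    key = f"user:{user_id}"
    user = cache.get(key)             # 1. check the cache first
    if user is None:                  # 2. cache miss ...
        user = db.query(user_id)      #    ... fetch from the database
        cache.set(key, user, ex=300)  # 3. populate the cache for next time
    return user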

Write-Through: Writes go to both cache and database simultaneously.

Pros: Strong consistency between cache and database
Cons: Slower writes (waiting for both operations)

Write-Back (Write-Behind): Writes go to the cache first, then asynchronously to the database.

Pros: Fast writes, reduced database load
Cons: Risk of data loss if the cache fails before the database sync

Write-Around: Writes bypass the cache and go directly to the database. The cache is populated only on reads.

Pros: Prevents cache pollution with infrequently accessed data
Cons: Cache misses on recently written data

When cache is full, eviction policies determine what gets removed:

LRU (Least Recently Used)

  • Removes least recently accessed items
  • Good general-purpose strategy
  • Assumes recent access predicts future access (see the sketch after this list)

FIFO (First In First Out)

  • Removes oldest items regardless of access frequency
  • Simple but not always optimal
  • Best for sequential access patterns (video streaming, live sports)

LFU (Least Frequently Used)

  • Removes items with lowest access count
  • Keeps “hottest” data in cache
  • Good for workloads with clear access patterns
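
As a sketch of the LRU policy above, a small cache built on Python's OrderedDict:

from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        if key in self.items:
            self.items.move_to_end(key)
        self.items[key] = value
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used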

A CDN is a geographically distributed network of servers that caches static content (images, videos, web assets) at edge locations worldwide.

Benefits:

  • Reduced latency by serving from geographically closest server
  • Improved load times
  • Reduced load on origin server
  • Better global user experience

When databases grow to terabytes, a single database can’t handle it all. Storage limits are reached and query performance degrades.

Solution: Divide data into smaller independent chunks (shards) distributed across multiple servers.

Horizontal Partitioning (Sharding)

  • Splits data by rows
  • Example: Users A-M in Shard 1, N-Z in Shard 2
  • Uses partition key (user ID, username)
  • Most common scaling approach (see the sketch after this list)

Vertical Partitioning

  • Splits data by columns
  • Example: User profiles in Shard 1, user activity logs in Shard 2
  • Separates frequently vs infrequently accessed data
  • Optimizes for different access patterns
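
A minimal sketch of the hash-based routing behind horizontal sharding (the shard names are hypothetical):

from zlib import crc32

SHARDS = ["db-shard-0", "db-shard-1", "db-shard-2", "db-shard-3"]

def shard_for(user_id: str) -> str:
    # Hash the partition key so users spread evenly across shards
    return SHARDS[crc32(user_id.encode()) % len(SHARDS)]

Note that this simple modulo scheme remaps most keys when a shard is added—the problem consistent hashing (covered later) addresses.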

Sharding handles large datasets, but replication ensures availability and reliability by maintaining multiple identical copies of data. If one server fails, others take over without data loss or downtime.

Benefits:

  • Improved system reliability and availability
  • Better read performance (distribute reads across replicas)
  • Fault tolerance

Single-Leader (Primary-Replica) Architecture:

  • One primary (master) handles all writes
  • Multiple replicas (slaves) handle reads
  • Changes replicated asynchronously to replicas

Pros: Simple; writes are centralized
Cons: Write bottleneck at the primary

Multi-Leader Architecture:

  • Writes accepted by any replica
  • Changes replicated to other replicas

Pros: Higher write availability
Cons: Complex conflict resolution for concurrent writes

Replication Modes:

Synchronous

  • Primary waits for all replicas to confirm
  • Very safe, but slower

Asynchronous

  • Primary doesn’t wait for confirmation
  • Fast, but small window of potential inconsistency

Semi-Synchronous

  • Wait for at least one replica
  • Balances safety and performance

A reverse proxy acts as an intermediary gateway between clients and backend servers.

  • Load Balancing: Distribute requests across servers
  • SSL/TLS Termination: Handle encryption/decryption on behalf of backend servers
  • Caching: Cache static content
  • Security: Filter malicious requests, hide infrastructure
  • Compression: Compress responses before sending to clients
Popular reverse proxies:

  • NGINX
  • HAProxy
  • Apache HTTP Server
  • Envoy

Message queues enable asynchronous communication between services in microservices architectures when immediate responses aren't required.

Example: Sending order confirmation emails

Services publish messages (e.g., order 123 confirmed) to queues/topics without knowing which service consumes them. This decouples services, allowing independent scaling.

Resilience

  • If consumer service is down, producer continues publishing
  • Messages processed when consumer recovers

Asynchronous Processing

  • Producer doesn’t wait for consumer
  • Handles different processing speeds

Scalability

  • Add more consumers to handle load
  • Services scale independently
Popular technologies:

  • Apache Kafka: High-throughput, distributed streaming
  • RabbitMQ: Feature-rich message broker
  • AWS SQS: Managed cloud queue service
  • Redis Streams: Lightweight pub/sub
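
A toy sketch of the decoupling, using Python's in-process queue as a stand-in for Kafka/RabbitMQ/SQS:

import queue
import threading

orders = queue.Queue()  # stands in for a real message broker

def send_confirmation_email(msg):
    # hypothetical consumer-side work
    print(f"Emailing confirmation for order {msg['order_id']}")

def producer():
    # Publish and move on — no waiting for the consumer
    orders.put({"order_id": 123, "status": "confirmed"})

def consumer():
    while True:
        msg = orders.get()             # blocks until a message arrives
        send_confirmation_email(msg)
        orders.task_done()

threading.Thread(target=consumer, daemon=True).start()
producer()
orders.join()  # wait until the consumer has processed everything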

Microservices are an architectural style that breaks large applications into smaller, independent services, each focused on a specific business capability.

Examples: Inventory service, user management service, payment service

Communication styles:

Synchronous: REST APIs, gRPC
Asynchronous: Message queues (Kafka, RabbitMQ)

Modularity

  • Develop, deploy, and scale services independently
  • Changes to one service don’t require redeploying others

Fault Isolation

  • Bugs in one service don’t crash the entire application
  • Graceful degradation of functionality

Technology Diversity

  • Each service can use different tech stacks
  • Choose best tool for each job

Related Building Blocks and Operational Concepts

Notification systems send alerts to users or other systems.

Types: Push notifications, emails, SMS
Implementation: Often built using message queues internally
Tools: Firebase Cloud Messaging, AWS SNS, Twilio

Full-text search systems are specialized search engines for querying large text datasets.

Purpose: Fast, flexible search through product descriptions, articles, logs
Advantage: Much faster than SQL LIKE queries
Tools: Elasticsearch, Apache Solr, Algolia

Coordination services are critical for managing state and agreement in distributed systems.

Tools: Apache ZooKeeper, etcd, Consul

Service Discovery

  • Track which services are running and where

Distributed Locking

  • Ensure only one service performs critical operations
  • Prevent race conditions

Leader Election

  • Determine which server is primary/coordinator

Configuration Management

  • Centralized, reliable source of truth
  • Distribute config changes to all services

Heartbeats

  • Periodic health checks
  • Detect service failures
  • Trigger alerts or failover

Checksums

  • Digital fingerprints for data integrity
  • Verify data hasn’t been corrupted during transmission
  • Compare checksums before and after transfer
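
A minimal checksum sketch using SHA-256 from Python's hashlib:

import hashlib

def checksum(data: bytes) -> str:
    # The SHA-256 digest acts as a fingerprint of the payload
    return hashlib.sha256(data).hexdigest()

payload = b"order 123 confirmed"
sent = checksum(payload)

# ... after transmission, the receiver recomputes and compares
received = checksum(payload)
assert sent == received  # any corruption would change the digest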

The CAP theorem states that distributed systems can guarantee only two of three properties when network partitions occur:

C — Consistency

  • Every read gets the most recent write
  • All nodes see the same data simultaneously

A — Availability

  • Every request receives a response (success or failure)
  • System remains operational even with stale data

P — Partition Tolerance

  • System continues operating despite network failures
  • Some servers can’t communicate, but system functions

CP Systems (Consistency + Partition Tolerance)

Prioritize: Strong consistency

Trade-off: May become unavailable during partitions

Examples: Traditional RDBMS, MongoDB (with strong consistency settings), HBase

Use case: Financial transactions, inventory management

AP Systems (Availability + Partition Tolerance)

Prioritize: High availability

Trade-off: May serve stale data during partitions

Examples: Cassandra, DynamoDB, Couchbase

Concept: Eventual Consistency — all replicas converge to the same state once updates stop

Use case: Social media feeds, product catalogs, user profiles

PACELC is an extension of the CAP theorem that addresses what happens during normal operation (no partition). CAP only explains behavior during partitions; PACELC provides a more complete picture.

PACELC stands for:

  • P (Partition): If there is a partition…
  • A (Availability) vs C (Consistency): choose between Availability or Consistency
  • E (Else): Otherwise (no partition)…
  • L (Latency) vs C (Consistency): choose between Latency or Consistency

During a Partition (PA or PC):

  • Same as CAP theorem
  • Choose between Availability (serve possibly stale data) or Consistency (refuse requests)

During Normal Operation (EL or EC):

  • Choose between Lower Latency (faster responses with relaxed consistency) or stronger Consistency (slower responses, wait for all replicas)

PA/EL Systems (Availability + Latency)

  • Partition: Prioritize Availability (AP)
  • Normal: Prioritize Latency (fast responses)
  • Trade-off: Eventual consistency
  • Examples: Cassandra, DynamoDB, Riak
  • Use case: Social media, content delivery, shopping carts

PA/EC Systems (Availability + Consistency)

  • Partition: Prioritize Availability (AP)
  • Normal: Prioritize Consistency (slower but consistent)
  • Examples: MongoDB (with eventual consistency mode)
  • Use case: Less common, but useful for systems that need consistency when stable

PC/EL Systems (Consistency + Latency)

  • Partition: Prioritize Consistency (CP)
  • Normal: Prioritize Latency (caching, read replicas)
  • Examples: Memcached, some caching systems
  • Use case: Systems that can tolerate unavailability during partitions but need speed normally

PC/EC Systems (Consistency + Consistency)

  • Partition: Prioritize Consistency (CP)
  • Normal: Prioritize Consistency (strong guarantees)
  • Trade-off: Higher latency
  • Examples: Traditional RDBMS (MySQL, PostgreSQL), HBase, VoltDB
  • Use case: Banking, financial transactions, inventory systems

Cassandra (PA/EL):

  • Partition: Remains available, accepts writes
  • Normal: Fast reads/writes with tunable consistency (eventual by default)
  • Result: High performance, eventually consistent

PostgreSQL (PC/EC):

  • Partition: May become unavailable to maintain consistency
  • Normal: Synchronous replication for strong consistency
  • Result: Strong guarantees, higher latency

DynamoDB (PA/EL by default):

  • Partition: Stays available
  • Normal: Low latency reads (eventually consistent reads by default)
  • Optional: Can request strongly consistent reads (PA/EC behavior) with higher latency

Failover is the automatic switching to redundant systems when primary components fail.

Examples:

  • Switch to standby database replica
  • Redirect traffic to healthy servers via load balancer
  • Promote backup to primary role

Goal: Minimize downtime and maintain service availability

Types:

  • Active-Passive: Standby servers sit idle until needed, taking over only when the primary fails.
  • Active-Active: Multiple systems handle load simultaneously. All servers serve requests; if one goes down, the load balancer redirects its traffic to the others. This requires more complex consistency strategies.

The Circuit Breaker is a fault tolerance mechanism that prevents applications from repeatedly attempting operations likely to fail, protecting systems from cascading failures.

Analogy: Like an electrical circuit breaker in your home—when something goes wrong, it “trips” to prevent further damage.

The Circuit Breaker operates in three states:

1. Closed (Normal Operation)

  • All requests flow through normally
  • System monitors for failures (error rates, timeouts, response times)
  • Tracks failure metrics against threshold

2. Open (Failure Mode)

  • Threshold exceeded (e.g., 50% failure rate or 5 consecutive failures)
  • Circuit “trips open”
  • Requests fail fast without calling the downstream service
  • Prevents resource exhaustion (no blocked threads or connections)
  • Waits for timeout period (e.g., 30 seconds) before testing recovery

3. Half-Open (Testing Recovery)

  • After timeout expires, circuit enters half-open state
  • Allows limited test requests through (e.g., 3 requests)
  • If successful → return to Closed state
  • If still failing → return to Open state

Scenario: E-commerce application calling Payment Service

Without Circuit Breaker:

User → Service A → Payment Service (slow/failing)
→ Threads blocked
→ Resource exhaustion
→ Service A crashes too

With Circuit Breaker:

User → Service A → Circuit Breaker → Payment Service
→ After 5 failures, circuit OPENS
→ Service A stays healthy
→ Returns fallback response

Prevents Cascading Failures

  • Isolates failing services from the rest of the system
  • Stops the domino effect across microservices

Fail Fast

  • Immediate error response instead of waiting for timeouts
  • Better user experience (quick feedback vs hanging requests)

Resource Protection

  • Frees up threads, connections, and memory
  • Prevents thread pool exhaustion

Automatic Recovery Testing

  • Periodically checks if downstream service recovered
  • No manual intervention needed

Graceful Degradation

  • System continues operating with reduced functionality
  • Can return cached data or default responses

Failure Threshold: How many failures trigger opening?

  • Example: 5 consecutive failures or 50% error rate within a time window

Timeout Period: How long to wait before testing recovery?

  • Example: 30 seconds, 1 minute
  • Gives downstream service time to recover

Success Threshold: How many successes in half-open to close?

  • Example: 3 consecutive successful requests

Request Volume Threshold: Minimum requests before calculating error rate

  • Example: Need at least 10 requests in window before opening circuit

External API Calls

  • Third-party payment gateways
  • Shipping APIs
  • Authentication services
  • Weather APIs

Database Connections

  • Protecting against database outages
  • Preventing connection pool exhaustion

Microservice Communication

  • Service-to-service calls
  • Preventing cascading failures across distributed systems

When circuit is open, implement graceful degradation:

Cached Data: Return last known good value

return circuitBreaker.execute(
    () -> paymentService.getBalance(),   // primary call
    () -> cache.getLastBalance()         // fallback while the circuit is open
);

Default Response: Return sensible default

return DEFAULT_SHIPPING_ESTIMATE; // "3-5 business days"

Queue Request: Store for later processing

messageQueue.enqueue(paymentRequest);
return "Your payment is being processed";

Alternative Service: Route to backup service

if (primaryCircuit.isOpen()) {
    return secondaryPaymentService.process();
}

User Message: Clear error explaining temporary unavailability

return "Payment service temporarily unavailable. Please try again in a moment.";

Popular libraries:

Java: Resilience4j, Hystrix (maintenance mode)

.NET: Polly

Node.js: Opossum

Python: pybreaker

# payment_service.py
import pybreaker

# Trip after 5 consecutive failures; test recovery after 30 seconds
breaker = pybreaker.CircuitBreaker(
    fail_max=5,
    reset_timeout=30,
)

@breaker
def call_payment_service():
    return payment_service.process()

Consistent hashing is a technique for distributing data across servers that minimizes redistribution when servers are added or removed.

Problem with Simple Hashing:

  • Adding/removing one server → rehash all keys → massive data movement

Consistent Hashing Solution:

  • Only a small fraction of keys need rebalancing
  • Smoother scaling operations

Use Cases:

  • Distributed caches (Memcached, Redis clusters)
  • Distributed databases (Cassandra, DynamoDB)
  • Load balancing
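
A compact sketch of a hash ring with virtual nodes (the node names are hypothetical):

import bisect
from zlib import crc32

class ConsistentHashRing:
    def __init__(self, nodes, vnodes=100):
        # Place several virtual points per node on the ring to even out load
        self.ring = sorted(
            (crc32(f"{node}#{i}".encode()), node)
            for node in nodes for i in range(vnodes)
        )
        self.keys = [h for h, _ in self.ring]

    def node_for(self, key):
        # Walk clockwise to the first point at or after the key's hash
        idx = bisect.bisect(self.keys, crc32(key.encode())) % len(self.ring)
        return self.ring[idx][1]

ring = ConsistentHashRing(["cache-a", "cache-b", "cache-c"])
print(ring.node_for("user:123"))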

Idempotency ensures that performing an operation multiple times has the same effect as performing it once.

Why It Matters: Network failures may cause automatic request retries. Without idempotency, duplicate requests could cause unintended effects.

Example: Payment Processing

  • User clicks “Pay”
  • Network hiccup causes retry
  • Without idempotency: double charge ❌
  • With idempotency: same result ✅

Implementation:

  • Use unique identifiers (transaction ID, idempotency key)
  • Server checks if request already processed
  • Return original response for duplicate requests
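
A minimal sketch of idempotency-key handling; the processed-request store and charge_card call are hypothetical stand-ins:

processed = {}  # stand-in for a durable store (e.g., Redis or a DB table)

def charge_card(request):
    # hypothetical payment-gateway call
    return {"status": "charged", "amount": request["amount"]}

def handle_payment(idempotency_key, request):
    # A retry with the same key returns the original response —
    # the duplicate has no additional effect
    if idempotency_key in processed:
        return processed[idempotency_key]
    response = charge_card(request)
    processed[idempotency_key] = response
    return response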

HTTP Method Idempotency:

  • Naturally Idempotent: GET, PUT, DELETE
  • Needs Design: POST (use idempotency keys)

Rate limiting controls incoming request volume to prevent abuse and overload.

Benefits:

  • Prevents DDoS attacks
  • Ensures fair usage
  • Protects backend resources
  • Maintains service quality

Implementation Level: API Gateway or Load Balancer

Token Bucket

  • Bucket holds tokens (request capacity)
  • Tokens refill at constant rate
  • Request consumes token; rejected if bucket empty
  • Allows bursts up to bucket capacity
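
A minimal token-bucket sketch in Python (the capacity and refill rate are illustrative):

import time

class TokenBucket:
    def __init__(self, capacity, refill_rate):
        self.capacity = capacity        # max burst size
        self.tokens = capacity
        self.refill_rate = refill_rate  # tokens added per second
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill based on elapsed time, capped at capacity
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1  # consume one token per request
            return True
        return False          # bucket empty: reject the request

bucket = TokenBucket(capacity=10, refill_rate=5)  # 5 req/s, bursts of 10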

Leaky Bucket

  • Requests added to queue (bucket)
  • Processed at constant rate (leak)
  • Overflow requests dropped
  • Smooths traffic spikes

Fixed Window

  • Count requests in fixed time windows (e.g., per minute)
  • Reset counter at window boundary
  • Simple but can allow traffic spikes at boundaries

Sliding Window

  • Tracks requests over sliding time window
  • More accurate than fixed window
  • Prevents boundary spike issues

Monitoring collects metrics about system health and performance.

Key Metrics:

  • CPU and memory usage
  • Request latency (P50, P95, P99)
  • Error rates and types
  • Throughput (requests/sec)
  • Database connection pool stats

Popular Stack:

  • Prometheus: Metrics collection and storage
  • Grafana: Visualization and dashboards
  • AlertManager: Alert routing and notification

Logging records events, errors, and system activities for debugging and auditing.

Log Levels: DEBUG, INFO, WARN, ERROR, FATAL

Structured Logging: Use JSON format for machine-readable logs

Popular Stack:

  • ELK Stack: Elasticsearch, Logstash, Kibana
  • EFK Stack: Elasticsearch, Fluentd, Kibana
  • OpenSearch: Open-source alternative to Elasticsearch

Key principles that guide distributed system design:

Scalability

  • Can the system grow to handle increased load?
  • Achieved through horizontal scaling, load balancing, partitioning

Maintainability

  • How easy to understand, modify, and operate over time?
  • Clear code, documentation, modular architecture

Efficiency

  • Optimal use of resources (CPU, memory, network, storage)
  • Cost-effectiveness at scale

Resilience

  • Gracefully handle failures
  • Design for failures, not against them

Key metrics to evaluate system design:

Availability

  • Percentage of time system is operational
  • Measured in “nines”: 99.9% (“three nines”), 99.99% (“four nines”)
  • 99.9% = ~8.76 hours downtime/year
  • 99.99% = ~52.56 minutes downtime/year

Reliability

  • System consistently performs as expected without failure
  • Mean Time Between Failures (MTBF)

Fault Tolerance

  • System’s ability to continue operating despite component failures

Redundancy

  • Availability of backup components when needed

Throughput

  • Amount of work processed per unit time (requests/sec, transactions/sec)

Latency

  • Time to process a single request
  • Measured as percentiles: P50 (median), P95, P99

Operation Clarity

  • Use clear, intent-revealing names over generic CRUD
  • Example: POST /orders/123/cancel vs DELETE /orders/123

Protocol Selection

  • HTTP/REST: Browser-based, public APIs
  • gRPC: High-performance internal services
  • GraphQL: Flexible client-driven queries

Data Formats

  • JSON: Human-readable, universal support
  • Protocol Buffers: Compact, efficient (gRPC)
  • XML: Legacy systems

Pagination

  • Limit result sets for performance
  • Patterns: offset/limit, cursor-based, page numbers

Filtering and Sorting

  • Allow clients to query specific data
  • Example: /users?status=active&sort=created_at

Idempotency

  • Ensure safe request retries
  • GET, PUT, DELETE are naturally idempotent
  • Add idempotency keys for POST

Versioning

  • Maintain backward compatibility
  • Strategies: URL versioning (/v1/users), header versioning

Security

  • Authentication (OAuth 2.0, JWT)
  • Rate limiting
  • Input validation
  • HTTPS only

Documentation

  • OpenAPI/Swagger specifications
  • Interactive API explorers
  • Code examples in multiple languages

System design involves balancing competing concerns:

CAP Theorem: Choose 2 of 3 (Consistency, Availability, Partition Tolerance)

  • CP Systems: Strong consistency, possible downtime
  • AP Systems: High availability, eventual consistency

SQL vs NoSQL:

  SQL                | NoSQL
  -------------------|------------------------
  Strong consistency | Flexibility
  ACID guarantees    | Horizontal scalability
  Complex joins      | Simpler data models
  Structured schema  | Schema-less

Caching strategy comparison:

  Strategy      | Performance      | Consistency | Complexity | Risk
  --------------|------------------|-------------|------------|----------
  Cache Aside   | Medium           | Medium      | Low        | Low
  Write Through | Low (writes)     | High        | Medium     | Low
  Write Back    | High             | Low         | High       | Data loss
  Write Around  | Low (first read) | Medium      | Low        | Low

Microservices:

  • ✅ Modularity, independent scaling
  • ❌ Operational complexity, distributed debugging

Monolith:

  • ✅ Simple deployment, easier debugging
  • ❌ Tight coupling, harder to scale

More Security → more friction for users
Less Security → better UX but higher risk

Balance: Multi-factor authentication, rate limiting, clear error messages

Higher Performance → more expensive infrastructure
Cost Optimization → potential performance trade-offs

Balance: Caching, efficient algorithms, right-sized resources

The “right” system design depends entirely on your specific context:

  • Requirements: What must the system do?
  • Constraints: Budget, team size, timeline, regulations
  • Priorities: Performance, reliability, cost, time-to-market

Remember: Context is everything. A design that works for a startup with 1,000 users will differ vastly from one serving 100 million users.

Start simple, evolve as needed. Premature optimization is the root of much wasted effort.