Back of the envelope calculations
Power of 2
| Power of 2 | Approximate value | Memory size |
|---|---|---|
| 2^10 | ~1 thousand | 1 KB |
| 2^20 | ~1 million | 1 MB |
| 2^30 | ~1 billion | 1 GB |
| 2^40 | ~1 trillion | 1 TB |
| 2^50 | ~1 quadrillion | 1 PB |
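These rules of thumb work because 2^10 = 1,024 is close to 10^3. A minimal sketch checking how far each approximation drifts as the exponent grows:

```python
# Compare powers of 2 with their approximate decimal values.
for power, name in [(10, "thousand -> 1 KB"), (20, "million -> 1 MB"),
                    (30, "billion -> 1 GB"), (40, "trillion -> 1 TB"),
                    (50, "quadrillion -> 1 PB")]:
    exact = 2 ** power
    approx = 10 ** (power // 10 * 3)   # nearest power of 10
    print(f"2^{power} = {exact:,} ~ 1 {name} (off by {exact / approx - 1:.1%})")
```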
Latency numbers
| Operation | Time | Description |
|---|---|---|
| L1 cache reference | 0.5 ns | CPU cache access |
| Branch mispredict | 5 ns | Pipeline stall |
| L2 cache reference | 7 ns | Slower CPU cache |
| Mutex lock/unlock | 25 ns | Thread synchronization |

| CPU-bound operation | Time | Description |
|---|---|---|
| Compress 1 KB with Zippy | 3 µs | Fast compression |

| Memory-bound operation | Time | Description |
|---|---|---|
| Main memory reference | 100 ns | DRAM access |
| Read 1 MB sequentially from memory | 9 µs | Sequential memory read |

| IO-bound operation | Time | Description |
|---|---|---|
| Read 4 KB randomly from SSD | 150 µs | Random SSD read |
| Read 1 MB sequentially from SSD | 1 ms | Sequential SSD read |
| Disk seek | 10 ms | HDD seek time |
| Read 1 MB sequentially from disk | 20 ms | HDD sequential read |

| Network-bound operation | Time | Description |
|---|---|---|
| Send 1 KB over 1 Gbps network | 10 µs | Network transfer |
| Round trip within same datacenter | 500 µs | Intra-datacenter round trip (0.5 ms) |
| Send packet CA → Netherlands → CA | 150 ms | Intercontinental round trip |
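To build intuition, it helps to put these numbers on one scale. A minimal sketch (values taken from the tables above) expressing each latency in nanoseconds and relative to an L1 cache reference:

```python
# Express the latency table in one unit (ns) and compare each operation
# to an L1 cache reference. Values are taken from the tables above.
NS, US, MS = 1, 1_000, 1_000_000   # nanosecond, microsecond, millisecond in ns

latency_ns = {
    "L1 cache reference": 0.5 * NS,
    "Mutex lock/unlock": 25 * NS,
    "Main memory reference": 100 * NS,
    "Compress 1 KB with Zippy": 3 * US,
    "Read 1 MB sequentially from memory": 9 * US,
    "Read 4 KB randomly from SSD": 150 * US,
    "Round trip within same datacenter": 500 * US,
    "Read 1 MB sequentially from SSD": 1 * MS,
    "Disk seek": 10 * MS,
    "Read 1 MB sequentially from disk": 20 * MS,
    "Send packet CA -> Netherlands -> CA": 150 * MS,
}

base = latency_ns["L1 cache reference"]
for op, ns in latency_ns.items():
    print(f"{op:<36} {ns:>13,.1f} ns  ({ns / base:>12,.0f}x L1)")
```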
Throughput numbers
| System | QPS |
|---|---|
| MySQL | 1,000 |
| Key-value store | 10,000 |
| Cache server | 100,000–1M |

These approximations can vary greatly depending on factors such as the actual query, the machine's specs, database design, indexing, and the current load on the server.
Availability numbers
High availability is measured in “nines”, the percentage of uptime a system maintains.
| Availability % | Downtime per year (approx) |
|---|---|
| 90% (one nine) | ~36.5 days |
| 99% (two nines) | ~3.65 days |
| 99.9% (three nines) | ~8.8 hours |
| 99.99% (four nines) | ~53 minutes |
| 99.999% (five nines) | ~5.3 minutes |
| 99.9999% (six nines) | ~32 seconds |
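Downtime follows directly from the availability percentage: downtime per year = (1 − availability) × seconds in a year. A minimal sketch reproducing the table:

```python
# Downtime per year implied by each availability level.
SECONDS_PER_YEAR = 365 * 24 * 3600   # 31,536,000

for nines in range(1, 7):
    availability = 1 - 10 ** -nines                # 0.9, 0.99, 0.999, ...
    downtime_s = SECONDS_PER_YEAR * 10 ** -nines   # seconds of downtime per year
    print(f"{nines} nine(s), {availability:.4%}: "
          f"{downtime_s / 86_400:8.2f} days = {downtime_s / 3_600:9.2f} hours "
          f"= {downtime_s:12,.0f} seconds")
```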
Example: Estimate Twitter QPS and storage requirements
Please note: the following numbers are for this exercise only; they are not real numbers from Twitter.
Assumptions:
- 300 million monthly active users.
- 50% of users use Twitter daily.
- Users post 2 tweets per day on average.
- 10% of tweets contain media.
- Data is stored for 5 years.
Estimations:

Queries per second (QPS) estimate:
- Daily active users (DAU) = 300 million * 50% = 150 million
- Tweets QPS = 150 million * 2 tweets / 24 hours / 3,600 seconds ≈ 3,500
- Peak QPS = 2 * QPS ≈ 7,000
We will only estimate media storage here.
- Average tweet size:
  - tweet_id: 64 bytes
  - text: 140 bytes
  - media: 1 MB
- Media storage: 150 million * 2 * 10% * 1 MB = 30 TB per day
- 5-year media storage: 30 TB * 365 * 5 = ~55 PB
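A minimal sketch of the whole estimate, using only the assumptions listed above (the small differences from the quoted figures are rounding):

```python
# All inputs are the stated assumptions, not real Twitter numbers.
MAU = 300_000_000
dau = MAU * 0.50                     # 50% of users use Twitter daily
tweets_per_day = dau * 2             # 2 tweets per user per day

qps = tweets_per_day / (24 * 3600)   # ~3,472, rounded to ~3,500 above
peak_qps = 2 * qps                   # ~6,944, rounded to ~7,000 above

media_mb_per_day = tweets_per_day * 0.10 * 1     # 10% of tweets carry ~1 MB media
media_tb_per_day = media_mb_per_day / 1_000_000  # -> 30 TB/day
five_year_pb = media_tb_per_day * 365 * 5 / 1_000

print(f"QPS ≈ {qps:,.0f}, peak QPS ≈ {peak_qps:,.0f}")
print(f"media: {media_tb_per_day:,.0f} TB/day, {five_year_pb:,.0f} PB over 5 years")
```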
Request estimation
Section titled “Request estimation”Number of requests a typical server can handle in a second.
CPU clock rate: assume ~1 GHz, i.e., 10^9 clock cycles per second.
CPU time per program = CPU clock cycles per program * clock cycle time.
CPU time per clock cycle = 1 / clock rate = 1 / 10^9 = 1 ns.
Given: a typical request takes ~10^6 clock cycles (an assumption consistent with the 64,000 RPS figure used below).
CPU time per program = 10^6 cycles * 1 ns = 1 ms per request.
Total requests a CPU executes in 1 second = 1 second / 1 ms = 1,000 requests.
Total requests a 64-core server executes in 1 second = 64 * 1,000 = 64,000 requests.
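A minimal sketch of this derivation. The ~1 GHz clock rate and ~10^6 cycles per request are assumptions chosen to be consistent with the 64,000 RPS server figure used in the next section:

```python
# Requests per second a server can handle, from clock rate and cycles/request.
clock_rate_hz = 1e9          # ~1 GHz core (assumed)
cycles_per_request = 1e6     # ~10^6 clock cycles per request (assumed)

cpu_time_per_request = cycles_per_request / clock_rate_hz   # seconds -> 1 ms
rps_per_core = 1 / cpu_time_per_request                     # 1,000
rps_per_server = 64 * rps_per_core                          # 64,000

print(f"{cpu_time_per_request * 1e3:.0f} ms/request, "
      f"{rps_per_core:,.0f} RPS/core, {rps_per_server:,.0f} RPS/server")
```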
Example: Number of servers required for Twitter
Assumptions:
- There are 500 million (M) daily active users (DAU).
- A single user makes 20 requests per day on average.
- We know that a single server (with 64 cores) can handle 64,000 RPS.
Number of requests in a day = 500M users * 20 requests = 10 billion (10^10) requests.
Number of requests per second:
Total requests per second = 10^10 requests / 86,400 seconds ≈ 116K RPS
Number of servers required:
Number of servers required = 116K RPS / 64,000 RPS per server ≈ 2 servers
Peak capacity:
To better calculate the peak capacity, we need the request and response distribution: a statistical picture of the timing and frequency of requests throughout the entire day.
If that is not available, we can use the DAU as a proxy for the peak user count.
Servers needed at peak load = (500M users * 20 requests) / 64,000 RPS per server = 10^10 / 64,000 ≈ 157K servers (the extreme case where all daily requests arrive within a single second).
157K servers is an astronomical amount that might not even be feasible. To reduce it, we have two potential paths:
- Improving the RPS of a server: possible, but it requires extensive engineering. We might bump the 64K RPS server capacity to 100K.
- Improving the peak-load assumption: we can apply the Pareto principle (the 80/20 rule) and assume that 80% of the daily traffic occurs within 20% of the day (~4.8 hours), as computed below.
Number of requests/second = (80% * 10^10) / (20% * 86,400 s) = 8 * 10^9 / 17,280 ≈ 463K RPS
Number of servers = 463K / 64K ≈ 7.2, so ~8 servers
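A minimal sketch tying the three estimates together (average load, worst-case peak, and the 80/20 assumption):

```python
# Server counts under the average, worst-case peak, and 80/20 assumptions.
DAU = 500_000_000
requests_per_user = 20
rps_per_server = 64_000
SECONDS_PER_DAY = 86_400

daily_requests = DAU * requests_per_user                    # 10^10

avg_rps = daily_requests / SECONDS_PER_DAY                  # ~116K RPS
avg_servers = avg_rps / rps_per_server                      # ~1.8 -> 2 servers

peak_servers = daily_requests / rps_per_server              # ~156K, quoted as ~157K

pareto_rps = 0.8 * daily_requests / (0.2 * SECONDS_PER_DAY) # 80% of traffic in 20% of day
pareto_servers = pareto_rps / rps_per_server                # ~7.2 -> 8 servers

print(f"average: {avg_servers:.1f} servers, 80/20: {pareto_servers:.1f} servers, "
      f"worst-case peak: {peak_servers:,.0f} servers")
```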
Cost of servers
We can use the EC2 instance type m7i.16xlarge, with a 64-core processor and 256 GB of RAM, to get a handle on the dollar cost of servers. The cost of one such instance is about $3.548 per hour with a 1-year contract plan.
- Low bound server cost per hour = 2 * $3.548 = $7.096
- Cost under 80/20 assumptions per hour = 8 * $3.548 = $28.38
- Peak load cost per hour = 157K * $3.548 ≈ $557,061
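A minimal sketch of the cost arithmetic, using the ~$3.548/hour rate implied by the totals above (small differences from the quoted peak figure are rounding of the server count):

```python
# Hourly cost at each server count.
PRICE_PER_HOUR = 3.548   # m7i.16xlarge, 1-year contract (rate implied by the totals)

for label, servers in [("low bound", 2), ("80/20", 8), ("peak", 157_000)]:
    print(f"{label}: {servers:,} servers -> ${servers * PRICE_PER_HOUR:,.2f}/hour")
```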
Storage estimations
Assumptions:
- We have a total of 500M daily active users.
- Each user posts 3 tweets per day.
- 10% of tweets contain images, 5% contain videos (mutually exclusive).
- Average image size: 200 KB
- Average video size: 3 MB
- Tweet text + metadata: 250 bytes
Calculations:
- Total tweets per day: 500M * 3 = 1.5 billion
- Storage for tweet text: 1.5 billion * 250 bytes ≈ 375 GB per day
- Storage for images: 1.5 billion * 10% * 200 KB = 30 TB per day
- Storage for videos: 1.5 billion * 5% * 3 MB = 225 TB per day
- Total storage per day: 375 GB + 30 TB + 225 TB ≈ 255 TB
- Storage required for one year: 255 TB * 365 ≈ 93 PB
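A minimal sketch of the storage math, using decimal units (1 TB = 10^12 bytes) as in the prose:

```python
# Daily and yearly storage under the stated assumptions.
DAU = 500_000_000
tweets = DAU * 3                          # 1.5 billion tweets per day

text_bytes  = tweets * 250                # 250 B text + metadata per tweet
image_bytes = tweets * 0.10 * 200_000     # 10% carry a ~200 KB image
video_bytes = tweets * 0.05 * 3_000_000   # 5% carry a ~3 MB video

TB = 10 ** 12                             # decimal terabyte
per_day_tb = (text_bytes + image_bytes + video_bytes) / TB
per_year_pb = per_day_tb * 365 / 1_000

print(f"{per_day_tb:,.0f} TB/day, {per_year_pb:,.0f} PB/year")   # ~255 TB, ~93 PB
```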