$ ~/back-of-the-envelope
[005] storage

How much storage does a Google-scale web crawler need per month?

hard storage capacity-planning web-crawler google-scale

The scenario

You’re designing a Google-scale web crawler. The crawler maintains an index of 1 billion web pages. To keep search results fresh, each page is re-crawled 4 times per month on average, so you fetch and store roughly 4 billion page copies per month.

The average stored page (HTML, compressed) is 500 KB.

How much raw storage does the crawler need to provision per month?

Adapted from: donnemartin/system-design-primer, Design a Web Crawler (CC BY 4.0)

$answer:PB/month
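
One way to check the estimate is to multiply the figures from the scenario directly. The sketch below is a minimal illustration rather than an official solution. It assumes decimal units (1 KB = 1,000 bytes, 1 PB = 10^15 bytes), which the scenario does not specify, and it treats "raw" storage as the stored page bytes only, with no replication or index overhead. The function and constant names are hypothetical.

```python
# Back-of-the-envelope storage estimate for the crawler scenario.
# Assumption (not stated in the source): decimal units,
# 1 KB = 1,000 bytes and 1 PB = 10**15 bytes.

INDEXED_PAGES = 1_000_000_000      # pages in the index
RECRAWLS_PER_MONTH = 4             # average re-crawls per page per month
AVG_PAGE_BYTES = 500 * 1_000       # 500 KB per stored, compressed page

BYTES_PER_PB = 10**15


def monthly_storage_pb(pages: int, recrawls: int, page_bytes: int) -> float:
    """Return raw storage written per month, in petabytes."""
    pages_per_month = pages * recrawls          # 4 billion page copies
    bytes_per_month = pages_per_month * page_bytes
    return bytes_per_month / BYTES_PER_PB


monthly_pb = monthly_storage_pb(INDEXED_PAGES, RECRAWLS_PER_MONTH, AVG_PAGE_BYTES)
print(f"{monthly_pb:.1f} PB/month")  # prints "2.0 PB/month"
```

With binary units instead (dividing by 2^50 bytes), the same byte count comes to roughly 1.78 PiB, so the choice of unit convention shifts the answer by about 11%.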