Many applications start out with a good idea, and even a good architecture for demonstrating that idea or serving hundreds to thousands of clients. If the application becomes popular, though, you’ll soon have to adopt horizontal scaling.
One of the best ways to scale is to separate database reads and writes. Sharding is a great way to add capacity: you can place new read or write shards into the load balancing rotation, or put them geographically closer to clients. Read shards can usually tolerate some replication lag, so they don’t present much of a challenge until the data grows past the allotted storage. Once read or write data gets above a certain size, queries slow down, and upgrading all those disks in all those shard servers is expensive and difficult. Also, if writes are stored only in specific write shards (as opposed to being replicated network-wide), it is difficult to report on business data. Long term, it’s generally not a good idea to store all that historical data in production anyway.
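To make the read/write split concrete, here is a minimal sketch of a router that sends each write to a shard chosen by key and rotates reads across replicas. The `ShardRouter` class and the shard names are hypothetical, invented for illustration; a real deployment would hand back database connections rather than strings, and would probably sit behind an actual load balancer.

```python
import hashlib
import itertools


class ShardRouter:
    """Hypothetical router: writes go to a shard picked by key, reads rotate."""

    def __init__(self, write_shards, read_replicas):
        self.write_shards = list(write_shards)
        # Round-robin iterator standing in for a load balancing rotation.
        self._reads = itertools.cycle(list(read_replicas))

    def shard_for_write(self, key: str) -> str:
        # Use a stable hash so the same key always lands on the same shard.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.write_shards)
        return self.write_shards[index]

    def replica_for_read(self) -> str:
        # Any replica will do for a read that tolerates replication lag.
        return next(self._reads)


# Shard names are placeholders, not real hosts.
router = ShardRouter(
    write_shards=["write-1", "write-2"],
    read_replicas=["read-1", "read-2", "read-3"],
)
write_target = router.shard_for_write("customer-42")
read_target = router.replica_for_read()
```

Note that simple modulo hashing remaps most keys whenever the number of write shards changes, so adding a write shard this way means moving data. Adding a read replica has no such cost, which is part of why read capacity is the easier side to grow.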
It’s therefore critical to create data aggregation processes that compile and store the data important to business intelligence. Not all data needs to be siphoned off and stored long term, though. It’s equally important to decide what data you don’t need to keep, and to make sure it gets purged periodically from your production shards.
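As a rough sketch of what one such process might look like, the example below rolls up old rows from a single shard into daily totals in a separate reporting database, then purges those rows from the shard. Everything here is an assumption made for illustration: the `orders` and `daily_sales` tables, their columns, the `aggregate_and_purge` function, and the use of in-memory SQLite in place of real shard and warehouse servers.

```python
import sqlite3


def aggregate_and_purge(shard, warehouse, cutoff):
    """Roll up orders older than `cutoff` into daily totals, then purge them."""
    rows = shard.execute(
        "SELECT substr(created_at, 1, 10) AS day, COUNT(*), SUM(amount) "
        "FROM orders WHERE created_at < ? GROUP BY day",
        (cutoff,),
    ).fetchall()

    # Upsert so that several shards can contribute to the same day.
    with warehouse:
        warehouse.executemany(
            "INSERT INTO daily_sales (day, order_count, revenue) "
            "VALUES (?, ?, ?) "
            "ON CONFLICT(day) DO UPDATE SET "
            "order_count = order_count + excluded.order_count, "
            "revenue = revenue + excluded.revenue",
            rows,
        )

    # Purge only after the rollup is committed. The two steps are not atomic
    # across databases, so a failure here followed by a rerun would double count.
    with shard:
        shard.execute("DELETE FROM orders WHERE created_at < ?", (cutoff,))
    return len(rows)


# Hypothetical production shard with a few sample orders.
shard = sqlite3.connect(":memory:")
shard.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, created_at TEXT, amount REAL)"
)
shard.executemany(
    "INSERT INTO orders (created_at, amount) VALUES (?, ?)",
    [
        ("2024-01-01T09:00:00", 10.0),
        ("2024-01-01T17:30:00", 5.0),
        ("2024-01-02T08:15:00", 7.5),
        ("2024-03-01T12:00:00", 20.0),
    ],
)
shard.commit()

# Hypothetical reporting database that keeps only the aggregates.
warehouse = sqlite3.connect(":memory:")
warehouse.execute(
    "CREATE TABLE daily_sales "
    "(day TEXT PRIMARY KEY, order_count INTEGER, revenue REAL)"
)

days_rolled_up = aggregate_and_purge(shard, warehouse, cutoff="2024-02-01")
```

After this runs, the reporting database holds one row per day for January, and the shard keeps only the March order. The individual January orders are gone from production, which is the point: the business keeps the numbers it reports on, and the shard stays small.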
I’ll talk more later about effective ways to do this.