We build really big software.
Big as in 100,000-people-participating-in-the-same-web-conference big, or alert-an-entire-state-about-severe-weather big. But few software systems start out that big. Twitter and Netflix had big ideas for sure, but the technology started small. And they were able to grow to keep up with demand… well, perhaps not always smoothly! So when’s the right time to start worrying about your scalability strategy?
We get asked that question a lot, and the answer is: early and often. In fact, it’s one of the first questions we ask when evaluating the feasibility of a project. While most technology businesses will never need to address Twitter- or Netflix-sized scalability challenges, we believe that if you are planning for success, you need to be planning for scalability.
The geek-speak of scalability usually starts with heaps of technology: “We are building an on-demand, auto-scaling, geo-clustered solution on asynchronous, multi-core, parallel-processing architectures deployed to a cloud-based, virtualized, high-IOPS, big-data-clustered, pay-as-you-go infrastructure.” So it’s just about picking the most scalable technology, right? Wrong! While the technology is certainly critical when building for scale, it’s rarely the limiting factor. With a few exceptions, you can build scalable solutions on almost any platform. The important thing is to think about scalability in every decision you make.
That assessment spans the whole stack: servers, storage, applications, databases, bandwidth, I/O, and especially the services your solution depends on. While doing this, most engineers tend to worry first about performance. The truth is, when it comes to scalability, the first focus needs to be reliability. Building for reliability not only makes the stack more robust, it lays the foundation for scalability.
The easiest way to change this mindset is by asking reliability questions like: What happens when this server fails? What happens when the database goes away? What happens when a dependent service stops responding?
Designing for reliability in these situations means we probably have at least two of everything — and we are on our way to N of everything, which is the hallmark of scalability. But don’t be tempted by tactical solutions. Fire anyone who suggests, “We’ll just get a bigger one when we need it.” They are not thinking strategically — scaling up is not a strategy!
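The “two of everything, on the way to N of everything” idea can be sketched in a few lines. This is a minimal client-side failover loop, with the assumption that each replica is represented by a plain callable rather than a real network client:

```python
import random


def call_with_failover(replicas, request):
    """Try replicas in random order; return the first successful response.

    `replicas` is a list of callables standing in for service endpoints.
    With N replicas, the caller survives the loss of any N-1 of them.
    """
    last_error = None
    for replica in random.sample(replicas, len(replicas)):
        try:
            return replica(request)
        except ConnectionError as err:
            last_error = err  # this replica is down; try the next one
    raise RuntimeError("all replicas failed") from last_error


# Two stand-in endpoints: one healthy, one permanently down.
def healthy(request):
    return f"ok: {request}"


def down(request):
    raise ConnectionError("replica unreachable")
```

With both replicas registered, requests succeed regardless of which one the loop tries first — that is the reliability foundation that later lets you scale the same list out to N nodes.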
There are a number of key architectural themes and design patterns that we rely on to pave the way for scalability. The essentials are pretty common “best practices” that most software teams should understand: stateless services, horizontal partitioning, asynchronous communication, and aggressive caching.
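To make one of those practices concrete, here is a sketch of a stateless request handler. The session store is a plain dict standing in for an external store such as Redis (an illustrative assumption, not this team’s actual stack); because the handler keeps no state of its own, any of N identical nodes can serve any request:

```python
def handle_request(session_store, session_id, message):
    """A stateless handler: all per-user state lives in `session_store`,
    an external key/value store shared by every node. The handler itself
    can be killed, restarted, or replicated freely.
    """
    history = session_store.get(session_id, [])
    history = history + [message]
    session_store[session_id] = history
    return {"session": session_id, "count": len(history)}
```

Any node running this code gives the same answer for the same store contents, which is exactly what lets a load balancer send each request wherever it likes.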
We’ve been practicing this approach for many years now and have yet to be disappointed in the results. As an example, we are operating a fairly complex JSON API that scales well beyond 4000 transactions per second per node on inexpensive virtual machines in the Amazon EC2 cloud. As a frame of reference, we recently consulted with a client whose system required 90 (yes, nine-zero) machines for that throughput in the very same cloud!
So let’s review. Design for scale from the get-go: check. Get the team in the mindset of reliability: check. Use the right technology in the right way: check. But that’s just the tip of the iceberg. The reality is that all of these things can go right, and scalability challenges will still present themselves. A myriad of less technical things have to go right to be successful with large-scale software systems — from development and operational processes to testing, monitoring, and support systems… the list goes on and on. In a future post, we will hit some of these areas head on and discuss how we approach them in real-world super-scalable solutions.