Numerous articles and books have been written about scalability and capacity planning. However, sometimes the hardest part of the job is to avoid overengineering and keep relevant concepts as clear and simple as possible. As a Ruby and Go backend developer, I’ve discovered through experience a number of important scalability principles that can aid in the construction of a wide variety of software projects. In fact, these basic rules can be applied to numerous architectural challenges with successful results.
Scalability vs. Performance
Let’s start with some sad news. Scalable code is usually computationally inefficient. If you split a big, complex task to a set of small associative operations, the unfortunate reality is that it will probably run slower on one machine. But this approach is still preferable to support unlimited horizontal scaling. Obviously, hitting the ceiling of vertical growth is much more likely.
Moreover, scaling lets you avoid possible points of failure and reach reliability via redundancy. So swarms of small and simple tasks definitely overcome their efficient and complex alternatives.
If a web server processes a request in 50ms, the fact is that two servers won’t necessarily do it twice as fast. We can improve the overall throughput by spawning more instances, but the latency will reach its lower bound as soon as we upstage the queuing issues. Good developers tend to know thousands of ways to overcome this flaw. But before you search for alternatives, you can postpone the actions that don’t need an immediate response.
Moreover, pumping all the possible actions through a message queue lets you reach the dizzy heights of seamless integration, smoothing the load peaks and calmly handling partial outages. If some service is down (planned or unplanned), the requests stack up in the queue to be addressed when it returns.
Asynchronous details look nice on paper, but if you need a DB query to present some data to a user, sending it to RabbitMQ won’t save any time. However, if your controller needs two independent queries and each of them takes 100ms, launching them concurrently will probably cut the response time in half. The same principle works for any other blocking operation.
Concurrent queries reduce the overall latency unless the DB server starts sweating. Usually, it tends to be the key bottleneck. But, fortunately enough, the clients prefer reading over writing. At least you don’t need to write anything to authenticate a user, display a page and load the comments. Therefore, most of the essential flows can be served from a read replica, even if the main database is down. And you can launch as many read replicas as you want. If it is not sufficient, feel free to launch replicas of replicas to correct this issue.
For the same reason, it makes sense to store most of the session-related data on the client side. This approach lets different servers handle requests for the same client and eliminates the notorious locks on the sessions table unless you overcomplicate the process and run into client-side data migrations.
But what about replication lag? Welcome to the cruel world of eventual consistency. Some essential survival skills include:
- When you change an object and trigger some remote asynchronous action, don’t expect the changes to arrive before the recipient executes it. Rather, serialize the object and send it as an argument. This is a sad, but a true reality.
- Train some virtual dwarfs to check data for consistency, sweep garbage and kill zombies in the background.
- Use two-phase commit for transactions between microservices. It’s usually easier to validate the data before performing an action than to serve a distributed rollback.
- Keep the transaction logs to retrospect discrepancies. It’s also useful to leverage the power of conflict-free replicated data types (CRDT). For instance, if you have a counter that faces some parallel increment/decrement operations, track the number of increments and decrements in separate atomic counters. The resulting value will not depend on the order of operations.
The third normal form is beautiful, but computations these days are much more expensive than the HDD space. And it’s not about electricity. Instead, it is more about latency. If you can save 10ms of an average response time by storing 1TB of additional data, this means it’s usually a good idea to proceed. Our enemy is the seek time. However, it can be defeated by proper indexing and partitioning.
Caching might be tricky when data comes from different services. If you ever need to cache some information owned by another service, then it should emit a “change” event to let you invalidate the corresponding cache entries asynchronously.
Obviously, it is vital to measure cache hits/misses and LRU age. And sometimes it makes sense to separate local and global cache.
Competing on a rapidly growing market is much like racing. If you keep everything under control, it means you aren’t pushing hard enough. Failures happen. But the most important thing is to minimize their overall impact and always have an option to choose from during these circumstances.
In addition, it may sound counterintuitive, but the fact is that every failure should occur as fast as possible. With this method in place, you can isolate any issues and prevent them from spreading by grabbing shared resources.
However, be careful with retrying non-idempotent operations. If you do not receive a response, it does not guarantee that the action had not been taken.
Also, never trust external services. If you depend on five services and each of them has 99 percent SLA, you can’t guarantee more than 0,995 equals 95 percent of availability. In short, this means you could be offline for 18 days a year.
Reproducing a tricky bug is usually harder than fixing it. The most severe outages happen when we don’t have enough evidence to understand the logic behind certain occurrences. Therefore, it is important to retrospect the incidents and improve monitoring. This method enables us to get the clues faster and separate the reasons from the consequences. Still, it should not lead to the endless noise of meaningless alerts.
In general, it is very useful to have a tool to get all the logs for a particular request from multiple servers and microservices.
In addition, it obviously makes sense to track the basic metrics (Apdex score, throughput, response time etc.) from all the services on the single dashboard. That way, the data is centralized and much easier to manage.
Imagine you suddenly got a queue of 1000 requests that should be processed by 20 workers. An average request takes 100ms and you have a circuit breaker with a 500ms timeout. Unfortunately, this means that 900 requests will fail with such a timeout in place.
Therefore, sometimes it’s easier to adjust the circuit breaker settings than to balance the capacities in realtime. In general, though, you should have enough evidence to track down bottlenecks and adjust the auto-scaling accordingly.
It is worth mentioning that adding more resources to a group might have a negative impact due to resource contention. As a rule, it’s good to know the optimal ratios.
Small servers are good for smooth capacity curves and precise scaling, whereas big servers are more efficient in terms of load balancing, monitoring and latency for heavy computations. Every service has its own optimal setup and it makes no sense to pay for better CPU if you are bound to network or SSD performance.
Obviously, spawning a new server should be a trivial, automated operation. Killing a server should not have any negative impact. In basic terms, you should not lose any data (including logs). It is relatively simple to achieve this with a “share-nothing” approach. The logs could be sent to a central location in realtime. It is also good to automate the routine operations like deployment (without any SSH!) and to make the configuration as dynamic as possible to avoid frequent restarts.
These recipes should not expect the servers of one group to have the same configuration. Furthermore, they must not stick to parameters like CPU count or amount of RAM. But to be safe, you might need to spawn a new server with a different configuration in case of an emergency.
Deployment is usually easier than a rollback, but there should always be a plan B. First of all, avoid irreversible migrations and massive changes that cause downtime. The old code should keep working after applying the new migrations. If it is not possible, then break the change apart and split it to several releases.
Feature flags are also very useful to enable features one by one for a particular group of users. Keep them in configuration to disable a particular feature and partially degrade the service, if something suddenly goes wrong.
Needless to say, it should be possible to deploy or rollback some services separately. But keep in mind that this article is not about continuous delivery, microservices or service oriented architectures,.Instead, it is more concerned with how the system is expected to consist of some independent blocks with well-defined interconnections and separate release cycles. So if you change some API on one side, don’t forget to provide the fallback unless all the dependencies get updated. But this step should not lead to a hail of “fallback chains.” To prevent such a result, just remember to remove the deprecated stuff and keep the “fallback window” as short as possible (according to the release cycle).
Sustaining growth is challenging, especially when you have to balance between delivering new features and collecting technical debt, which is twice as hard on a highly-competitive market. The most important rule for us is to break complex things into small, simple and predictable pieces. Fortunately, the landscape of available technologies and tools evolve faster than our ability to discover, learn and adapt them. Therefore, the biggest honor for us is to contribute to the global ecosystem and keep running on the edge of the technology frontier, satisfying the demands and expectations of our customers and stakeholders in the process.
- Sam Newman, “Building Microservices”, O’Reilly Media, 2015.
- John Allspaw, “The Art of Capacity Planning”, O’Reilly Media, 2008.
- Cal Henderson, “Building Scalable Web Sites”, O’Reilly Media, 2006.