Writing scalable applications? Here are three tips to do it right.
At AppLovin, we handle over 50 billion ad requests and 150 terabytes of data a day across seven global data centers, so one of our biggest challenges is scaling to meet that volume. If you’re working with data volumes that will keep growing, here are a few tips that might be useful:
1) Plan to scale horizontally when possible.
For any service that you expect will handle a growing volume of data, horizontal scalability is key. Scaling vertically can be prohibitively expensive because higher-end hardware typically costs more, and there is always a limit on how powerful a single server can be. Additionally, scaling horizontally helps with hardware failures and outages, as the other servers can pick up the slack.
At AppLovin, we chose a microservice architecture to achieve horizontal scalability. Each service has a specific role and is scaled independently of the other components. When we do experience a bottleneck or performance issue, the microservice architecture makes it trivial to identify exactly where the problem lies and easy to weigh the options for how to proceed. Because the components are all horizontally scalable, adding servers is always a viable option, and it is dev-ops’ go-to move because it doesn’t require engineering intervention. It’s only worth spending engineering time to fix the problem through optimization (or, in the worst case, redesign) if the cost of adding more servers outweighs the value of the service, or if we believe we are wasting money due to inefficiencies.
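To make the "other servers can pick up the slack" point concrete, here is a minimal sketch of a round-robin dispatcher over a pool of interchangeable servers. It is an illustration only, not AppLovin's actual code: the `ServerPool` class, its methods, and the server names are all hypothetical.

```python
from itertools import count


class ServerPool:
    """Hypothetical pool of interchangeable servers behind a round-robin dispatcher."""

    def __init__(self, names):
        self.healthy = {name: True for name in names}
        self._ticket = count()  # monotonically increasing request counter

    def add_server(self, name):
        # Scaling out: capacity grows by adding another server to the pool.
        self.healthy[name] = True

    def mark_down(self, name):
        # Simulate a hardware failure or outage on one server.
        self.healthy[name] = False

    def route(self):
        # Pick the next healthy server in round-robin order.
        live = [name for name, ok in self.healthy.items() if ok]
        if not live:
            raise RuntimeError("no healthy servers available")
        return live[next(self._ticket) % len(live)]


pool = ServerPool(["web-1", "web-2", "web-3"])
pool.mark_down("web-2")
served_by = [pool.route() for _ in range(4)]  # web-2 never appears
```

Because every server can do the same job, losing one only reduces capacity, and calling `add_server` restores or increases it without any change to the service itself.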
2) Take advantage of a powerful pub/sub system for all internal communication.
Our pub/sub system is at the core of everything in the AppLovin architecture; it is effectively the glue that holds everything together. When applications communicate directly with each other, it can quickly become a headache. There are lots of things to consider: whether the service you’re talking to is up, whether it is consuming data at the same rate it is being produced, what happens when the service dies while processing a message, and most importantly, the logistics of moving data between data centers (especially in the presence of network issues).
Having a centralized, reliable and fault-tolerant pub/sub system can solve most of these problems. At AppLovin, we use our pub/sub system for all internal service communication, regardless of whether it’s between data centers, between services or even within the same service. Our pub/sub system is resilient to backlogs and failures and stores all of our intermediary data in a fault-tolerant manner, so we know it’s always safe. It enables producers, consumers or even pub/sub servers to go down without any consequences. Crucially, it handles the backlog when services go down or fall behind. It even centralizes this information, so it’s easy to see whether each and every service and task is keeping up with the volume.
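The backlog behavior described above can be sketched with a toy in-memory broker. This is a simplified illustration under stated assumptions, not a description of AppLovin's system: the `Broker` class, the topic name `ad_events`, and the consumer name `reporting` are hypothetical. It assumes the broker keeps an append-only log per topic and a committed offset per consumer, which is one common way such systems are designed.

```python
from collections import defaultdict


class Broker:
    """Hypothetical in-memory broker: one append-only log per topic,
    plus a committed read offset per (topic, consumer) pair."""

    def __init__(self):
        self.logs = defaultdict(list)    # topic -> list of messages
        self.offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def publish(self, topic, message):
        # Producers only append; they never need to know who is consuming.
        self.logs[topic].append(message)

    def poll(self, topic, consumer, max_messages=10):
        # Reading does not advance the offset, so an uncommitted batch is re-delivered.
        start = self.offsets[(topic, consumer)]
        return self.logs[topic][start:start + max_messages]

    def commit(self, topic, consumer, n):
        # The consumer confirms it has fully processed n messages.
        self.offsets[(topic, consumer)] += n

    def lag(self, topic, consumer):
        # Backlog size: messages published but not yet committed by this consumer.
        return len(self.logs[topic]) - self.offsets[(topic, consumer)]


broker = Broker()
for i in range(5):
    broker.publish("ad_events", {"id": i})  # consumer is down; messages accumulate

backlog = broker.lag("ad_events", "reporting")
first_try = broker.poll("ad_events", "reporting", max_messages=3)
# Suppose the consumer crashes here, before committing: nothing is lost.
retry = broker.poll("ad_events", "reporting", max_messages=3)
broker.commit("ad_events", "reporting", len(retry))
remaining = broker.lag("ad_events", "reporting")
```

In this sketch `backlog` is 5, `retry` contains the same three messages as `first_try`, and `remaining` is 2. The `lag` value is also the kind of centralized number that shows whether a given consumer is keeping up.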
3) Monitor, monitor, and then monitor some more.
As the scale increases and the number of components within an enterprise rises, it becomes increasingly difficult to determine the cause of a problem; without proper alerting and monitoring, it can be tough to figure out whether there is a problem in the first place. From the get-go, any company that deals with large amounts of data needs a good framework for monitoring in order to maintain stability while scaling up. Within our organization, there are alerts for nearly every scenario and graphs that track every seemingly useful trait. Even though we don’t necessarily watch them all, when a problem arises it can be invaluable to have that historical information so we can make correlations after the fact. The most important graphs and metrics are posted on large monitors for everyone to see and reference. These enable us to catch problems well before they become an issue for the business.
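As a rough sketch of the two ideas above, keeping history for later correlation and alerting before a problem grows, the following hypothetical `Metric` class records samples in a bounded history and raises an alert only when several consecutive samples exceed a threshold. The class, the metric name `queue_lag`, the threshold, and the sample values are all assumptions made for illustration.

```python
from collections import deque


class Metric:
    """Hypothetical metric with a bounded history and a sustained-threshold alert."""

    def __init__(self, name, threshold, sustained=3, history_size=1000):
        self.name = name
        self.threshold = threshold
        self.sustained = sustained                   # consecutive breaches needed to alert
        self.history = deque(maxlen=history_size)    # oldest samples drop off automatically

    def record(self, value):
        # Every sample is kept, even when nothing is alerting, for after-the-fact analysis.
        self.history.append(value)

    def alerting(self):
        # Alert only if the most recent `sustained` samples all exceed the threshold,
        # so a single spike does not page anyone.
        recent = list(self.history)[-self.sustained:]
        return len(recent) == self.sustained and all(v > self.threshold for v in recent)


queue_lag = Metric("queue_lag", threshold=100, sustained=3)
fired = []
for sample in [20, 150, 40, 120, 130, 140]:
    queue_lag.record(sample)
    fired.append(queue_lag.alerting())
# fired == [False, False, False, False, False, True]
```

The isolated spike of 150 does not trigger anything, while the sustained climb at the end does, and the full history remains available either way.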
In my experience at AppLovin, scaling horizontally, using a powerful pub/sub system, and monitoring everything are crucial to managing and leveraging large amounts of data. Follow these general best practices and you will be well-prepared to scale.