
Why rate limiting might not be enough for your app

Today we are going to talk about some access control methods in applications, and how we can test an application to find its boundaries and make the most of its capacity.

Background

Last week, our application experienced production slowness due to a capacity planning issue. We rolled back the application and revisited our capacity planning process. Along the way, I thought through the methods and possible ways to find the best TPS settings for our applications.

Key Metrics

To measure the performance of an application, the TPS (transactions per second) or RPS (requests per second) it can handle is usually used to indicate its throughput.

Request Pattern

But there is another hidden factor: the request pattern of your traffic also strongly affects your application's performance. You can only measure performance under a specific request pattern, and for most applications that pattern is whatever you find in the production logs.

TPS

Let's go back to TPS. How is this metric composed? For a system in steady state, throughput, concurrency, and latency are tied together (this is Little's Law):

TPS = connection_count / average_response_time

The equation can be rearranged into:

average_response_time = connection_count / TPS
As a result, we can visualize the following graph of average_response_time against connection_count for different TPS values.
[Figure: average_response_time vs. connection_count; each TPS value corresponds to a straight line through the origin]
In this graph, a higher TPS means a lower slope (the line lies closer to the X axis), whereas a lower TPS means a steeper slope.
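This straight-line relationship (a sketch assuming average_response_time = connection_count / TPS; the numbers below are invented for illustration) can be checked in a few lines of Python:

```python
# Little's Law for a system in steady state:
# connections = TPS * avg_response_time, so
# avg_response_time = connections / TPS -- a line through the origin
# with slope 1/TPS. Higher TPS => flatter line.

def avg_response_time(connection_count: float, tps: float) -> float:
    """Response time implied by a given TPS at a given concurrency."""
    return connection_count / tps

# At 100 concurrent connections:
print(avg_response_time(100, 200))  # 0.5 s  (high TPS, flat line)
print(avg_response_time(100, 50))   # 2.0 s  (low TPS, steep line)
```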

How would you set the Rate Limiter?

Where is the application's max TPS?

Usually, for a non-elastic system (one where the server count is fixed; for a serverless system, think of it as fixed concurrency), the performance can be represented by the blue line in the following graph. Past a certain concurrency threshold, the response time starts to worsen quickly.
[Figure: system performance curve (blue); response time worsens sharply past a concurrency threshold]
In theory, the max TPS line is the tangent line to our system's performance curve drawn from the origin.
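Given measured (connection_count, response_time) samples, the tangent-from-origin construction amounts to maximizing connections / response_time along the curve. A minimal sketch (the sample data is invented):

```python
# The tangent line from the origin touches the performance curve where
# connections / response_time is largest -- that ratio is the max TPS.

def max_tps(samples):
    """samples: list of (connection_count, avg_response_time_seconds)."""
    return max(c / rt for c, rt in samples)

# Hypothetical measurements: latency degrades as concurrency grows.
samples = [(10, 0.05), (50, 0.25), (100, 0.60), (200, 2.00)]
print(max_tps(samples))  # 200.0 TPS, on the linear part of the curve
```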

So is setting that TPS good?

So can we set that value as the rate limit? No, that would definitely cause trouble. Look at the following graph and suppose your ingress connection count is at the red vertical line. The ingress rate limiter will let requests through at the max TPS, but at that connection count the system's capacity is much lower.
There is a gap between the system's capacity (egress TPS) and the ingress rate-limit TPS, so requests will start to stack up and the application will not be able to process them all.
[Figure: at the red vertical line, system capacity (egress TPS) sits below the max TPS line, leaving a gap]
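The gap can be seen in a toy model: if the ingress limiter admits requests at max TPS while the overloaded system drains at a lower rate, the backlog grows without bound. A sketch with invented rates:

```python
# Toy queue model: ingress admits at the rate-limit TPS, but the system's
# actual drain rate at high concurrency is lower, so requests stack up.

def backlog_after(seconds: int, ingress_tps: float, egress_tps: float) -> float:
    """Requests queued after running at these rates for `seconds`."""
    return max(0.0, (ingress_tps - egress_tps) * seconds)

# Rate limiter set at the theoretical max (200 TPS) while the overloaded
# system only processes 120 TPS: 80 requests pile up every second.
print(backlog_after(60, ingress_tps=200, egress_tps=120))  # 4800.0
```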

What is a valid TPS setting?

[Figure: candidate TPS rate-limit lines; line 2 stays within the application's capacity]
In the graph above, line 2 is clearly a valid setting: it limits the ingress TPS to within the application's capacity. But there is still a problem. Setting the TPS rate limit this way allows every 'load' point in the green background through the gateway.
[Figure: the green region near the y-axis is safe; the red region at higher connection counts is dangerous]
The part of the system's capacity closer to the y-axis (labeled green) is guaranteed to be performant, whereas the part labeled red is very dangerous for our application.

How to improve the setting?

We need another setting to limit the ingress connection count to the green area. This is where the connection limit comes in. In the graph, the connection limit is the purple line: only load on the left side of the vertical line is allowed through the gateway, which keeps the system's performance within the green-labeled area.
[Figure: the purple vertical line is the connection limit; only load to its left is allowed through]
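A toy gateway combining both controls might look like the sketch below (a token bucket for the TPS rate limit plus a hard cap on in-flight connections; the class and its numbers are invented for illustration, not a production implementation):

```python
import time

# Toy gateway enforcing both limits: a TPS rate limit (token bucket)
# plus a hard cap on in-flight connections (the purple line).

class Gateway:
    def __init__(self, tps_limit: float, conn_limit: int):
        self.tps_limit = tps_limit
        self.conn_limit = conn_limit
        self.tokens = tps_limit          # bucket starts full
        self.last = time.monotonic()
        self.in_flight = 0

    def try_admit(self) -> bool:
        now = time.monotonic()
        # refill tokens at the configured TPS
        self.tokens = min(self.tps_limit,
                          self.tokens + (now - self.last) * self.tps_limit)
        self.last = now
        if self.in_flight >= self.conn_limit:  # connection limit (purple line)
            return False
        if self.tokens < 1.0:                  # TPS rate limit
            return False
        self.tokens -= 1.0
        self.in_flight += 1
        return True

    def release(self):
        """Call when a request finishes to free its connection slot."""
        self.in_flight -= 1

gw = Gateway(tps_limit=170.0, conn_limit=3)
admitted = [gw.try_admit() for _ in range(5)]  # burst of 5 simultaneous requests
print(admitted)  # [True, True, True, False, False]: the connection limit kicks in
```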

How to get those numbers?

Rate Limit

To find the tangent line to the system's performance curve, we need to ramp up the number of users to get a rough picture of the performance graph, then draw a tangent line from the origin to obtain the max TPS.
Another way is to increase the concurrency until TPS stops increasing; the turning point is then the max TPS point.
[Figure: TPS vs. concurrency; throughput flattens at the turning point, marking max TPS]
We can then set the actual TPS rate limit at 80% to 90% of the max value. Once the TPS is fixed, we can try to find the connection limit.
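The ramp-until-plateau procedure can be sketched as below. `measure_tps` is a hypothetical stand-in for a real load-test run at a given concurrency, and the plateau threshold and headroom factor are illustrative choices:

```python
# Ramp sketch: raise concurrency step by step and record throughput.
# Max TPS is where the curve stops climbing (the turning point);
# the rate limit is then set at 80-90% of that value.

def find_rate_limit(measure_tps, max_concurrency=512, plateau=0.02, headroom=0.85):
    best = 0.0
    concurrency = 1
    while concurrency <= max_concurrency:
        tps = measure_tps(concurrency)
        if tps <= best * (1 + plateau):  # no meaningful gain: plateau reached
            break
        best = tps
        concurrency *= 2
    return best * headroom  # leave 10-20% headroom below max TPS

# Hypothetical system that saturates around 200 TPS:
print(round(find_rate_limit(lambda c: min(200.0, 25.0 * c))))  # 170
```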

Connection Limit

By setting different concurrency levels and wait times between requests, we can use the wait time to keep the TPS constant while sending requests at exactly the same instant (I call this a pulsing stress request). This strictly tests the system's performance under simultaneous requests within a very short timeframe. We will then find a point where the throughput falls away from the target TPS, or the response time exceeds the service's SLA. This gives us a concrete connection limit to pair with the TPS rate limit.
[Figure: pulsing stress test results used to locate the connection limit]
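The pulsing idea can be sketched with asyncio: fire the whole burst at once, then sleep off the rest of the interval so the overall TPS stays constant. `send_request` is a hypothetical stand-in for the real call:

```python
import asyncio
import time

async def send_request() -> float:
    """Hypothetical stand-in for the real HTTP call; returns latency."""
    start = time.monotonic()
    await asyncio.sleep(0.01)  # pretend network + server time
    return time.monotonic() - start

async def pulse_load(concurrency: int, target_tps: float, pulses: int):
    # Wait time between pulses that holds the overall TPS constant.
    interval = concurrency / target_tps
    latencies = []
    for _ in range(pulses):
        pulse_start = time.monotonic()
        # fire the whole burst at the same instant
        results = await asyncio.gather(*(send_request() for _ in range(concurrency)))
        latencies.extend(results)
        # sleep off the remainder of the interval to keep TPS fixed
        elapsed = time.monotonic() - pulse_start
        await asyncio.sleep(max(0.0, interval - elapsed))
    return latencies

# Same 100 TPS overall, burst size 20, three pulses:
lat = asyncio.run(pulse_load(concurrency=20, target_tps=100.0, pulses=3))
print(len(lat))  # 60 latency samples
```

Varying `concurrency` while holding `target_tps` fixed is what isolates the effect of simultaneous requests from the effect of raw throughput.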

Conclusion

With the connection limit and TPS rate limit set, our services can finally run without capacity issues. One thing to keep in mind: access control that is too strict wastes computing resources, so we should always aim to use our capacity as fully as possible.