Today we are going to talk about some access control methods in applications, and how we can test an application to find its boundaries and maximize its use.
Background
Last week, our application experienced production slowness due to a capacity planning issue. We rolled back the application and rechecked our capacity planning process. During that process, I thought through the methods and possible ways to find the best TPS settings for our applications.
Key Metrics
To measure the performance of an application, TPS (transactions per second) or RPS (requests per second) is usually used to indicate the application's throughput.

Request Pattern
But there's another hidden factor: the request pattern of your traffic is also an important indicator of your application's performance. You can only measure performance under a specific request pattern. For most applications, this request pattern is what you have captured in the production logs.

TPS
Let's go back to TPS. How is this metric composed? The following equation can be used to indicate how it's calculated:

TPS = connection_count / average_response_time

The equation can be turned into:

average_response_time = connection_count / TPS
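This relationship between throughput, concurrency, and latency can be checked with a quick numeric sketch (the function names here are illustrative, not part of the original post):

```python
def tps(connection_count: float, average_response_time: float) -> float:
    """TPS = connection_count / average_response_time."""
    return connection_count / average_response_time

def average_response_time(connection_count: float, tps_value: float) -> float:
    """Rearranged form: average_response_time = connection_count / TPS."""
    return connection_count / tps_value

# 100 concurrent connections, each request taking 0.25 s on average:
print(tps(100, 0.25))                    # 400.0 requests per second
print(average_response_time(100, 400))   # 0.25 s
```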
As a result, we can visualize the following graph of average_response_time and connection_count, with each line indicating a different TPS. In this graph, a higher TPS means the slope is lower and the line is closer to the X axis, while a lower TPS means the slope is higher.

How would you set the Rate Limiter?
Where is the application's max TPS?
Usually, for a non-elastic system (meaning the server count is fixed; for a serverless system, perhaps a fixed concurrency), the performance can be represented by the blue line in the following graph. After a certain concurrency threshold, the response time starts to worsen quickly.
In theory, the max TPS line is the tangent line to our system's performance curve, drawn from the origin.
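One way to read "tangent from the origin" numerically: for each measured (connection_count, average_response_time) sample, the line from the origin to that point has slope response_time / connection_count, so the sample minimizing that slope (equivalently, maximizing connection_count / response_time) is where the tangent touches. A sketch with made-up load-test measurements:

```python
# Hypothetical load-test samples: (connection_count, average_response_time in seconds).
# Response time degrades quickly past ~80 connections.
samples = [
    (10, 0.050),
    (40, 0.055),
    (80, 0.080),
    (120, 0.200),
    (160, 0.500),
]

# TPS at each sample point; the maximum is where the tangent from the origin touches.
max_tps_point = max(samples, key=lambda s: s[0] / s[1])
max_tps = max_tps_point[0] / max_tps_point[1]
print(max_tps_point, round(max_tps, 1))  # the (80, 0.08) sample gives 1000 TPS
```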
So is setting that TPS good?
So can we set that value as the rate limit? No, that would definitely cause trouble. Let's look at the following graph. Suppose your ingress connection count is at the red vertical line. The ingress rate limiter will still let requests through at the max TPS, but the system's actual capacity is much lower when the connection count is at the red line's value.
There will be a gap between the system's capacity (egress TPS) and the ingress rate-limit TPS; requests will start to stack up, and the application will not be able to process them all.
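The stack-up is easy to see in a toy simulation: if the gateway admits at the rate-limit TPS while the system can only drain at its actual capacity, the backlog grows linearly (the numbers are illustrative):

```python
ingress_tps = 1000   # what the rate limiter admits (set at the theoretical max TPS)
egress_tps = 600     # what the system can actually process at this connection count

backlog = 0
for second in range(10):
    backlog += ingress_tps - egress_tps  # a net 400 requests pile up each second

print(backlog)  # 4000 queued requests after 10 seconds
```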
What is a valid TPS setting?
In the graph above, line 2 is apparently a valid setting: it limits the ingress TPS to within the application's capacity. But there is a problem. Setting only the TPS rate limit allows all the "load" points in the green background to get through the gateway. The part of the system's capacity curve closer to the y-axis (labeled green) is guaranteed to be performant, whereas the part labeled red is very dangerous for our application.
How to improve the setting?
We will need another setting to limit the ingress count to the green area. This is where the connection limit comes in. In the graph, the connection limit is the purple line. Only load on the left side of the vertical line is allowed through the gateway, which keeps the system's performance within the green-labeled area.
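Combined, the two limits can be sketched as a gateway admission check. This is a toy model (class and method names are mine, not from any real gateway; production gateways expose these as separate configuration knobs):

```python
class Gateway:
    """Toy admission control combining a connection limit and a TPS rate limit."""

    def __init__(self, conn_limit: int, tps_limit: int):
        self.conn_limit = conn_limit
        self.tps_limit = tps_limit
        self.active_connections = 0
        self.requests_this_second = 0

    def admit(self) -> bool:
        # Reject if either limit would be exceeded.
        if self.active_connections >= self.conn_limit:
            return False
        if self.requests_this_second >= self.tps_limit:
            return False
        self.active_connections += 1
        self.requests_this_second += 1
        return True

    def release(self) -> None:
        """Called when a request finishes and frees its connection slot."""
        self.active_connections -= 1

    def tick(self) -> None:
        """Called once per second to reset the rate-limit window."""
        self.requests_this_second = 0


gw = Gateway(conn_limit=3, tps_limit=100)
print([gw.admit() for _ in range(5)])  # [True, True, True, False, False]
```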
How to get those numbers?
Rate Limit
To get the tangent line of the system's performance curve, we need to gradually increase the number of users to sketch a rough performance curve, then draw a tangent line from the origin to find the max TPS.
Alternatively, we can increase the concurrency until TPS no longer increases; the turning point gives us the max TPS.
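The ramp-up approach can be sketched as a loop that raises concurrency until throughput stops improving. Here `measure_tps` is a stand-in for an actual load test, and the stopping tolerance is an assumption of mine:

```python
def find_max_tps(measure_tps, start=10, step=10, tolerance=0.02):
    """Raise concurrency until measured TPS stops improving by more than `tolerance`.

    `measure_tps(concurrency)` is assumed to run a load test and return observed TPS.
    Returns (concurrency at the turning point, max observed TPS).
    """
    concurrency = start
    best_tps = measure_tps(concurrency)
    while True:
        concurrency += step
        observed = measure_tps(concurrency)
        if observed <= best_tps * (1 + tolerance):  # plateau (or decline) reached
            return concurrency - step, best_tps
        best_tps = observed

# Fake measurement: throughput saturates at 500 TPS around 50 connections.
fake = lambda c: min(c * 10, 500)
print(find_max_tps(fake))  # (50, 500)
```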
We can then pick the actual TPS setting at 80%-90% of the max value. After we fix the TPS, we can try to find the connection limit.

Connection Limit
By setting different concurrency levels with a wait time between requests, we can use the wait time to hold TPS constant while sending all requests in the exact same timeframe (I call this a pulsing stress request). This strictly tests the system's performance under simultaneous requests within a very short window. We will then find a point where throughput falls away from the target TPS, or the response time exceeds the service's SLA. In this way we establish a concrete connection limit together with the TPS rate limit.
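The pulsing idea can be sketched with a thread barrier: fire N requests at exactly the same instant, then wait long enough that the overall rate stays at the target TPS. The request function below is a placeholder for a real network call:

```python
import threading
import time

def pulse(send_request, concurrency: int, target_tps: float, pulses: int):
    """Fire `concurrency` simultaneous requests per pulse, spacing pulses so the
    overall average rate stays at `target_tps`. Returns per-request latencies."""
    interval = concurrency / target_tps  # seconds between pulses
    latencies = []
    lock = threading.Lock()

    def worker(barrier):
        barrier.wait()  # all threads release at the same instant
        start = time.perf_counter()
        send_request()
        with lock:
            latencies.append(time.perf_counter() - start)

    for _ in range(pulses):
        barrier = threading.Barrier(concurrency)
        threads = [threading.Thread(target=worker, args=(barrier,))
                   for _ in range(concurrency)]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
        time.sleep(interval)
    return latencies

# Placeholder standing in for a real HTTP request (~10 ms of work):
lat = pulse(lambda: time.sleep(0.01), concurrency=5, target_tps=50, pulses=2)
print(len(lat))  # 10 latency samples
```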
Conclusion
With the connection limit and TPS rate limit set, our services can finally run without capacity issues. One thing to keep in mind: access control that is too strict will waste computing resources, so we should always use our resources as fully as possible.