Algorithms to Design a Rate Limiter API
Several algorithms are used for rate limiting, including
- Token bucket,
- Leaky bucket,
- Sliding window logs, and
- Sliding window counters.
Let’s discuss each algorithm in detail:
Token Bucket
The token bucket algorithm is a simple algorithm that uses a fixed-size token bucket to limit the rate of incoming requests. The token bucket is filled with tokens at a fixed rate, and each request requires a token to be processed. If the bucket is empty, the request is rejected.
The token bucket algorithm can be implemented using the following steps:
- Initialize the token bucket with a fixed number of tokens.
- For each request, remove a token from the bucket.
- If there are no tokens left in the bucket, reject the request.
- Add tokens to the bucket at a fixed rate.
Thus, by allocating a bucket with a predetermined number of tokens to each user, we limit the number of requests per user per unit of time. When a user's token count reaches 0, that user has hit the maximum number of requests for the current timeframe, and further requests are rejected until the bucket is refilled when the next timeframe starts.
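The steps above can be sketched as follows. This is a minimal, illustrative implementation; the class name, the choice of a continuous refill based on `time.monotonic()`, and the one-token-per-request cost are assumptions, not a prescribed design:

```python
import time


class TokenBucket:
    """Minimal token bucket: holds up to `capacity` tokens,
    refilled continuously at `refill_rate` tokens per second."""

    def __init__(self, capacity: int, refill_rate: float):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.tokens = float(capacity)        # start with a full bucket
        self.last_refill = time.monotonic()

    def allow_request(self) -> bool:
        now = time.monotonic()
        # Add tokens at the fixed rate, capped at the bucket's capacity.
        elapsed = now - self.last_refill
        self.tokens = min(self.capacity, self.tokens + elapsed * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1                 # each request consumes one token
            return True
        return False                         # bucket empty: reject the request
```

With `TokenBucket(3, 1.0)`, a burst of three requests is allowed immediately, after which roughly one request per second gets through as tokens trickle back in.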
Leaky Bucket
The leaky bucket algorithm is based on the idea that if water is poured into a bucket faster than it leaks out, the bucket will eventually overflow.
The leaky bucket algorithm is similar to the token bucket algorithm, but instead of using a fixed-size token bucket, it uses a leaky bucket that empties at a fixed rate. Each incoming request adds to the bucket’s depth, and if the bucket overflows, the request is rejected.
One way to implement this is using a queue, which corresponds to the bucket that will contain the incoming requests. Whenever a new request is made, it is added to the queue’s end. If the queue is full at any time, then the additional requests are discarded.
The leaky bucket algorithm can be separated into the following concepts:
- Initialize the leaky bucket with a fixed depth and a rate at which it leaks.
- For each request, add to the bucket’s depth.
- If the bucket’s depth exceeds its capacity, reject the request.
- Leak the bucket at a fixed rate.
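The queue-based variant described above can be sketched as follows. The class and method names are illustrative, and `leak()` stands in for whatever fixed-rate worker (e.g. a timer or background thread) drains the queue in a real system:

```python
from collections import deque


class LeakyBucket:
    """Minimal leaky bucket: a bounded queue of pending requests,
    drained at a fixed rate by periodic calls to leak()."""

    def __init__(self, capacity: int):
        self.capacity = capacity   # the bucket's depth
        self.queue = deque()       # pending requests, oldest first

    def add_request(self, request) -> bool:
        if len(self.queue) >= self.capacity:
            return False           # bucket would overflow: discard the request
        self.queue.append(request) # otherwise queue it for later processing
        return True

    def leak(self, n: int = 1):
        """Process (leak) up to n queued requests; meant to be
        invoked at a fixed rate."""
        for _ in range(min(n, len(self.queue))):
            self.queue.popleft()
```

Unlike the token bucket, this shape smooths traffic into a steady outflow: accepted requests wait in the queue and are served at the leak rate rather than immediately.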
Sliding Window Logs
Another approach to rate limiting is to use sliding window logs. This data structure involves a “window” of fixed size that slides along a timeline of events, storing information about the events that fall within the window at any given time.
The window can be thought of as a buffer of limited size that holds the most recent events or changes that have occurred. As new events or changes occur, they are added to the buffer, and old events that fall outside of the window are removed. This ensures that the buffer stays within its fixed size, and only contains the most recent events.
This rate-limiting approach keeps track of each client's requests in a time-stamped log. These logs are typically stored in a time-sorted hash set or table.
The sliding window logs algorithm can be implemented using the following steps:
- A time-sorted queue or hash table of timestamps within the time range of the most recent window is maintained for each client making the requests.
- Whenever a new request arrives (or periodically, for example once the queue reaches a certain length), timestamps older than the current window are removed.
- The incoming request's timestamp is appended to the queue. If the number of entries in the queue does not exceed the authorized count, the request is processed; otherwise, it is rejected.
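These steps can be sketched with a per-client deque of timestamps. The class name and the optional `now` parameter (included to make behavior deterministic) are assumptions; note that, as in the steps above, the timestamp is recorded even when the request is rejected:

```python
import time
from collections import deque


class SlidingWindowLog:
    """Minimal sliding-window log for one client: keep the timestamps
    of requests made in the last `window` seconds."""

    def __init__(self, limit: int, window: float):
        self.limit = limit      # max requests allowed per window
        self.window = window    # window length in seconds
        self.log = deque()      # time-sorted request timestamps

    def allow_request(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Drop timestamps that have fallen out of the current window.
        while self.log and self.log[0] <= now - self.window:
            self.log.popleft()
        self.log.append(now)    # record the incoming request
        return len(self.log) <= self.limit
```

For a limit of 2 per 60 seconds, two requests at t=0 s and t=1 s are allowed, a third at t=2 s is rejected, and a request at t=61.5 s is allowed again once the old timestamps expire.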
Sliding Window Counters
The sliding window counter algorithm is an optimization over sliding window logs. As seen in the previous approach, memory usage is high: to serve many users or large window timeframes, every request timestamp must be kept for the duration of the window, which consumes a large amount of memory. Removing the many timestamps that fall outside the window also has a high time complexity.
To reduce surges of traffic, this algorithm accounts for a weighted value of the previous window’s request based on timeframe. If we have a one-minute rate limit, we can record the counter for each second and calculate the sum of all counters in the previous minute whenever we get a new request to determine the throttling limit.
The sliding window counters can be separated into the following concepts:
- Remove all counters that are more than 1 minute old.
- If a request falls in the current bucket, its counter is incremented.
- If a request arrives when the current window has already reached its throttle limit, the request is blocked.
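The per-second-counter variant described above can be sketched as follows. The class name, the integer-second bucketing, and the optional `now` parameter are illustrative assumptions; a production version would typically also apply the weighted previous-window estimate mentioned earlier rather than a plain sum:

```python
import time
from collections import defaultdict


class SlidingWindowCounter:
    """Minimal sliding-window counter for one client: one counter
    per second, summed over the last `window` seconds."""

    def __init__(self, limit: int, window: int = 60):
        self.limit = limit                 # max requests per window
        self.window = window               # window length in seconds
        self.counters = defaultdict(int)   # second bucket -> request count

    def allow_request(self, now=None) -> bool:
        now = int(time.monotonic() if now is None else now)
        # Remove counters that are older than the window.
        for second in [s for s in self.counters if s <= now - self.window]:
            del self.counters[second]
        # Block the request if the summed counters have hit the limit.
        if sum(self.counters.values()) >= self.limit:
            return False
        self.counters[now] += 1            # record the request in its bucket
        return True
```

Compared to sliding window logs, this stores at most one counter per second of the window instead of one entry per request, trading a small loss of precision for much lower memory use.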
How to Design a Rate Limiter API
A Rate Limiter API is a tool that developers can use to define rules that specify how many requests can be made in a given time period and what actions should be taken when these limits are exceeded.
Rate limiting is an essential technique used in software systems to control the rate of incoming requests. It helps to prevent the overloading of servers by limiting the number of requests that can be made in a given time frame.
It helps to prevent a high volume of requests from overwhelming a server or API. In this article, we discuss the design of a rate limiter API, including its requirements, high-level design, and the algorithms used for rate limiting.