How I Reduced the Thundering Herd
Flask-Caching with Probabilistic Renewal
A cache stores data so that it can be served quickly instead of being retrieved from a distant source or recomputed every time it is needed.
You may cache, for example, resource-intensive functions, endpoint functions, or anything else, to avoid repeated computation and to improve the responsiveness of your web pages.
As a result, pages can be served to users faster, and the server stops overusing resources and operates reliably.
Problem
A web server has to handle multiple requests simultaneously. While one user is requesting a page, other users are requesting the same or different pages at the same time. In this situation, caches might not work as expected.
Let me explain my experience.
One of my jobs at my previous company was to develop a social community website. During the Korean National College Entrance Exam period, almost 10,000 posts were uploaded there per day, and even more comments than that.
It has a small section, visible on all pages, that shows the top 5 most commented posts. Every user requests that section on every visit, which means the function computing the rank has to run on almost every request.
Fortunately, a little delay in reflecting the rank is acceptable; unfortunately, computing it involves a heavy database query. So the rank computation is cached with a TTL of 10 minutes.
When the time comes, the cache expires.
The first request after the expiration will encounter a cache miss, so it will call the computing function and will cache the returned value, the rank.
But what if it takes 10 seconds to finish the function and to populate the cached value? Let's see what happens.
One user's request calls the function to compute the rank. At the same time, another user's request does the same. Until one of them finishes the function and succeeds in caching the result, the third, fourth, and subsequent requests all behave the same way.
Consequently, even though we have cached the function and expected it to be executed only once per expiration, it is executed multiple times by different requests at the same moment. This is called the "Thundering Herd" problem.
In this situation, requests unexpectedly bypassing the cache increase the database load. As the load grows, the database response time may increase, and the more requests hang, the higher the chance of a server outage.
In fact, during the exam season, my single database was overwhelmed by the heavy query load whenever the cache expired, and there were periodic outages until the traffic stabilized.
Workaround for this
The caching methodology above is called Lazy Caching. With Lazy Caching, when your server receives a request, it checks whether the data is cached. If so, it uses the data right away; if not, it computes the value, caches it again, and returns it to the user.
Alternatively, you can use a Write-Through cache: whenever the original data that has been cached changes in the database, you also apply the change to the cache. No TTL is required because the cached data is always up-to-date.
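To make the two patterns concrete, here is a rough sketch. The helpers get_rank_from_db(), save_comment_to_db(), and the cache object are hypothetical names for illustration, not code from the actual site.

# A rough illustration of the two patterns. get_rank_from_db(),
# save_comment_to_db(), and `cache` are hypothetical helpers.

def get_rank_lazy():
    # Lazy Caching: compute and cache only on a miss.
    rank = cache.get("top5_rank")
    if rank is None:
        rank = get_rank_from_db()                # heavy query
        cache.set("top5_rank", rank, timeout=600)
    return rank

def add_comment_write_through(post_id, comment):
    # Write-Through: update the source of truth, then refresh the cache.
    save_comment_to_db(post_id, comment)
    cache.set("top5_rank", get_rank_from_db())   # no TTL needed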
However, Write-Through caching can be difficult to implement in some cases, and mine was one of them. So I decided to make a slight change to the caching package as a workaround.
Flask-Caching
Flask is a lightweight WSGI web application framework for Python, and one of the most loved packages related to Flask is Flask-Caching.
I won't explain what Flask is or how to use it, but I will give a brief overview of Flask-Caching (v1.10.1).
What you need to do is create a Cache object and add a @cache.cached(...) decorator to the endpoint function.
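Here is a minimal sketch of that setup; the timeout value and the endpoint are my own illustration of what the article describes.

import time
from flask import Flask
from flask_caching import Cache

app = Flask(__name__)
# SimpleCache keeps the data in process memory; fine for a demo.
app.config["CACHE_TYPE"] = "SimpleCache"
cache = Cache(app)

@app.route("/")
@cache.cached(timeout=60)
def index():
    # Recomputed only when the cached value has expired.
    return time.ctime()

if __name__ == "__main__":
    app.run()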
When you visit 'localhost:5000/' (the default server URL), it shows the datetime of your visit (e.g. "Fri Jan 21 19:07:24 2022"), and when you visit again before the timeout, the server shows the same datetime.
That means your second request never reaches the endpoint function; the cache returns the value previously computed for the first request.
That's all the cache does.
You also have to note that cache.cached(...) does not consider the parameters passed to the function. For example, the results of the add function are the same even though different parameters are passed.
In this case, cache.memoize(...) works instead: it hashes the arguments into the cache key, so add(3, 4) returns the right value. A sketch of both is shown below.
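A small sketch of the difference, assuming the app and cache objects from the previous example; the key_prefix and timeout values are mine.

# With cached(), the arguments are not part of the cache key.
@cache.cached(timeout=60, key_prefix="add")
def add(a, b):
    return a + b

with app.app_context():
    print(add(1, 2))  # 3, computed and cached
    print(add(3, 4))  # still 3 -- the cached value is reused

# With memoize(), the arguments are hashed into the cache key.
@cache.memoize(timeout=60)
def add(a, b):
    return a + b

with app.app_context():
    print(add(1, 2))  # 3
    print(add(3, 4))  # 7, as expected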
Probabilistic Renewal
As I said, unexpected requests can reach the function at the moment the expiration occurs. I will show you a modified version of the cached method as a workaround. You can modify the memoize method in the same way if necessary.
The original code is here. To put it simply, the cached method executes the following steps.
- If unless (a callable argument) returns True, the decorator returns f(*args, **kwargs) directly, ignoring all cache-related logic.
- The decorator function makes a cache key using the args and kwargs passed to the decorated function and the request information (URL path, query string, ...).
- If forced_update (a callable argument) is not None and returns True, it sets a cache-miss flag. Otherwise, it retrieves the cached data from the cache storage; in this case as well, it sets the cache-miss flag if nothing is cached.
- When the cache-miss flag is set, it executes the original function f and populates (sets) the returned data into the cache storage.
- Finally, it returns the cached data regardless of whether it was just populated or not.
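To make those steps concrete, here is a heavily simplified paraphrase of that flow. It is not the real implementation: it ignores request-based keys, cache_none, and error handling, and the names are mine.

import functools

def simple_cached(cache, timeout, make_key, unless=None, forced_update=None):
    def decorator(f):
        @functools.wraps(f)
        def decorated_function(*args, **kwargs):
            if unless is not None and unless():              # 1. bypass the cache entirely
                return f(*args, **kwargs)
            cache_key = make_key(args, kwargs)               # 2. build the cache key
            if forced_update is not None and forced_update():
                rv, found = None, False                      # 3. forced cache miss
            else:
                rv = cache.get(cache_key)
                found = rv is not None
            if not found:                                    # 4. compute and populate
                rv = f(*args, **kwargs)
                cache.set(cache_key, rv, timeout=timeout)
            return rv                                        # 5. return the cached data
        return decorated_function
    return decorator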
Straightforward, right?
Now, what we want to do is to force the cached data to be updated conditionally before the expiration occurs. It would be great if the update (renewal) happened while the expiration is imminent.
First of all, we add two arguments to the cached method (excerpt below).
renewal_prob specifies the probability that a renewal happens on a given request; 0.01 means 1%.
renewal_start is used to calculate the renewal_time_left_thres variable described below.
If either of them is left as None, the renewal will not happen.
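Here is the relevant excerpt of the signature; the full listing is at the end of the post.

def cached(
    self,
    timeout: Optional[int] = None,
    # ... the existing arguments stay unchanged ...
    renewal_prob: Optional[float] = None,   # e.g. 0.01 means a 1% chance per request
    renewal_start: Optional[float] = None,  # fraction of timeout, e.g. 0.3
) -> Callable:
    ...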
And we need to set two variables and two function attributes (excerpt below).
Although timeout could be None and would be replaced by the default value at the appropriate time anyway, we replace it explicitly so that we can calculate renewal_time_left_thres.
renewal_time_left_thres sets the threshold of remaining time before expiration at which renewal attempts start. For example, if it is 10 seconds and the expiration is 11 seconds away, there is no chance of a renewal; once 10 seconds or less are left, a renewal can happen.
expired_at stores the expiration timestamp of each cache key.
in_progress is a per-key flag that tells whether a renewal is already in progress in another thread.
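The corresponding excerpt (again, the full listing is at the end of the post):

# Function attributes, set inside decorator(f):
decorated_function.expired_at = defaultdict(float)           # expiration timestamp per cache key
decorated_function.in_progress = defaultdict(lambda: False)  # is a renewal already running?

# Variables, set at the end of cached() before returning decorator:
# Replace a None or negative timeout with the default timeout.
if not timeout or timeout < 0:
    timeout = self.cache.default_timeout

# Start trying to renew when less than (timeout * renewal_start) seconds are left.
renewal_time_left_thres = timeout * renewal_start if renewal_start else 0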
Now, we will add more logic to decorated_function.
Using cache_key, we can get the expiration timestamp from decorated_function.expired_at and the in-progress status from decorated_function.in_progress.
We want to set the 'forcing update' flag when all of the following hold:
- renewal_prob is not None
- expiration timestamp - time.time() is less than renewal_time_left_thres
- the renewal is not in progress
- random.random() is less than renewal_prob
The first condition is to maintain backward compatibility, and the second is intended to limit the period in which the renewal could occur. The third is to keep the renewal from being executed in more than one thread in a process at the same moment, and the last is necessary to decrease the chances of renewal when handling multiple requests.
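In code, the renewal branch looks like this (an excerpt from the full listing below):

if (
    renewal_prob is not None
    and decorated_function.expired_at[cache_key] - time.time() < renewal_time_left_thres
    and not decorated_function.in_progress[cache_key]
    and random.random() < renewal_prob
) or (
    callable(forced_update)
    and (forced_update(*args, **kwargs) if wants_args(forced_update) else forced_update())
    is True
):
    decorated_function.in_progress[cache_key] = True
    rv = None
    found = False
else:
    rv = self.cache.get(cache_key)
    found = True
    ...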
When the renewal gets its turn, the lines in the if block are executed: the in-progress flag is set, rv is set to None, and found is False.
Because found is False, the original function f is called and the return value is populated into the cache storage. The expiration timestamp is recalculated, and the in-progress flag is cleared.
Finally, decorated_function returns the cached data (rv) as the result of calling f.
Here is the code. ※ I removed docstrings and comments for readability.
"""
:copyright: (c) 2010 by Thadeus Burgess.
:license: BSD, see LICENSE for more details.
See original code from here: https://github.com/pallets-eco/flask-caching/blob/348dbecf1365128799f99cb4d751f212089bb218/src/flask_caching/__init__.py
"""
def cached(
    self,
    timeout: Optional[int] = None,
    key_prefix: str = "view/%s",
    unless: Optional[Callable] = None,
    forced_update: Optional[Callable] = None,
    response_filter: Optional[Callable] = None,
    query_string: bool = False,
    hash_method: Callable = hashlib.md5,
    cache_none: bool = False,
    make_cache_key: Optional[Callable] = None,
    source_check: Optional[bool] = None,
    renewal_prob: Optional[float] = None,
    renewal_start: Optional[float] = None,
) -> Callable:
    def decorator(f):
        @functools.wraps(f)
        def decorated_function(*args, **kwargs):
            #: Bypass the cache entirely.
            if self._bypass_cache(unless, f, *args, **kwargs):
                return f(*args, **kwargs)

            nonlocal source_check
            if source_check is None:
                source_check = self.source_check

            cache_key = None
            try:
                if make_cache_key is not None and callable(make_cache_key):
                    cache_key = make_cache_key(*args, **kwargs)
                else:
                    cache_key = _make_cache_key(args, kwargs, use_request=True)

                # Force a cache miss either probabilistically (renewal)
                # or when forced_update asks for it.
                if (
                    renewal_prob is not None
                    and decorated_function.expired_at[cache_key] - time.time()
                    < renewal_time_left_thres
                    and not decorated_function.in_progress[cache_key]
                    and random.random() < renewal_prob
                ) or (
                    callable(forced_update)
                    and (
                        forced_update(*args, **kwargs)
                        if wants_args(forced_update)
                        else forced_update()
                    )
                    is True
                ):
                    decorated_function.in_progress[cache_key] = True
                    rv = None
                    found = False
                else:
                    rv = self.cache.get(cache_key)
                    found = True

                    if rv is None:
                        if not cache_none:
                            found = False
                        else:
                            found = self.cache.has(cache_key)
            except Exception:
                if cache_key:
                    decorated_function.in_progress[cache_key] = False
                if self.app.debug:
                    raise
                logger.exception("Exception possibly due to cache backend.")
                return f(*args, **kwargs)

            if not found:
                rv = f(*args, **kwargs)

                if response_filter is None or response_filter(rv):
                    try:
                        self.cache.set(
                            cache_key,
                            rv,
                            timeout=decorated_function.cache_timeout,
                        )
                        # Remember when this key will expire.
                        decorated_function.expired_at[cache_key] = time.time() + timeout
                    except Exception:
                        if self.app.debug:
                            raise
                        logger.exception("Exception possibly due to cache backend.")
                decorated_function.in_progress[cache_key] = False
            return rv

        def default_make_cache_key(*args, **kwargs):
            argspec_args = inspect.getfullargspec(f).args

            for arg_name, arg in zip(argspec_args, args):
                kwargs[arg_name] = arg

            return _make_cache_key(args, kwargs, use_request=False)

        def _make_cache_key_query_string():
            args_as_sorted_tuple = tuple(
                sorted(pair for pair in request.args.items(multi=True))
            )
            args_as_bytes = str(args_as_sorted_tuple).encode()
            cache_hash = hash_method(args_as_bytes)

            if source_check and callable(f):
                func_source_code = inspect.getsource(f)
                cache_hash.update(func_source_code.encode("utf-8"))

            cache_hash = str(cache_hash.hexdigest())

            cache_key = request.path + cache_hash

            return cache_key

        def _make_cache_key(args, kwargs, use_request):
            if query_string:
                return _make_cache_key_query_string()
            else:
                if callable(key_prefix):
                    cache_key = key_prefix()
                elif "%s" in key_prefix:
                    if use_request:
                        cache_key = key_prefix % request.path
                    else:
                        cache_key = key_prefix % url_for(f.__name__, **kwargs)
                else:
                    cache_key = key_prefix

            if source_check and callable(f):
                func_source_code = inspect.getsource(f)
                func_source_hash = hash_method(func_source_code.encode("utf-8"))
                func_source_hash = str(func_source_hash.hexdigest())
                cache_key += func_source_hash

            return cache_key

        decorated_function.uncached = f
        decorated_function.cache_timeout = timeout
        decorated_function.make_cache_key = default_make_cache_key
        decorated_function.expired_at = defaultdict(float)  # expiration timestamp per cache key
        decorated_function.in_progress = defaultdict(lambda: False)  # is a renewal in progress?

        return decorated_function

    # Replace a None or negative timeout with the default timeout
    if not timeout or timeout < 0:
        timeout = self.cache.default_timeout

    # Initialize the renewal starting point
    renewal_time_left_thres = timeout * renewal_start if renewal_start else 0

    return decorator
Or you can see the modified versions of cached and memoize here.
Performance
To measure the improvement, I built a uWSGI + Flask server in a Docker container and configured two endpoints decorated with cache.cached. Both endpoints sleep for 3 seconds and return an empty string, but one uses the original Flask-Caching cache object, and the other uses the Probabilistic Renewal cache object.
I used uWSGI with 16 processes of 4 threads each and Redis as the cache storage backend, and heuristically set the parameters of the cached method: timeout of 60 seconds, renewal_prob of 0.0001 (0.01%), and renewal_start of 0.3 (30% of timeout).
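The benchmark endpoints looked roughly like this. It is a sketch: `cache` stands for the stock Flask-Caching object, `pr_cache` for the modified one, and the route names are mine.

import time
from flask import Flask

app = Flask(__name__)
# `cache` is a stock flask_caching.Cache instance,
# `pr_cache` is an instance of the class with the modified cached() method;
# both are assumed to be configured with the Redis backend elsewhere.
cache.init_app(app)
pr_cache.init_app(app)

@app.route("/original")
@cache.cached(timeout=60)
def original():
    time.sleep(3)   # stand-in for the slow computation
    return ""

@app.route("/renewal")
@pr_cache.cached(timeout=60, renewal_prob=0.0001, renewal_start=0.3)
def renewal():
    time.sleep(3)
    return ""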
I sent 1,000,000 parallel requests to each endpoint and counted the number of cache misses.
With the original cache object, all requests were handled in 16 minutes and 43 seconds. Cache misses occurred 960 times (excluding the misses at the initial start-up).
With the customized one, the requests were handled in 17 minutes and 5 seconds. The renewal happened 65 times, and no cache miss occurred.
Although the request handling time increased, the Probabilistic Renewal cache had no cache misses. Even counting the actual function calls, it reduced the number of executions of the original function by more than 90%. Moreover, when I replaced decorated_function.expired_at and decorated_function.in_progress with values stored in the cache (e.g. {cache_key}:exp_at) to sync them across processes, the number of executions dropped by 97.71%.
Of course, the results depend on the machine, the function being cached, and other factors. However, our cached method succeeds in mitigating the thundering herd problem.
Drawback
However, there are some limitations and tradeoffs.
If the server uses volatile cache storage, then whenever it restarts it will execute the original function for every request arriving at that moment, because there is no cached data at all.
Also, if the servers are not configured to use a remote cache, so the cached data is not shared across servers, each server will keep its own cache storage, and there will be more cache misses and renewal requests.
There could be a memory leak, because we use two dictionaries to store the expiration timestamps and in-progress flags per cache_key. But it can be mitigated with the uWSGI max-requests option, which makes the WSGI workers (processes) reload themselves after each has served the specified number of requests, releasing the leaked memory.
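For example, a uwsgi.ini fragment like the following (the number is only illustrative) respawns each worker periodically:

[uwsgi]
# Respawn each worker after it has served 10000 requests,
# releasing any memory leaked by the per-process dictionaries.
max-requests = 10000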
And you should heuristically tune the renewal_prob and renewal_start parameters depending on the traffic pattern, the function's logic, and so on.
Conclusion
I showed you a code-level workaround for the Thundering Herd problem. It is tricky to tune the parameters, but it will make your server more reliable.
If your server seems to slow down periodically, you may suspect this problem and fix it with the Probabilistic Renewal Cache.