How I Reduced the Thundering Herd

Flask-Caching with Probabilistic Renewal

A cache stores data so that it can be served quickly instead of being fetched from a distant source or recomputed every time it is needed.

You may cache, for example, resource-intensive functions, endpoint functions, or anything else, to avoid repeated computation and improve the responsiveness of your web pages.

As a result, pages can be served to users faster, and the server avoids wasting resources and operates more reliably.


A web server handles many requests simultaneously. When one user requests a page, other users are requesting the same page or other pages at the same time. In this situation, caches might not work as expected.
Let me explain with my own experience.

One of my jobs at my previous company was to develop a social community website. During the Korean national college entrance exam period, almost 10,000 posts were uploaded there every day, and even more comments than that.


The site has a small section, visible on every page, that shows the top 5 most-commented posts. Every user requests that section on every visit, which means the function computing the ranking would run on almost every request.

Fortunately, a small delay in updating the ranking is acceptable; unfortunately, computing it involves a heavy database query. So the ranking computation is cached with a TTL of 10 minutes.

When the time comes, the cache expires.
The first request after the expiration encounters a cache miss, so it calls the computing function and caches the returned value, the ranking.


But what if it takes 10 seconds for the function to finish and populate the cache? Let's see what happens.
The first request calls the function to compute the ranking. At the same time, a second request from another user does, too. Until one of them finishes the function and succeeds in caching, the third, fourth, and subsequent requests all do the same.

Consequently, even though we cached the function and expected it to be executed only once per expiration, it is executed multiple times by different requests at the same moment. This is called the "Thundering Herd" problem.

In this situation, requests unexpectedly bypassing the cache increase the database load. As the load increases, database response times grow, and the more requests hang, the higher the chance of a server outage.

In fact, during the exam season, my single database was overwhelmed by the heavy query load whenever the cache expired, and there were periodic outages until the traffic stabilized.

A Workaround

The caching strategy above is called lazy caching. With lazy caching, when your server receives a request from a user, it checks whether the data is cached. If so, it uses the cached data right away; if not, it computes the value, caches it, and returns it to the user.
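As a sketch of that lazy-caching flow (the dict-based cache and the names fake_cache, compute_rank, and get_rank are illustrative, not from the actual codebase):

```python
# Lazy caching sketch: check the cache first, recompute only on a miss.
fake_cache = {}

def compute_rank():
    # stands in for the heavy database query
    return ["post-1", "post-2", "post-3", "post-4", "post-5"]

def get_rank():
    rank = fake_cache.get("rank")
    if rank is None:            # cache miss: compute and repopulate
        rank = compute_rank()
        fake_cache["rank"] = rank
    return rank                 # cache hit on subsequent calls
```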

Alternatively, you can use a write-through cache: whenever the original data changes in the database, you also apply the change to the cache. No TTL is required because the cached data is always up to date.

However, there are cases where this is difficult to implement, and mine was one of them. So I decided to make a small change to the caching package as a workaround.



Flask is a lightweight WSGI web application framework for Python, and one of the most loved packages related to Flask is Flask-Caching.

I won't cover what Flask is or how to use it, but I will briefly explain Flask-Caching (v1.10.1).


All you need to do is create a Cache object and decorate the endpoint function with @cache.cached(...).

When you visit localhost:5000/ (the default server URL), it shows the DateTime of your visit (e.g. "Fri Jan 21 19:07:24 2022"), and when you revisit before the timeout, the server shows the same DateTime.

That means your second request never reaches the endpoint function; the cache returns the value previously computed by the first request.
That's all the cache does.

Note that cache.cached(...) does not take the parameters passed to the function into account. For example,


The results of the add function are the same even though different parameters are passed.

In this case, cache.memoize(...) works.


Now add(3, 4) returns the right value.

Probabilistic Renewal

As I said, unexpected requests can reach the function at the moment expiration occurs. I will show you a modified version of the cached method as a workaround. You can modify the memoize method in the same way if necessary.

The original code is here. To put it simply, the cached method executes the following steps.

  1. If unless (a callable argument) returns True, the decorator returns f(*args, **kwargs) directly, ignoring the cache-related logic.
  2. The decorator makes a cache key using the args and kwargs passed to the decorated function and the request information (URL path, query string, ...).
  3. If forced_update (a callable argument) is not None and returns True, it sets a cache-miss flag.
    Otherwise, it retrieves the cached data from the cache storage; in this case, too, it sets the cache-miss flag if nothing is cached.
  4. When the cache-miss flag is set, it executes the original function f and populates the cache storage with the returned data.
  5. Finally, it returns the cached data regardless of whether it was just populated.
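The steps above can be condensed into a sketch (simplified names and a plain-dict cache; not the real implementation):

```python
# A condensed sketch of the five steps with a plain-dict cache.
store = {}

def cached(f, unless=None, forced_update=None):
    def decorated(*args, **kwargs):
        if unless is not None and unless():           # 1. bypass the cache entirely
            return f(*args, **kwargs)
        key = (f.__name__, args, tuple(sorted(kwargs.items())))  # 2. build a cache key
        if forced_update is not None and forced_update():        # 3a. forced cache miss
            rv, found = None, False
        else:                                         # 3b. look the key up
            rv = store.get(key)
            found = rv is not None
        if not found:                                 # 4. recompute and repopulate
            rv = f(*args, **kwargs)
            store[key] = rv
        return rv                                     # 5. return the cached data
    return decorated
```

Calling a decorated function twice with the same arguments executes the original function only once.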

Straightforward, right?

Now, what we want is to force the cached data to be updated conditionally, before expiration occurs. It would be great if the update (renewal) happened while expiration is imminent.

First of all, we add two arguments to the cached method.
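Something like this (the types mirror the existing signature; I use Optional[float] since both values are fractions):

```python
from typing import Callable, Optional

def cached(
    self,
    timeout: Optional[int] = None,
    # ... existing parameters elided ...
    renewal_prob: Optional[float] = None,   # chance of renewal per request (0.01 == 1%)
    renewal_start: Optional[float] = None,  # fraction of timeout at which renewal may begin
) -> Callable:
    ...
```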


renewal_prob specifies the probability that a renewal happens on any given request; 0.01 means 1%.
renewal_start is used to calculate the renewal_time_left_thres variable described below.
If either of them is None, renewal will not happen.

And we need to set two variables and two function attributes.
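Illustratively (the concrete numbers are stand-ins; in the real method, a None or negative timeout falls back to self.cache.default_timeout):

```python
from collections import defaultdict

def decorated_function():
    ...  # placeholder for the real wrapper defined inside the decorator

timeout = 600        # stand-in; None or negative is replaced by the default timeout
renewal_start = 0.3

# Threshold of remaining TTL below which renewal may start (here: 180 seconds)
renewal_time_left_thres = timeout * renewal_start if renewal_start else 0

decorated_function.expired_at = defaultdict(float)           # expiration timestamp per cache key
decorated_function.in_progress = defaultdict(lambda: False)  # renewal-in-progress flag per cache key
```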


Although timeout could be None and would normally be replaced by the default value later, we replace it explicitly here so we can calculate renewal_time_left_thres.

renewal_time_left_thres sets the threshold of remaining expiration time at which renewal attempts may start. For example, if it is 10 seconds and 11 seconds remain until expiration, there is no chance of renewal; once 10 seconds or less remain, a renewal can happen.


expired_at stores the expiration timestamp for each cache key.
in_progress is a flag indicating whether a renewal is already in progress in another thread.

Now, we will add more logic into decorated_function.
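In isolation, the added check looks roughly like this (stand-alone values so it runs outside the decorator; renewal_prob is set to 1.0 only to force the branch for demonstration):

```python
import random
import time
from collections import defaultdict

def decorated_function():
    ...  # placeholder for the wrapper

decorated_function.expired_at = defaultdict(float)
decorated_function.in_progress = defaultdict(lambda: False)

renewal_prob = 1.0            # demonstration value; real traffic would use e.g. 0.0001
renewal_time_left_thres = 18  # 30% of a 60-second timeout

cache_key = "view//"
rv, found = "cached-value", True

if (
    renewal_prob is not None
    and decorated_function.expired_at[cache_key] - time.time() < renewal_time_left_thres
    and not decorated_function.in_progress[cache_key]
    and random.random() < renewal_prob
):
    # Treat this request as a cache miss so it recomputes and repopulates.
    decorated_function.in_progress[cache_key] = True
    rv, found = None, False
```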


Using cache_key, we can get the expiration timestamp from decorated_function.expired_at and the in-progress status from decorated_function.in_progress.

We want to set the 'force update' flag when ...

  1. renewal_prob is not None
  2. expiration timestamp - time.time() is less than renewal_time_left_thres
  3. The renewal is not in progress
  4. random.random() is less than renewal_prob

The first condition maintains backward compatibility; the second limits the period in which renewal can occur; the third keeps the renewal from running in more than one thread of a process at the same moment; and the last decreases the chance of renewal when handling many requests.

When a renewal gets its turn, the lines inside the if statement are executed: the in-progress flag is set, rv is set to None, and found is False.


Because found is False, the original function f is called and the return value is populated into the cache storage. The expiration timestamp is recalculated, and the in-progress flag is cleared.

Finally, decorated_function returns the cached data (rv) as the result of calling f.

Here is the code. ※ I removed docstrings and comments for readability.
    :copyright: (c) 2010 by Thadeus Burgess.
    :license: BSD, see LICENSE for more details.

    See original code from here: https://github.com/pallets-eco/flask-caching/blob/348dbecf1365128799f99cb4d751f212089bb218/src/flask_caching/__init__.py

def cached(
    self,
    timeout: Optional[int] = None,
    key_prefix: str = "view/%s",
    unless: Optional[Callable] = None,
    forced_update: Optional[Callable] = None,
    response_filter: Optional[Callable] = None,
    query_string: bool = False,
    hash_method: Callable = hashlib.md5,
    cache_none: bool = False,
    make_cache_key: Optional[Callable] = None,
    source_check: Optional[bool] = None,
    renewal_prob: Optional[float] = None,
    renewal_start: Optional[float] = None,
) -> Callable:
    def decorator(f):
        @functools.wraps(f)
        def decorated_function(*args, **kwargs):
            #: Bypass the cache entirely.
            if self._bypass_cache(unless, f, *args, **kwargs):
                return f(*args, **kwargs)

            nonlocal source_check
            if source_check is None:
                source_check = self.source_check

            cache_key = None
            try:
                if make_cache_key is not None and callable(make_cache_key):
                    cache_key = make_cache_key(*args, **kwargs)
                else:
                    cache_key = _make_cache_key(args, kwargs, use_request=True)

                if (
                    renewal_prob is not None
                    and decorated_function.expired_at[cache_key] - time.time()
                    < renewal_time_left_thres
                    and not decorated_function.in_progress[cache_key]
                    and random.random() < renewal_prob
                ) or (
                    callable(forced_update)
                    and (
                        forced_update(*args, **kwargs)
                        if wants_args(forced_update)
                        else forced_update()
                    )
                    is True
                ):
                    decorated_function.in_progress[cache_key] = True
                    rv = None
                    found = False
                else:
                    rv = self.cache.get(cache_key)
                    found = True

                    # None may mean "key not found" or an actually cached
                    # None value, so disambiguate when cache_none is set.
                    if rv is None:
                        if not cache_none:
                            found = False
                        else:
                            found = self.cache.has(cache_key)
            except Exception:
                if cache_key:
                    decorated_function.in_progress[cache_key] = False
                if self.app.debug:
                    raise
                logger.exception("Exception possibly due to cache backend.")
                return f(*args, **kwargs)

            if not found:
                rv = f(*args, **kwargs)

                if response_filter is None or response_filter(rv):
                    try:
                        self.cache.set(
                            cache_key,
                            rv,
                            timeout=decorated_function.cache_timeout,
                        )
                        decorated_function.expired_at[cache_key] = time.time() + timeout
                    except Exception:
                        if self.app.debug:
                            raise
                        logger.exception("Exception possibly due to cache backend.")
                decorated_function.in_progress[cache_key] = False
            return rv

        def default_make_cache_key(*args, **kwargs):
            argspec_args = inspect.getfullargspec(f).args

            for arg_name, arg in zip(argspec_args, args):
                kwargs[arg_name] = arg

            return _make_cache_key(args, kwargs, use_request=False)

        def _make_cache_key_query_string():
            args_as_sorted_tuple = tuple(
                sorted(pair for pair in request.args.items(multi=True))
            )
            args_as_bytes = str(args_as_sorted_tuple).encode()
            cache_hash = hash_method(args_as_bytes)

            if source_check and callable(f):
                func_source_code = inspect.getsource(f)
                cache_hash.update(func_source_code.encode("utf-8"))

            cache_hash = str(cache_hash.hexdigest())

            cache_key = request.path + cache_hash

            return cache_key

        def _make_cache_key(args, kwargs, use_request):
            if query_string:
                return _make_cache_key_query_string()
            else:
                if callable(key_prefix):
                    cache_key = key_prefix()
                elif "%s" in key_prefix:
                    if use_request:
                        cache_key = key_prefix % request.path
                    else:
                        cache_key = key_prefix % url_for(f.__name__, **kwargs)
                else:
                    cache_key = key_prefix

            if source_check and callable(f):
                func_source_code = inspect.getsource(f)
                func_source_hash = hash_method(func_source_code.encode("utf-8"))
                func_source_hash = str(func_source_hash.hexdigest())

                cache_key += func_source_hash

            return cache_key

        decorated_function.uncached = f
        decorated_function.cache_timeout = timeout
        decorated_function.make_cache_key = default_make_cache_key

        decorated_function.expired_at = defaultdict(float)  # time.time() + timeout
        decorated_function.in_progress = defaultdict(lambda: False)  # renewal in progress?

        return decorated_function

    # Replace a None or negative timeout with the default_timeout
    if not timeout or timeout < 0:
        timeout = self.cache.default_timeout

    # Initialize the renewal starting point
    renewal_time_left_thres = timeout * renewal_start if renewal_start else 0

    return decorator

Or you can see the modified versions of cached and memoize here


To measure the improvement, I built a uWSGI + Flask server in a Docker container and configured two endpoints decorated with cache.cached. Both endpoints sleep for 3 seconds and return an empty string, but one uses the original Flask-Caching cache object, and the other uses the Probabilistic Renewal cache object.

I used uWSGI with 16 processes of 4 threads each, Redis as the cache storage backend, and heuristically set the parameters of the cached method: timeout to 60 seconds, renewal_prob to 0.0001 (0.01%), and renewal_start to 0.3 (30% of the timeout).

I sent 1,000,000 parallel requests to each endpoint and counted the number of cache misses.

With the original cache object, all requests were handled in 16 minutes and 43 seconds. Cache misses occurred 960 times (excluding the misses at initial start-up).

With the customized one, they were handled in 17 minutes and 5 seconds. Renewal happened 65 times, and no cache miss occurred.

Although the request handling time increased, the Probabilistic Renewal Cache had no cache misses. Even counting the actual function calls, it reduced the number of executions of the original function by more than 90%. Moreover, when I replaced decorated_function.expired_at and decorated_function.in_progress with values in the cache (e.g. {cache_key}:exp_at) to sync them across processes, executions decreased by 97.71%.

Of course, the results depend on the machine, the function being cached, and other factors. Still, our cached method succeeds in mitigating the thundering herd problem.


However, there are some limitations and tradeoffs.

If the server uses volatile cache storage, then whenever it restarts, it will execute the original function for every request at that moment, because nothing is cached at all.

Also, if the servers are not configured to use a remote cache, so the cached data is not shared across servers, each server will keep its own cache storage, and there will be more cache misses and renewal requests.

There can be a memory leak, because we use two dictionaries to store the expiration timestamps and in-progress flags for every cache_key. It can be mitigated with the uWSGI max-requests option, which makes WSGI workers (processes) reload themselves after each serves the specified number of requests, releasing the leaked memory.

You should also tune the renewal_prob and renewal_start variables heuristically depending on the traffic pattern, the function's logic, and so on.


I showed you a code-level workaround for the Thundering Herd problem. Tuning the parameters is tricky, but it will make your server more reliable.

If your server seems to slow down periodically, you may suspect this problem and fix it with the Probabilistic Renewal Cache.

