TTL caching with an expiring dict
An educational exercise
Background
In one of my projects at work I had to make many calls to various service endpoints. The calls happen within a short time frame. Changes that might occur in the responses during this period can be ignored; the services appear frozen in time in terms of behavior. Later changes, though, should be taken into account. Therefore caching forever (memoizing) is out of the question: values should “expire” after this time frame.

Another thing that happens is that during this period calls can be repeated on the same endpoints. Given the “frozen in time” assumption, this is wasteful. If we resorted to “clever” tricks to reduce the calls, the complexity of the code would increase, because we would pollute it with “smartness”. Instead, we leave the code as is and resort to one of the most successful and widespread aspect-oriented techniques: object caching. To make things interesting, the calls are sometimes done in parallel, through threading.

Instead of creating a library, I will just show you how this is accomplished with various techniques in your own code, so you can tailor it for your use case. Let’s crack on.
Reducing the problem
Here is a minimal case that actually reproduces my requirements.
The main idea is that two calls are made in parallel. We simulate the object retrieval by echoing the input. The input may change after some short interval, but during that interval it stays the same, according to our requirements. The naive implementation will echo twice. This is exactly what we need to avoid: the calls simulate calls to remote services, and the second call is wasteful since the service is “frozen in time” during this “short period”. We have now reduced our problem to a case that is more amenable to a solution.
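As a concrete illustration, here is a minimal sketch of that naive case (the function and argument names are mine, not from the original gist): two threads call the same simulated service with the same input, and both calls actually execute.

```python
import threading

calls = []  # records every real (non-cached) "service" invocation

def echo(value):
    """Simulate an expensive remote call by echoing the input."""
    calls.append(value)
    print("AAA")  # marks a real call
    return value

# Two parallel calls with the same argument: both hit the "service".
threads = [threading.Thread(target=echo, args=("hello",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Both threads reach the body, so “AAA” is printed twice; the goal of the rest of the article is to bring that down to one.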
Baseline attempt
The baseline attempt was based on the DRY principle: we go straight to the famous cachetools library. The gist that implements my approach follows.
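The gist is not inlined here, but a sketch along the same lines, assuming cachetools' `cached` decorator with a `TTLCache` and its `lock` argument, looks like this. The catch: the lock only guards access to the cache, not the wrapped call itself, so two threads that miss at the same time can both execute the function.

```python
import threading
from cachetools import cached, TTLCache

calls = []  # counts real executions of the body

@cached(cache=TTLCache(maxsize=100, ttl=10), lock=threading.Lock())
def echo(value):
    calls.append(value)  # the lock guards the cache, not this body
    print("AAA")         # may be printed once or twice
    return value

threads = [threading.Thread(target=echo, args=("hello",)) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Depending on thread timing, the body runs once or twice, which matches the flaky behavior described next.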
Unfortunately, it does not work as I expected: threading issues. Please try it to see what is happening. “AAA” is not reliably printed once: sometimes it is printed once, sometimes twice. Mea culpa while reading the documentation. Please take your time to understand the issue. Also, I do not like this lock in the signature. We need to try something else. So, sleeves up!
First attempt on caching (naive)
The first attempt will use the excellent expiringdict library to implement the timely expiration of our cached values. We will protect everything with a lock.
Line 6 sets up the expiration: at most a hundred values will be cached, each for at most 10 seconds. This is just an example, and the values are chosen only for display purposes. The caching is now thread safe. As I told you previously, there can be various services being called, and this approach only implements a caching solution for one specific case. Given the aspect-oriented nature we mentioned before, we need to protect our different calls with the same aspect, and this is done in Python with the decorator pattern. But there is a serious drawback: the above caching is specific to the particular function we use. We cache based on the assumption that our function takes only one argument and that it is hashable. We will revisit this decision in the next section. There is no harm to generality in caching the echoed value; after all, the return value can be anything, just like the input.
Second attempt on caching (decorator)
In this approach we use a decorator to reuse our caching. This is a textbook application of the decorator pattern that you can see in the above link.
There is also a change in what we cache. We now make our function more generic: we accept a function with positional-only arguments. This is still not the general case, but we are not aiming to solve the general case. Let’s keep it simple for now. However, our code has two serious problems: a performance one, and a serious bug that is hard to debug.
- All service calls pass through one and only one lock, Line 15. (The performance problem)
- Different service calls with the same argument list potentially mix their results, Line 16, because they use the same cache. (The dangerous bug)
The remedy to both problems is to create a cache and a lock per service-call function. The problem of specifying the lock and the cache and passing the pair around is typically solved through object orientation.
Third attempt on caching (object orientation)
The next gist outlines the approach. The interesting code is in the class OurTTLCache. As promised, a cache and a lock are created inside it. TTL means Time-To-Live and comes from network protocol terminology.
There are three notable additions here. The first, as you can see, is the
def __call__(self, *args):
pass
which actually makes the object returned by the decorator a Callable, so it can be called instead of the original function. In decorator terminology, this is the “wrapper”. The other suspicious-looking thing is
def clear(self):
with self.lock:
self.cache = ExpiringDict(max_len=100, max_age_seconds=10)
Why do we need to “clear” an expirable cache? The most obvious case is to expire entries in an emergency. And what counts as an emergency? It can be application defined, but in our case the most intuitive one happens in a unit test. Before we run the unit test, have a closer look at (the textbook approach)
@wraps(f, updated=())
class OurTTLCache:
pass
The short answer is that the wrapper tries to make the object look similar to the original function. The default fails because we are not returning a function. A bit longer answer is here and, of course, in the official Python documentation. Now let’s have a look at the unit test.
You can run the test and it will pass. Now comment out line 27. The second test fails because the behavior of the function is cached from the previous test. Caching sets up state; it has side effects, and we need to reset the “world”.
Testing our caching
Now it is time to test our caching. Since time is involved, we will use the excellent freezegun library. Read carefully how this library works. The highlight here is that we create a cache of 1 entry that expires in 2 seconds, and we set up the time stepping in our test so that the entry expires. Line 10 actually expires it, because now() is called 3 times: twice by us and once internally.
In the codebase there is another test that checks that caching in terms of element size behaves as expected.
Wrap up
Having a real problem at hand, and a failed attempt to solve it, is the beginning of an adventure: one that helps you understand things, improve things, and write a Medium article. 😉 I hope you enjoyed my article. As usual, feel free to play with the code on GitHub and let me know if something is broken.