Monday, November 30, 2009

Finding what you're looking for by looking for everything else

At work, one of my responsibilities is to maintain a complicated object cache. Caches are a great mechanism for improving the performance of searches. However, they don't work well for improving the performance of searches for non-existent items; in fact, they often worsen such performance because the code must first search the cache, fail to find the object, then must search the underlying store, and fail to find the object again. In this case, the cache has added overhead and contention without adding benefit.

In practice, we try to minimize the situations where clients search for objects that don't exist, but such situations still arise, so, every six months or so, somebody tries to figure out a way to improve the cache so that it can cache items that don't exist, a discussion that usually ends quietly once my colleague Tom points out the obvious impossibility of enumerating the infinite space of objects that don't exist.

At any rate, I was reminded (a bit) of this by this cute post on The Daily WTF, in which the programmer makes the basic mistake of trying to enumerate all the unwanted data, rather than simply specifying the valid data.

Of course, not only is this technique a poor performer, but, perhaps more importantly, it is a classic source of security bugs, since the bad guys can always think of an extension that you left off your list. So, in this case as in so many others, the simplest code is not only easiest to understand, and best performing, but it is the most secure as well.

No comments:

Post a Comment