Cache, Query, or Calculate




Memoization, tabulation, even dynamic programming could be considered a kind of local memory caching.  A nice front end JS pattern is the interceptor in Angular for caching.  I use caching generously on the server side in most APIs I write.  Most ORMs have it built in and even offer second level caching.  Out on the web CDNs and mirror sites are caches.  In fact caching has even more uses than it has names.

Caches are useful in many places.  Real world application bottlenecks are almost always around I/O.  You can optimize your algorithm till the cows come home but if you haven't first looked at your I/O, you're improvements will likely be a drop in the ocean.  Often a quick and easy way to alleviate I/O bottlenecks is to slap on an appropriate cache.

A particular problem where we see caching is in the classic Fibonacci problem.  The Fibonacci sequence is a pattern frequently occurring in nature, but in Computer Science we see it most often in job interviews.  Fibonacci's a great question because it's a simple problem that clearly demonstrates recursion, its trade-offs with an iterative approach, and the benefits of caching.

Recursion, while often more elegant has the risk of blowing the stack.  In problems like the Fibonacci generator memoization can help us address this risk.

Iterative:
def f(n):
    a, b = 0, 1
    for i in range(0, n):
        a, b = b, a + b
    return a
Recursive w/ memoization:
@memoized def fibonacci(n): if n in (0, 1): return n return fibonacci(n-1) + fibonacci(n-2)
What's more interesting about the problem to me personally is that it also serves as a potential illustration for general algorithm design.  While the above implementations are succinct and efficient, there's an O(1) way to solve this problem:


Fn=(1+5)n(15)n2n5
def fib_equation(n):
    a = (1 + (5 ** 0.5)) ** n
    b = (1 - (5 ** 0.5)) ** n
    c = (2 ** n) * (5 ** 0.5)
    # round to remove float errors
    return round((a - b) / c)

So just like optimizing our algorithm before checking our I/O, we've applied an academic rather than practical approach and reached a sub-optimal solution.

Generally speaking, when optimizing I try to:
  1. Address I/O until it's optimal or no longer the bottleneck.  Often the first places to make practical improvements are in I/O caching and/or Query Optimization.
  2. Check the algorithm.
    1. Is it a solved problem with a known solution?  Use that.
    2. Profile and prune the algorithm.  There's generally some low hanging fruit on the first few iterations of optimization.
  3. If you've made it this far, go back to #1 and repeat.  You're certain to have a new bottleneck.  Themes from operations research apply equally well to software efficiency as to machine assembly lines.  A simplified recapitulation could be stated as, "Identify and focus on your bottleneck.  Fix it.  Iterate."  In fact the Big O notation is based on the same premise.  The bottleneck dominates runtime so regardless of the minutia, we measure overall performance by the most expensive operation.

Just for fun:




Comments

  1. Blockchain is now disrupting real estate along with the other industries. I'd highly recommend checking out this list if you're looking for a blockchain real estate 2018.

    ReplyDelete
  2. Heh, I wrote a very similar post myself a few years ago. How someone codes fibonacci can be very instructive.

    But iterating on #6 . . .

    The term you label as `b` approaches zero, so it's irrelevant to the equation. ;)

    ReplyDelete

Post a Comment

Popular posts from this blog

Engineering Truisms

The Telescoping Constructor (Anti-Pattern)

Software Capex: The Cost of Flexibility