Function Memoization in C#
Pure functions, those that always return the same value for a given input, have several advantages over other functions. One of these is that their result can be saved or "memoized" so they do not need to be recalculated. Wes Dyer of the C# compiler team demonstrates a generic way to do this with C# 3.0 and closures.
A closure is essentially a function pointer packaged with a pointer to some data. Syntactically, a closure is usually created as an anonymous inner function that captures one or more of the outer function's local variables. Because of this capturing, the local variable becomes a heap-allocated variable that can be accessed from both the outer function and the delegate pointing to the closure. This allows the local variable to persist long after the function that contains it is gone.
Closures make it easy to do some otherwise tedious tasks quite easily. In Wes Dyer's example, he shows us how to use closures to memoize the results of a recursive function.
Func<int, int> fib = null;
fib = n => n > 1 ? fib(n - 1) + fib(n - 2) : n;
fib = fib.Memoize();
The first line of this example creates the variable to hold the function pointer and indicates that it should have one parameter of type integer and return an integer.
The next line creates the function itself. It is important to realize in this context a function is not just static code, it is an object that can be referenced as well. The syntax "n =>" is used to name the parameter, which is then used inside the function.
While the syntax is fairly clean, using recursion to calculate Fibonacci numbers is incredibly inefficient. Calling fib(5) will result in 14 functions calls, including two calls to fib(3), three to fib(2), and five calls to fib(1).
If the fib(n) function could somehow remember that it was already called with a given value of n, it wouldn't have to recalculate it each time. Traditionally this involves creating new class to hold the results of each call, but Wes shows an easier way. Just wrap the function with a call to the Memoize extension method.
The details of the Memoize method are shown below.
public static Func<A, R> Memoize<A, R>(this Func<A, R> f)
var map = new Dictionary<A, R>();
return a =>
if (map.TryGetValue(a, out value))
value = f(a);
The memoize method creates a variable called map, and then it wraps both the variable and the original function in a new function. This new function is what gets returned from the memoize method.
From that point on, every time fib is called this new function is run instead of the original. If the result cannot be found in the map variable, then the original is used.
But wait you say, isn't the original the one with such poor runtime performance. Well yes, but the original calls the fib function pointer, and that now points to the new version. Essentially the original code has been redefined in addition to being wrapped.
For more background on closures, you can read Steve Schafer's post in comp.lang.functional.
Also, it's "memoize" (look it up) not "memorize."
Re: Lost content
Well yes, but the original calls the fib function pointer, and that now points to the new version. Essentially the original code has been redefined in addition to being wrapped.
Maybe I'm just having a lapse, but the way I read this is wrong. The original function is passed into the Memoize extension-method and captured with the lambda, which is also a closure. The "reference" is never replaced. It doesn't matter what assignments occur outside of the Memoize method. So the old "pointer" always references the original function. The deal is that the result of that long-running operation is cached, so if it's called again, it can just be retrieved from the hashtable.
Memoization is a really powerful technique, but it's not really broadly applicable since the result of each call is stored for the life of the application unless you cache in an external store you can flush. It makes for a good example, but a Fibonacci function would actually probably be a pretty bad usage since the cache could become very large pretty quickly. A good use case might be during processing monthly files of credit-card transactions. You could memoize date-parsing knowing that the process will only ever have up to 31 possible dates, saving a lot of work.
It's also worth noting that Memoization can be whipped up in c#2 pretty quickly too. It's only a couple sugary syntax snippets here that make it c#3 specific. Get rid of the lambda and use an anonymous-method, get rid of the extension-method and use a static class method and you're done.