Monday, October 05, 2015

C# Memoization

Memoization is a program optimisation technique that stores the results of a computationally expensive function in a cache and then returns values from that cache if the function is called again with the same input(s).

Wikipedia has the term coined by (a personal hero) Donald Michie in 1968 although I can’t find the word in the original paper (dated 1967) [link: https://saltworks.stanford.edu/assets/kr445kk4694.pdf ] so it probably came into being as the idea was applied.

I had previously used the technique in JavaScript but that was a while ago and it took me a few moments to even remember the precise term – let alone work out how to do the trick in C#.

The archetypal example of memoization in JavaScript is a function to calculate Fibonacci numbers and I found this one on http://stackoverflow.com/

var fibonacci = (function () {     var memo = [0, 1];     var fib = function (n) {         var result = memo[n];         if (typeof result !== 'number') {             result = fib(n - 1) + fib(n - 2);             memo[n] = result;         }         return result;     };     return fib; }());


The var fibonacci contains a function that can be used to calculate a result. It will return a cached value for any previously requested calculation and will recursively call itself to calculate a new value – which will often be returned from the cache (well certainly if a sequence of values are requested). 

The cache takes the form of an array (memo[]) and this is wrapped in a “closure” so that it remains accessible to the inner function fib(). 

Closure, recursion and memorization in one function – has all the dangerous signs of an interview question.

I wanted a function in C# to look-up a subset of values previously stored in a database (in this instance identity integers) with a high likelihood that individual values would be repeatedly requested over the run of the process but without being able to sort the requests in any meaningful way. So assuming that database reads are relatively slow, memorization looked like the technique to try.

My requirement is a simple case – I have a function into which I pass a string as a key and the function does a database lookup and retunes a long integer. So let us start there

private long lookUpDimension(string dkey) {     long idVal = 0;     try     {         // code to access the databse and do the lookup     }     return idVal; }

Just a simple function (well after all the database access code is removed for clarity). Point being – it is just an ordinary function – takes a single argument and returns the output.

The class containing that look-up function also has two static functions to work some magic.

private static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> function) {     return Memoize(function, new Dictionary<TArg, TResult>()); } private static Func<TArg, TResult> Memoize<TArg, TResult>(Func<TArg, TResult> function, IDictionary<TArg, TResult> cache) {     return key => cache.ContainsKey(key) ? cache[key] : (cache[key] = function(key)); }

So – what is this Func<TArg, TResult>  stuff?

Func is a predefined Delegate, this one takes a single argument in and outputs a single result. There are more defined (up to 8 inputs) and we could express the second one (2 inputs) thus:

public delegate TResult Func<in T1, in T2, out TResult>(
        T1 arg1,
        T2 arg2
)
But Func<T1, T2, TResult> is syntactic sugar of the right sort.

My code needed to declare something like this

private Func<string, long> memLookup = null;

which declares a delegate instance with a string input and a long integer output
and then the following code needs to be executed:


memLookup = Memoize<string, long>(lookUpDimension);

sometime when the class is being created perhaps or certainly before the memorized function is required. The class then exposes a method like:


public long LookupDimValue(string dkey) {     return memLookup(dkey); }

Because memLookup() now “contains” a memoized version of the original lookUpDimension() function taking a string input and returning a long integer.

One note of caution about the sort of memorized function I have used in this example. If the function fails to locate a database record corresponding to the given key then it returns a value indicating that (a zero). One option might be to then insert a new record but the memorized function will “remember” that the key is not valid and continue to return the original response. The insert would therefore have to be managed from within the memorized function and the relevant value for the new record returned.

Just a thought (added 11/10/15)
What is the simplest C# code to match the JavaScript function we started with?

Dictionary<int, long> memo = new Dictionary<int, long>() { { 0, 0 }, { 1, 1 } };

private long fib(int n) {     long result;     if (!memo.TryGetValue(n, out result))     {         result = fib(n - 1) + fib(n - 2);         memo.Add(n, result);     }     return result; }
Fewer lines but less "functional" as it makes use of a Dictionary object that is external to the function. Memoization would often be a better approach.

If you are wondering if memoization (at least in some form) is an effective technique then try writing a fibonacci function without and give it a run - as the input n increases the response time can be considerable.

And more (added 13/10/2015)

C#6 now allows a different form for initialising a Dictionary

Dictionary<int, long> memo = new Dictionary<int, long> {     [0] = 0,     [1] = 1 };

which is bound to be useful when the assignment can add value (just a thought).

No comments: