Search code examples
luagarbage-collectionmemoizationweak-references

Garbage collection and the memoization of a string-to-string function


The following exercise comes from p. 234 of Ierusalimschy's Programming in Lua (4th edition). (NB: Earlier in the book, the author explicitly rejects the word memoization, and insists on using memorization instead. Keep this in mind as you read the excerpt below.)

Exercise 23.3: Imagine you have to implement a memorizing table for a function from strings to strings. Making the table weak will not do the removal of entries, because weak tables do not consider strings as collectable objects. How can you implement memorization in that case?

I am stumped!

Part of my problem is that I have not been able to devise a way to bring about the (garbage) collection of a string.

In contrast, with a table, I can equip it with finalizer that will report when the table is about to be collected. Is there a way to confirm that a given string (and only that string) has been garbage-collected?


Another difficulty is simply figuring out what the desired function's specification is. The best I can do is to figure out what it isn't. Earlier in the book (p. 225), the author gave the following example of a "memorizing" function:

Imagine a generic server that takes requests in the form of strings with Lua code. Each time it gets a request, it runs load on the string, and then calls the resulting function. However, load is an expensive function, and some commands to the server may be quite frequent. Instead of calling load repeatedly each time it receives a common command like "closeconnection()", the server can memorize the results from load using an auxiliary table. Before calling load, the server checks in the table whether the given string already has a translation. If it cannot find a match then (and only then) the server calls load and stores the result into the table. We can pack this behavior in a new function:

[standard memo(r)ized implementation omitted; see variant using a weak-value table below]

The savings with this scheme can be huge. However, it may also cause unsuspected waste. ALthough some commands epeat over and over, many other commands happen only once. Gradually, the ["memorizing"] table results accumulates all commands the server has ever received plus their respective codes; after enough time, this behavior will exhaust the server's memory.

A weak table provides a simple solution to this problem. If the results table has weak values, each garbage-collection cycle will remove all translations not in use at that moment (which means virtually all of them)1:

local results = {}
setmetatable(results, {__mode = "v"})  -- make values weak
function mem_loadstring (s)
  local res = results[s]
  if res == nil then                   -- results not available?
    res = assert(load(s))              -- compute new results
    result[s] = res                    -- save for later reuse
  end
  return res
end

As the original problem statement notes, this scheme won't work when the function to be memo(r)ized returns strings, because the garbage collector does not treat strings as "collectable".


Of course, if one is allowed to change the desired function's interface so that instead of returning a string, it returns a singleton table whose sole item is the real result string, then the problem becomes almost trivial, but I find it hard to believe that the author had such a crude "solution" in mind2.

In case it matters, I am using Lua 5.3.


1 As an aside, if the rationale for memo(r)ization is to avoid invoking load more often than necessary, the scheme proposed by the author does not make sense to me. It seems to me that this scheme is based on the assumption (a heuristic, really) that a translation that is used frequently, and thus would pay to memo(r)ize, is also one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case.

2 One may be able to put lipstick on this pig in the form of a __tostring method that would allow the table (the one returned by the memo(r)ized function) to masquerade as a string in certain contexts; it's still a pig, though.


Solution

  • Your idea is correct: wrap string into a table (because table is collectable).

    function memormoize (func_from_string_to_string)
       local cached = {}
       setmetatable(cached, {__mode = "v"}) 
       return 
          function(s)
             local c = cached[s] or {func_from_string_to_string(s)} 
             cached[s] = c                    
             return c[1]
          end
    end
    

    And I see no pigs in this solution :-)

    one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case.

    There will be no "always reachable" items in a weak table.
    But most frequent items will be recalculated only once per GC cycle.
    The ideal solution (never collect frequently used items) would require more complex implementation.
    For example, you can move items from normal cache to weak cache when item's "inactivity timer" reaches some threshold.