The following exercise comes from p. 234 of Ierusalimschy's Programming in Lua (4th edition). (NB: Earlier in the book, the author explicitly rejects the word memoization, and insists on using memorization instead. Keep this in mind as you read the excerpt below.)
Exercise 23.3: Imagine you have to implement a memorizing table for a function from strings to strings. Making the table weak will not do the removal of entries, because weak tables do not consider strings as collectable objects. How can you implement memorization in that case?
I am stumped!
Part of my problem is that I have not been able to devise a way to bring about the (garbage) collection of a string.
In contrast, with a table, I can equip it with finalizer that will report when the table is about to be collected. Is there a way to confirm that a given string (and only that string) has been garbage-collected?
Another difficulty is simply figuring out what the desired function's specification is. The best I can do is to figure out what it isn't. Earlier in the book (p. 225), the author gave the following example of a "memorizing" function:
Imagine a generic server that takes requests in the form of strings with Lua code. Each time it gets a request, it runs
load
on the string, and then calls the resulting function. However,load
is an expensive function, and some commands to the server may be quite frequent. Instead of callingload
repeatedly each time it receives a common command like"closeconnection()"
, the server can memorize the results fromload
using an auxiliary table. Before callingload
, the server checks in the table whether the given string already has a translation. If it cannot find a match then (and only then) the server callsload
and stores the result into the table. We can pack this behavior in a new function:[standard memo(r)ized implementation omitted; see variant using a weak-value table below]
The savings with this scheme can be huge. However, it may also cause unsuspected waste. ALthough some commands epeat over and over, many other commands happen only once. Gradually, the ["memorizing"] table
results
accumulates all commands the server has ever received plus their respective codes; after enough time, this behavior will exhaust the server's memory.A weak table provides a simple solution to this problem. If the
results
table has weak values, each garbage-collection cycle will remove all translations not in use at that moment (which means virtually all of them)1:
local results = {}
setmetatable(results, {__mode = "v"}) -- make values weak
function mem_loadstring (s)
local res = results[s]
if res == nil then -- results not available?
res = assert(load(s)) -- compute new results
result[s] = res -- save for later reuse
end
return res
end
As the original problem statement notes, this scheme won't work when the function to be memo(r)ized returns strings, because the garbage collector does not treat strings as "collectable".
Of course, if one is allowed to change the desired function's interface so that instead of returning a string, it returns a singleton table whose sole item is the real result string, then the problem becomes almost trivial, but I find it hard to believe that the author had such a crude "solution" in mind2.
In case it matters, I am using Lua 5.3.
1 As an aside, if the rationale for memo(r)ization is to avoid invoking load
more often than necessary, the scheme proposed by the author does not make sense to me. It seems to me that this scheme is based on the assumption (a heuristic, really) that a translation that is used frequently, and thus would pay to memo(r)ize, is also one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case.
2 One may be able to put lipstick on this pig in the form of a __tostring
method that would allow the table (the one returned by the memo(r)ized function) to masquerade as a string in certain contexts; it's still a pig, though.
Your idea is correct: wrap string into a table (because table is collectable).
function memormoize (func_from_string_to_string)
local cached = {}
setmetatable(cached, {__mode = "v"})
return
function(s)
local c = cached[s] or {func_from_string_to_string(s)}
cached[s] = c
return c[1]
end
end
And I see no pigs in this solution :-)
one that is always reachable (and hence not collectable). I don't see why this should necessarily, or even likely, be the case.
There will be no "always reachable" items in a weak table.
But most frequent items will be recalculated only once per GC cycle.
The ideal solution (never collect frequently used items) would require more complex implementation.
For example, you can move items from normal cache to weak cache when item's "inactivity timer" reaches some threshold.