multithreading haskell memory-management haskell-snap-framework

"Holding" a Data Map in memory


I have three data structures defined as follows, where S, LL, M, and Object represent Set, ListLike, Map, and ByteString, respectively:

nouns :: IO [Object]
nouns = liftM LL.words $ B.readFile "nounlist.txt"

obj :: IO ObjectSet
obj =  liftM S.fromList nouns

actions :: IO ActionMap
actions = do
  n <- nouns
  let l = foldl' (\z x -> (x,Sell):(x,Create):z) [] n
  return $ M.fromList $
    (\(x,y) -> ((x, Verb y []), Out (Verb y []) x)) <$> l

Now I have one function that binds the unevaluated Map and Set to the variables a and o. Once it enters query, it loops forever, accepting queries from user input and processing them. Appropriate responses are generated via lookups.

process :: IO ()
process = do
  a <- actions
  o <- obj
  forever $ query "" a o

Keeping in mind that my Map is composed of 300,000+ key-value pairs: the first query, which triggers the initial evaluation, takes approximately 3-5 seconds on my computer; this is fine and completely expected. Every subsequent call is snappy and responsive, just the way I want it. However, this is only so because I am running this code as a standalone executable and have the luxury of staying within the IO () of process. If I were to turn this code (and the accompanying code not listed) into a library to interface with, say, a Snap Framework web application, I wouldn't necessarily have this luxury.

Essentially, what I am trying to say is: if I were to remove the forever from process, the evaluated Map and Set would surely get garbage-collected. Indeed, this is what happens when I call the function from a Snap application (I can't keep forever because it would block the Snap application). Every subsequent call from the Snap application has the same 3-5 second overhead because it re-evaluates the data structures in question.
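The difference between rebuilding on every call and building once can be illustrated with a minimal sketch. It does not use Snap or the real word list; `buildMap`, `lookupRebuilding`, `lookupShared`, and `demo` are hypothetical names, and a small `Map Int Int` stands in for the real ActionMap. A counter records how often the build action runs.

```haskell
import Data.IORef (IORef, newIORef, modifyIORef', readIORef)
import qualified Data.Map.Strict as M

-- Hypothetical stand-in for the question's `actions`: builds a map and
-- counts how many times the build action has been run.
buildMap :: IORef Int -> IO (M.Map Int Int)
buildMap counter = do
  modifyIORef' counter (+ 1)
  return $ M.fromList [(k, k * 2) | k <- [1 .. 1000]]

-- Runs the build action itself on every call, like a request handler
-- that binds `actions` each time it is invoked.
lookupRebuilding :: IORef Int -> Int -> IO (Maybe Int)
lookupRebuilding counter k = do
  m <- buildMap counter
  return (M.lookup k m)

-- Takes an already-built map; the caller decides when it is built.
lookupShared :: M.Map Int Int -> Int -> Maybe Int
lookupShared m k = M.lookup k m

-- Performs three lookups each way and returns the two build counts:
-- (builds when rebuilding per call, builds when sharing one map).
demo :: IO (Int, Int)
demo = do
  c1 <- newIORef 0
  mapM_ (lookupRebuilding c1) [1, 2, 3]
  c2 <- newIORef 0
  m <- buildMap c2
  let _results = map (lookupShared m) [1, 2, 3]
  n1 <- readIORef c1
  n2 <- readIORef c2
  return (n1, n2)
```

In this sketch the per-call version runs the build three times and the shared version runs it once. The shared value stays alive for as long as something long-lived (a closure, or application state) still refers to it.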


My Question:

Is there an easy way to hold the Map and Set in memory so that every subsequent lookup is fast? One idea I came up with was to run a thread that sleeps and keeps the Map and Set alive. However, this definitely seems like overkill to me. What am I overlooking? Thank you for bearing with my long-winded explanation.

Note: I'm not necessarily looking for code answers, more so suggestions, advice, etc.


Solution

  • You can run obj and actions just once, during snaplet initialization, and store the results in the snaplet's state.

    data SnapApp = SnapApp
        { objectSet :: ObjectSet
        , actionMap :: ActionMap
        }
    
    appInit :: SnapletInit SnapApp SnapApp
    appInit = makeSnaplet ... $ do
        ... 
        a <- liftIO actions
        o <- liftIO obj
        return $ SnapApp o a
    

    Now you can access them from a Snap Handler:

    someUrlHandler :: Handler SnapApp SnapApp ()
    someUrlHandler = do
      a <- gets actionMap
      o <- gets objectSet
      res <- query a o
      ...
    

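    This guarantees that the actions and obj IO actions run only once, at initialization. The resulting structures stay reachable from the snaplet's state, so they are not garbage-collected between requests.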
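One caveat is laziness: storing a structure in the application state does not by itself build it, so the first request may still pay the construction cost. A minimal sketch of forcing the structures at startup follows, using `evaluate` from `Control.Exception`. The names `AppState`, `loadState`, and `lookupAction` are hypothetical, and the `ActionMap` and `ObjectSet` synonyms here are simplified stand-ins for the question's types, not the originals.

```haskell
import Control.Exception (evaluate)
import qualified Data.Map.Strict as M
import qualified Data.Set as S

-- Simplified, hypothetical stand-ins for the question's types.
type ActionMap = M.Map String Int
type ObjectSet = S.Set String

data AppState = AppState
  { objectSet :: ObjectSet
  , actionMap :: ActionMap
  }

-- Build both structures and force them to weak head normal form before
-- returning. Set and Map are spine-strict, so this constructs the whole
-- tree up front instead of on the first lookup.
loadState :: [String] -> IO AppState
loadState ws = do
  o <- evaluate (S.fromList ws)
  a <- evaluate (M.fromList (zip ws [0 ..]))
  return (AppState o a)

-- A lookup against the already-built state.
lookupAction :: AppState -> String -> Maybe Int
lookupAction st w = M.lookup w (actionMap st)
```

In a snaplet initializer, an action like `loadState` would be run through `liftIO`, in the same place where `actions` and `obj` are bound above. Note that `evaluate` only forces the outermost constructor of each stored value; if the values themselves contain expensive thunks, `force` from the deepseq package is the usual way to evaluate them fully.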