
Haskell database connections


Please look at this scotty app (it's taken directly from this old answer from 2014):

{-# LANGUAGE OverloadedStrings #-}
import Web.Scotty
import Database.MongoDB
import qualified Data.Text.Lazy as T
import Control.Monad.IO.Class

runQuery :: Pipe -> Query -> IO [Document]
runQuery pipe query = access pipe master "nutrition" (find query >>= rest)

main :: IO ()
main = do
  -- one pipe, created once at startup and shared by every request
  pipe <- connect $ host "127.0.0.1"
  scotty 3000 $ do
    get "/" $ do
      res <- liftIO $ runQuery pipe (select [] "stock_foods")
      text $ T.pack $ show res

You see how the database connection (pipe) is created only once, when the web app launches. After that, thousands if not millions of visitors will hit the "/" route simultaneously and read from the database using that same connection (pipe).

I have questions about how to properly use Database.MongoDB:

  1. Is this the proper way to set things up, as opposed to creating a database connection for every visit to "/"? In the latter case we could have millions of connections open at once. Is that discouraged? What are the advantages and drawbacks of each approach?
  2. In the app above, what happens if the database connection is lost for some reason and needs to be created again? How would you recover from that?
  3. What about authentication with the auth function? Should the auth function only be called once after creating the pipe, or should it be called on every hit to "/"?
  4. Some say that I'm supposed to use a pool (Data.Pool). It looks like that would only help limit the number of visitors using the same database connection simultaneously. But why would I want that? Doesn't the MongoDB connection have built-in support for simultaneous use?

Solution

    1. Even if you create a connection per client, you won't be able to create very many of them: you will hit the ulimit, and the client that hits it will get a runtime error. The bigger reason it doesn't make sense is that the MongoDB server would spend too much time polling all those connections, while it only has as many meaningful workers as your database server has CPUs. One connection is not a bad idea either, because MongoDB is designed to pipeline several requests and wait for the responses, so a single pipe can use just about all the resources your MongoDB server has. The one limitation is that you have only one pipe for writing, and if it closes accidentally you have to recreate it yourself. So it makes more sense to have a pool of connections, and it doesn't need to be big: I had an app which authenticated users and gave them tokens, and with 2500 concurrent users per second it only kept 3-4 concurrent connections to the database.

    Here are the benefits a connection pool gives you (a minimal wiring sketch follows this list):

    • If you hit the pool's connection limit, you wait for the next available connection instead of getting a runtime error. So your app will pause a little rather than reject the client.

    • The pool recreates connections for you. You can configure it to close excess connections and to create more, up to a certain limit, as you need them. If a connection breaks while you are reading from it or writing to it, you just take another connection from the pool; if you don't return the broken one, the pool will create a replacement for you.
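
    Here is a minimal sketch of the app above rewritten to use such a pool, assuming the resource-pool package and its classic createPool API; the pool size, idle timeout, and database/collection names are illustrative, not prescriptive:

      {-# LANGUAGE OverloadedStrings #-}
      import Web.Scotty
      import Database.MongoDB
      import Data.Pool (createPool, withResource)
      import qualified Data.Text.Lazy as T
      import Control.Monad.IO.Class

      runQuery :: Pipe -> Query -> IO [Document]
      runQuery pipe query = access pipe master "nutrition" (find query >>= rest)

      main :: IO ()
      main = do
        -- 1 stripe, close idle pipes after 60 seconds, at most 10 open connections
        pool <- createPool (connect $ host "127.0.0.1") close 1 60 10
        scotty 3000 $
          get "/" $ do
            -- each request borrows a pipe and returns it to the pool when done
            res <- liftIO $ withResource pool $ \pipe ->
                     runQuery pipe (select [] "stock_foods")
            text $ T.pack $ show res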

    2. If the database connection is closed, the mongodb listener on that connection exits, printing an error message to your terminal, and your app receives an IO error. To handle it you need to create another connection and try again (a rough retry sketch follows this list). Once you start handling this situation you will realise it's easier to use a db pool, because your solution will eventually come to resemble a connection pool anyway.

    3. I do auth once, as part of opening a connection (see the snippet after this list). If you need to auth another user later, you can always do it.

    4. Yes, mongodb handles simultaneous usage, but as I said it gives you only one pipe to write through, and that soon becomes a bottleneck. If you create at least as many connections as your mongodb server can afford threads to handle them (its CPU count), they will all run at full speed.
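
    To illustrate point 2, here is a rough sketch of recovering by hand: if a query fails, drop the broken pipe, open a fresh one and retry once. The exception handling is deliberately coarse (it catches any exception), the host and database names come from the example above, and the queryWithRetry helper is made up for illustration; notice how it is already a tiny, hand-rolled connection pool:

      {-# LANGUAGE OverloadedStrings #-}
      import Control.Exception (SomeException, try)
      import Database.MongoDB

      -- Run a query; on failure, discard the broken pipe, reconnect and retry once.
      -- Returns the (possibly new) pipe so the caller can keep using it.
      queryWithRetry :: Pipe -> Query -> IO ([Document], Pipe)
      queryWithRetry pipe q = do
        result <- try (access pipe master "nutrition" (find q >>= rest))
                    :: IO (Either SomeException [Document])
        case result of
          Right docs -> return (docs, pipe)
          Left _     -> do
            close pipe                            -- drop the dead connection
            pipe' <- connect (host "127.0.0.1")   -- create a replacement
            docs  <- access pipe' master "nutrition" (find q >>= rest)
            return (docs, pipe')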
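
    And for point 3, a sketch of authenticating once, right after the pipe is created; the user name, password and auth database ("admin") are placeholders for your own credentials:

      {-# LANGUAGE OverloadedStrings #-}
      import Control.Monad (unless)
      import Database.MongoDB

      main :: IO ()
      main = do
        pipe <- connect (host "127.0.0.1")
        -- authenticate once on the freshly opened pipe
        ok <- access pipe master "admin" (auth "appUser" "secret")
        unless ok $ ioError (userError "MongoDB authentication failed")
        -- ... hand the authenticated pipe (or a pool of such pipes) to the app ...
        close pipe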

    If I missed something feel free to ask for clarifications. Thank you for your question.