I have a Scotty/WAI application and one of the endpoints sends a large Text
output built from a list of elements. Here is the relevant code:

    import Data.Text.Lazy as L
    import Data.Text.Lazy.Encoding as E

    class (Show csv) => ToCSV csv where
      toCSV :: csv -> L.Text
      toCSV = pack . show

    instance (ToCSV c) => ToCSV [c] where
      toCSV []     = empty
      toCSV (c:cs) = toCSV c <> "\n" <> toCSV cs

    get "/api/transactions" $ accept "text/csv" $ do
      purp <- selectPurpose
      txs  <- allEntries <$> inWeb (listTransactions purp)
      setHeader "Content-Type" "text/csv"
      raw $ E.encodeUtf8 $ toCSV txs

As I understand Scotty's documentation, the output should be built lazily and sent over the wire without the whole text/bytestring ever being held in memory. However, this is not the behaviour I observe: when I call this endpoint, the server's memory usage climbs steadily, and I infer that it is building the whole string before sending it in one go.
Am I missing something?
Edit 1:
I have written a doStream function that's supposed to send chunks of the resulting ByteString one by one:

    doStream :: Text -> W.StreamingBody
    doStream t build flush = do
      let bs = E.encodeUtf8 t
      mapM_ (\chunk -> build (B.fromByteString chunk)) (BS.toChunks bs)
      flush

but actually it still builds the whole output in memory...
Edit 2:
Actually, streaming this way works fine. The server process still eats up a lot of memory, though, and that memory might actually be garbage-collectible once each chunk has been sent. I will try to analyze memory usage more deeply to see where this consumption comes from.
Edit 3:
I tried to limit the heap to 2GB, but this makes the process crash. So some memory is retained during the whole transformation process...
Take a look at the "stream" function in Web.Scotty.Trans. It exists to give you finer-grained control over how much data is generated before it is flushed to the socket.
You call it with a StreamingBody argument, which is in fact a function of the type (Builder -> IO ()) -> IO () -> IO ().
So you write a function:

    doMyStreaming send flush =
      ...

in which you send and flush your data in pieces, and then call the stream function with doMyStreaming as its argument instead of calling "raw".
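
To make the shape of such a function concrete, here is a minimal sketch. It is not Scotty's own code: `streamRows` and its batch-size parameter are hypothetical names, and the local `StreamingBody` alias only mirrors the type quoted above so the example runs without WAI installed. It also assumes `Builder` is the one from Data.ByteString.Builder; older WAI versions used blaze-builder, as in the question's `B.fromByteString`.

```haskell
module Main where

import Control.Monad (when)
import qualified Data.ByteString.Builder as B
import qualified Data.ByteString.Lazy as BL
import qualified Data.Text.Lazy as L
import qualified Data.Text.Lazy.Encoding as E
import System.IO (hFlush, stdout)

-- Local stand-in with the same shape as the StreamingBody type quoted above.
-- In a real application, use the type exported by Network.Wai instead.
type StreamingBody = (B.Builder -> IO ()) -> IO () -> IO ()

-- Hypothetical helper: send one encoded row at a time and flush after
-- every `batch` rows, so only a bounded amount of output is buffered.
streamRows :: Int -> [L.Text] -> StreamingBody
streamRows batch rows send flush = go 1 rows
  where
    step = max 1 batch            -- guard against a zero or negative batch size
    go :: Int -> [L.Text] -> IO ()
    go _ []       = flush         -- final flush once all rows are sent
    go n (r : rs) = do
      send (B.lazyByteString (E.encodeUtf8 r) <> B.char7 '\n')
      when (n `mod` step == 0) flush
      go (n + 1) rs

-- Stand-alone demonstration: "send" writes to stdout, "flush" flushes it.
main :: IO ()
main =
  streamRows 2 (map L.pack ["a,1", "b,2", "c,3"])
             (BL.putStr . B.toLazyByteString)
             (hFlush stdout)
```

In the handler from the question, the call would presumably look like `stream (streamRows 100 (map toCSV txs))` in place of the `raw` line. Note that this only bounds the output buffer: if `txs` itself is fully materialized before the handler starts sending, that list still lives in memory for the duration of the response.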