I'm writing a program that takes a list of text files as arguments and writes an output file in which each row consists of the corresponding rows of the input files, joined by tabs.
Assume all characters are ASCII encoded.
import GHC.IO.Handle
import System.IO
import System.Environment
import Data.List
import qualified Data.ByteString.Lazy.Char8 as B

main = do
    (out:files) <- getArgs
    hs <- mapM (`openFile` ReadMode) files
    txts <- mapM B.hGetContents hs
    let final = map (B.intercalate (B.singleton '\t')) . transpose
              . map (B.lines . B.filter (/= '\t')) $ txts
    withFile out WriteMode $ \out ->
        B.hPutStr out (B.unlines final)
    putStrLn "Completed successfully"
The problem is that it outputs:
file1row1
file2row1
file1row2
file2row2
file1row3
file2row3
instead of:
file1row1 file2row1
file1row2 file2row2
file1row3 file2row3
The same logic works correctly when tested by manually defining the functions in ghci. And the same code works correctly when using Data.Text.Lazy instead of lazy ByteStrings.
What's wrong with my approach?
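There is a known bug in Data.ByteString.Lazy.Char8 where newline conversion doesn't take place properly, even though the documentation says that it should. (See "Data.ByteString.Lazy.Char8 newline conversion on Windows---is the documentation misleading?") If your input files use Windows line endings (\r\n), B.lines splits only on '\n', so every row keeps a trailing '\r', and a '\r' sitting in front of each tab may be displayed as a line break. This could be the cause of your problem.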
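If missing newline conversion is indeed the cause, one possible workaround is to strip carriage returns together with the tabs before splitting into rows. The following is a minimal sketch under that assumption, not a confirmed fix; the helper names cleanRows and joinRows are made up for illustration and do not appear in the question.

```haskell
import qualified Data.ByteString.Lazy.Char8 as B
import Data.List (transpose)

-- Hypothetical helper: remove tabs and carriage returns, then split on '\n'.
-- Dropping '\r' here assumes the inputs may have Windows (\r\n) line endings.
cleanRows :: B.ByteString -> [B.ByteString]
cleanRows = B.lines . B.filter (`notElem` "\t\r")

-- Hypothetical helper: join the corresponding rows of every input with tabs.
-- As in the question, transpose does not pad inputs of unequal length.
joinRows :: [B.ByteString] -> B.ByteString
joinRows = B.unlines
         . map (B.intercalate (B.singleton '\t'))
         . transpose
         . map cleanRows
```

In the program from the question, this would amount to writing B.hPutStr out (joinRows txts) in place of building final by hand.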