Haskell remove trailing and leading whitespace from a file


Let's say I have a file

mary had a little lamb
         It's fleece was white as snow    
Everywhere   
    the child went   
   The lamb, the lamb was sure to go, yeah    

How would I read the file as a string, and remove the trailing and leading whitespace? It could be spaces or tabs. It would print like this after removing whitespace:

mary had a little lamb
It's fleece was white as snow
Everywhere
the child went
The lamb, the lamb was sure to go, yeah

Here's what I have currently:

import Data.Text as T

readTheFile = do
    handle <- openFile "mary.txt" ReadMode
    contents <- hGetContents handle
    putStrLn contents
    hClose handle
    return(contents)

main :: IO ()
main = do
    file <- readTheFile
    file2 <- (T.strip file)
    return()

Solution

  • Your code suggests a few misunderstandings about Haskell, so let's go through it before getting to the solution.

    import Data.Text as T
    

    You're using Text, great! I suggest you also use the IO operations that read and write Text values, instead of the Prelude ones, which work on Strings (linked lists of characters). That is, import qualified Data.Text.IO as T. Import Data.Text qualified as well, because it exports names such as lines and map that clash with the Prelude.

    readTheFile = do
        handle <- openFile "mary.txt" ReadMode
        contents <- hGetContents handle
        putStrLn contents
        hClose handle
        return(contents)
    

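    Oh, hey, manually opening and closing a file around hGetContents can be error-prone: the String version of hGetContents reads lazily, so closing the handle before the contents have been consumed loses data. Consider readTheFile = T.readFile "mary.txt".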

    main :: IO ()
    main = do
        file <- readTheFile
        file2 <- (T.strip file)
        return()
    

    Two issues here.

    Issue one: Notice that you have used strip as though it's an IO action, but it isn't. I suggest you learn more about IO and binding (do notation) vs let-bound variables. strip computes a new value of type Text, so it is bound with let rather than <-, and presumably you want to do something useful with that value, like write it.

    Issue two: Stripping the whole file is different from stripping each line one at a time. Stripping the whole file only removes whitespace at the very start and very end of the contents. I suggest you read mathk's answer.
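
    To make both issues concrete, here is a minimal sketch. The names stripWhole, stripEachLine, and sample are made up for illustration, and the sample text is just two lines in the spirit of your file:

    ```haskell
    {-# LANGUAGE OverloadedStrings #-}
    import qualified Data.Text as T
    import qualified Data.Text.IO as TIO

    -- Hypothetical helper: strip once, treating the input as one value.
    -- Only whitespace at the very start and very end is removed.
    stripWhole :: T.Text -> T.Text
    stripWhole = T.strip

    -- Hypothetical helper: strip every line separately, then rejoin.
    stripEachLine :: T.Text -> T.Text
    stripEachLine = T.unlines . map T.strip . T.lines

    main :: IO ()
    main = do
        -- A pure value is bound with `let`; `<-` is only for IO actions.
        let sample = "  mary had a little lamb  \n\t the child went \t\n" :: T.Text
        -- Per-line stripping cleans both lines.
        TIO.putStr (stripEachLine sample)
        -- Whole-file stripping leaves the whitespace around the inner newline.
        print (stripWhole sample)
    ```

    Using print for the second result shows the leftover spaces and tab as escape sequences, which makes the difference easy to see.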

    So in the end I think you want:

    -- Qualified imports are accessed via `T.someSymbol`
    import qualified Data.Text.IO as T
    import qualified Data.Text as T
    
    -- Not really needed as a separate function unless you also want to
    -- put the stripping here too.
    readTheFile :: IO T.Text
    readTheFile = T.readFile "mary.txt"
    
    -- First read, then strip each line, then write a new file.
    main :: IO ()
    main =
        do file <- readTheFile
           let strippedFile = T.unlines $ map T.strip $ T.lines file
           T.writeFile "newfile.txt" (T.strip strippedFile)
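
    Since the question asks to print the result rather than write a new file, a variant along these lines may be closer to what you want. This is only a sketch: stripLines is a made-up helper name, and taking the path from the command line (falling back to standard input) is my assumption, not something from your code:

    ```haskell
    import qualified Data.Text as T
    import qualified Data.Text.IO as TIO
    import System.Environment (getArgs)

    -- Hypothetical helper: strip leading and trailing whitespace
    -- (spaces or tabs) from every line.
    stripLines :: T.Text -> T.Text
    stripLines = T.unlines . map T.strip . T.lines

    main :: IO ()
    main = do
        args <- getArgs
        -- Assumption: the first argument, if present, is the file to read;
        -- otherwise the text comes from standard input.
        contents <- case args of
            (path:_) -> TIO.readFile path
            []       -> TIO.getContents
        -- putStr rather than putStrLn, because unlines already ends
        -- the text with a newline.
        TIO.putStr (stripLines contents)
    ```

    If this is saved as, say, Strip.hs, then runghc Strip.hs mary.txt prints the stripped lines to the terminal.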