Reading the first line of each file aborts at binary files


I am trying to read the first line of each file in the current directory:

import System.IO(IOMode(ReadMode), withFile, hGetLine)
import System.Directory (getDirectoryContents, doesFileExist, getFileSize)
import System.FilePath ((</>))
import Control.Monad(filterM)

readFirstLine :: FilePath -> IO String
readFirstLine fp = withFile fp ReadMode System.IO.hGetLine

getAbsoluteDirContents :: String -> IO [FilePath]
getAbsoluteDirContents dir = do
    contents <- getDirectoryContents dir
    return $ map (dir </>) contents

main :: IO ()
main = do
    -- get a list of all files & dirs
    contents <- getAbsoluteDirContents "."
    -- filter out dirs
    files <- filterM doesFileExist contents
    -- read first line of each file
    d <- mapM readFirstLine files
    print d

It compiles and runs, but aborts with the following error when it reaches a binary file:

mysrcfile: ./aBinaryFile: hGetLine: invalid argument (invalid byte sequence)

I want to detect and skip such files and move on to the next file.


Solution

  • A binary file is a file whose bytes cannot be decoded into a valid string under the handle's text encoding. Without inspecting its content, however, a binary file is indistinguishable from a text file.

    It might be better to use an "It's Easier to Ask Forgiveness than Permission" (EAFP) approach: we try to read the first line, and if that fails, we skip the file.

    import Control.Exception(catch, IOException)
    import System.IO(IOMode(ReadMode), withFile, hGetLine)
    
    readFirstLine :: FilePath -> IO (Maybe String)
    readFirstLine fp = withFile fp ReadMode $
        \h -> (catch (fmap Just (hGetLine h))
            ((const :: a -> IOException -> a) (return Nothing)))
    

    For a FilePath, this returns an IO (Maybe String). Running that action yields Just x, where x is the first line, if the line could be read, and Nothing if an IOException was encountered.
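
    To see both outcomes, here is a minimal sketch that runs a variant of readFirstLine on three temporary files. The names readFirstLineUtf8 and writeTemp are hypothetical, and pinning the handle to UTF-8 is an assumption of this sketch (not part of the code above) so that the result does not depend on the locale.

    ```haskell
    import Control.Exception (IOException, catch)
    import System.Directory (getTemporaryDirectory, removeFile)
    import System.IO

    -- Same idea as readFirstLine above, but with the handle pinned to UTF-8
    -- (an assumption of this sketch) so the result is locale-independent.
    readFirstLineUtf8 :: FilePath -> IO (Maybe String)
    readFirstLineUtf8 fp = withFile fp ReadMode $ \h -> do
        hSetEncoding h utf8
        fmap Just (hGetLine h) `catch` handler
      where
        -- Any IOException (decoding failure, end of file, ...) becomes Nothing.
        handler :: IOException -> IO (Maybe String)
        handler _ = return Nothing

    -- Hypothetical helper: create a temporary file holding the given raw bytes.
    writeTemp :: String -> IO FilePath
    writeTemp bytes = do
        dir <- getTemporaryDirectory
        (path, h) <- openBinaryTempFile dir "firstline.tmp"
        hPutStr h bytes
        hClose h
        return path

    main :: IO ()
    main = do
        textFile   <- writeTemp "hello\nworld\n"
        binaryFile <- writeTemp "\xff\xfe\x00\n"  -- 0xFF is never valid UTF-8
        emptyFile  <- writeTemp ""
        mapM readFirstLineUtf8 [textFile, binaryFile, emptyFile] >>= print
        -- prints [Just "hello",Nothing,Nothing]
        mapM_ removeFile [textFile, binaryFile, emptyFile]
    ```

    Note that an empty file also yields Nothing, because hGetLine raises an end-of-file IOException there as well.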

    We can then use catMaybes :: [Maybe a] -> [a] to keep only the values wrapped in Just:

    import Data.Maybe(catMaybes)
    
    main :: IO ()
    main = do
        -- get a list of all files & dirs
        contents <- getAbsoluteDirContents "."
        -- filter out dirs
        files <- filterM doesFileExist contents
        -- read first line of each file
        d <- mapM readFirstLine files
        print (catMaybes d)

    Alternatively, you can use mapMaybeM :: Monad m => (a -> m (Maybe b)) -> [a] -> m [b] from the extra package on Hackage, which combines the mapM and catMaybes steps for you.
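
    If you would rather not add a dependency, the behaviour of mapMaybeM can be sketched with base alone. The definition below is a local stand-in under a hypothetical name (mapMaybeM'), not the implementation from extra, and squareIfEven is a toy action used only for illustration.

    ```haskell
    import Data.Maybe (catMaybes)

    -- Local stand-in (hypothetical name) with the same type as mapMaybeM:
    -- run the action on every element, then keep only the Just results.
    mapMaybeM' :: Monad m => (a -> m (Maybe b)) -> [a] -> m [b]
    mapMaybeM' f xs = catMaybes <$> mapM f xs

    -- Toy action for illustration: succeeds only on even numbers.
    squareIfEven :: Int -> IO (Maybe Int)
    squareIfEven n = return (if even n then Just (n * n) else Nothing)

    main :: IO ()
    main = mapMaybeM' squareIfEven [1 .. 6] >>= print
    -- prints [4,16,36]
    ```

    With such a function in scope, the last two steps of main could become d <- mapMaybeM' readFirstLine files followed by print d.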