I am trying to read the text of all files in a folder with the following code:
```haskell
import System.IO
import System.Directory (getFileSize)

readALine :: FilePath -> IO ()
readALine fname = do
  putStr . show $ "Filename: " ++ fname ++ "; "
  fs <- getFileSize fname
  if fs > 0 then do
    hand <- openFile fname ReadMode
    fline <- hGetLine hand
    hClose hand
    print $ "First line: " <> fline
  else return ()
```
However, some of these files are binary. How can I find out whether a given file is binary? I could not find any such function on Hoogle: https://hoogle.haskell.org/?hoogle=binary%20file
Thanks for your help.
Edit: By binary I mean that the file contains unprintable characters. I am not sure of the proper term for these files.
I installed the utf8-string package and modified the code:
```haskell
readALine :: FilePath -> IO ()
readALine fname = do
  putStr . show $ "Filename: " ++ fname ++ "; "
  fs <- getFileSize fname
  if fs > 0 then do
    hand <- openFile fname ReadMode
    fline <- hGetLine hand
    hClose hand
    if isUTF8Encoded (unpack fline) then do
      print $ "Not binary file."
      print $ "First line: " <> fline
    else return ()
  else return ()
```
Now it works, but on encountering a 'binary' executable file (called esync.x), the `hGetLine hand` expression fails with this error:

```
"Filename: ./esync.x; "firstline2.hs: ./esync.x: hGetLine: invalid argument (invalid byte sequence)
```
How can I check the characters read from the file handle itself?
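The definition of "binary" is quite vague, but I'll assume you mean content that is not valid UTF-8 text.

System.IO's `hGetLine` decodes bytes using the handle's text encoding (usually UTF-8 from your locale), so it throws an exception as soon as it meets an invalid byte sequence; that happens before your `isUTF8Encoded` check ever runs. Instead, read the raw bytes as a ByteString and use `toString` from `Data.ByteString.UTF8` (in the utf8-string package), which replaces invalid UTF-8 sequences with a replacement character instead of failing with an error.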
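A quick way to see what `toString` does with bytes that are not valid UTF-8 is to decode a hand-made ByteString. This is a minimal sketch assuming the bytestring and utf8-string packages; the aliases `BS` and `U` are arbitrary. The byte 0xFF can never occur in valid UTF-8, so it should come back as `replacement_char` (U+FFFD):

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.UTF8 as U

main :: IO ()
main = do
  -- "hi" followed by 0xFF, a byte that is never valid in UTF-8
  let bytes   = BS.pack [0x68, 0x69, 0xFF]
      decoded = U.toString bytes
  -- toString does not throw; the bad byte becomes U+FFFD
  print decoded                              -- "hi\65533"
  print (U.replacement_char `elem` decoded)  -- True
```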
Converting your example to use UTF-8 ByteStrings:
```haskell
import Data.Monoid
import System.IO
import System.Directory
import qualified Data.ByteString as B
import qualified Data.ByteString.UTF8 as B

readALine :: FilePath -> IO ()
readALine fname = do
  putStr . show $ "Filename: " ++ fname ++ "; "
  fs <- getFileSize fname
  if fs > 0 then do
    hand <- openFile fname ReadMode
    -- Reads raw bytes: no text decoding, so invalid UTF-8 cannot throw here
    fline <- B.hGetLine hand
    hClose hand
    -- Invalid UTF-8 sequences are replaced with U+FFFD during conversion
    print $ "First line: " <> B.toString fline
  else return ()
```
This code doesn't fail on binary files, but it doesn't actually detect binary content either. If you want to detect binary content, look for `B.replacement_char` in the decoded string. To detect non-printable characters, you can also look for code points below 32 (the space character).
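Putting the two checks together, a minimal sketch might look like the following. The helper names `looksBinary` and `fileLooksBinary` are hypothetical, and so are two choices baked into them: tab, newline, carriage return and form feed are treated as printable even though they are below 32, and only the first 4096 bytes of each file are sampled:

```haskell
import qualified Data.ByteString as BS
import qualified Data.ByteString.UTF8 as U
import System.IO (IOMode (ReadMode), withBinaryFile)

-- Hypothetical helper: a chunk counts as "binary" if it contains an invalid
-- UTF-8 sequence (decoded as U.replacement_char) or a control character
-- below 32 other than tab, newline, carriage return or form feed.
looksBinary :: BS.ByteString -> Bool
looksBinary bytes = any suspicious (U.toString bytes)
  where
    suspicious c =
      c == U.replacement_char
        || (c < ' ' && c `notElem` "\t\n\r\f")

-- Hypothetical helper: samples only the first 4096 bytes (an arbitrary
-- size), so large files are never read completely.
fileLooksBinary :: FilePath -> IO Bool
fileLooksBinary path =
  withBinaryFile path ReadMode $ \h -> looksBinary <$> BS.hGet h 4096

main :: IO ()
main = do
  -- Plain text with a tab and a trailing newline
  print (looksBinary (U.fromString "hello\tworld\n"))                  -- False
  -- First bytes of an ELF executable: contains 0x02 and 0x00
  print (looksBinary (BS.pack [0x7F, 0x45, 0x4C, 0x46, 0x02, 0x00]))  -- True
```

Because only a fixed-size prefix is decoded, a multi-byte character cut off at the 4096-byte boundary would be reported as binary; trimming an incomplete trailing sequence before decoding would avoid that false positive. Reading through `withBinaryFile` and `BS.hGet` also sidesteps the `hGetLine` decoding error from the question, since no text decoding happens on the handle.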