Haskell: building a simple JSON parser


I'm getting my feet wet with building things, and since I couldn't get Aeson to work properly, I decided that my new project would be a JSON parser. The project is fairly abstract one way or another, so it wouldn't make sense to put all the code here. The ByteString library lets me do what I need (remove characters, replace things), but I have a very hard time reconstructing the data exactly the way I took it apart. Data.Text seems more appropriate for the job, but when I used it the output contained a lot of noise such as \", \n, etc. What would be the best and fastest way to clear a file of all the rubbish and restore the remaining parts to useful text? A very small part of the code is below. Remarks on the code are welcome; I'm learning here.

import Network.HTTP.Simple
import GHC.Generics
import qualified Data.ByteString as B
import qualified Data.ByteString.Char8 as C
import Data.Text as T
import Data.Char
import Data.Text.Encoding as DTE

word8QuoteMark = fromIntegral (ord '"')
word8Newline = fromIntegral (ord '\n')
word8Backslash = fromIntegral (ord '\\')

filterJson jsonData = B.filter (/= word8Backslash)
                        (B.filter (/= word8Newline)
                           (B.filter (/= word8QuoteMark) jsonData))

importJson :: IO ()
importJson = do
        jsonData <- B.readFile "local.json"
        let output = filterJson jsonData
        print output

The downside is that if someone is called, e.g., François, the name is now returned as Fran\195\167ois. I think I would need a lot more steps to do this in Data.Text, but correct me if I am wrong...

Note: I saw in a post that Daniel Wagner strongly advises against using ByteString for text, but I'm asking just for the sake of argument.


Solution

  • JSON is, by definition, a Unicode string that represents a data structure. What you get from B.readFile, though, is a raw byte string that you must first decode to get a Unicode string. To do that, you need to know what encoding was used to create the file. Assuming the file uses UTF-8 encoding, you can do something like

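    import qualified Data.ByteString as B
    import Data.Text (Text)
    import Data.Text.Encoding (decodeUtf8)

    -- Read the file as raw bytes, then decode them as UTF-8 text.
    importJson :: FilePath -> IO Text
    importJson name = do
        jsonData <- B.readFile name
        return (decodeUtf8 jsonData)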
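
    Note that decodeUtf8 throws an exception when the bytes are not valid UTF-8; decodeUtf8' from the same module returns an Either instead. The following is a minimal sketch of that variant, in which the name decodeJsonBytes and the hard-coded bytes are illustrative assumptions. It also shows where Fran\195\167ois comes from: print uses Show, which escapes the non-ASCII bytes of a ByteString, whereas Data.Text.IO.putStrLn writes the decoded characters themselves (assuming the terminal's locale is UTF-8).

    ```haskell
    import qualified Data.ByteString as B
    import qualified Data.Text as T
    import qualified Data.Text.IO as TIO
    import Data.Text.Encoding (decodeUtf8')

    -- Decode raw bytes as UTF-8, reporting invalid input instead of throwing.
    -- (decodeJsonBytes is a hypothetical helper name for this sketch.)
    decodeJsonBytes :: B.ByteString -> Either String T.Text
    decodeJsonBytes bytes =
        case decodeUtf8' bytes of
            Left err   -> Left ("not valid UTF-8: " ++ show err)
            Right text -> Right text

    main :: IO ()
    main = do
        -- The UTF-8 bytes of "François": the ç is the two bytes 195 167.
        let raw = B.pack [70, 114, 97, 110, 195, 167, 111, 105, 115]
        -- Show on a ByteString escapes the two bytes of ç.
        print raw
        -- Printing the decoded Text writes the character itself.
        either putStrLn TIO.putStrLn (decodeJsonBytes raw)
    ```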
    

    Once you have a Text value, you can parse that into some data structure according to the JSON grammar.
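
    As a rough illustration of that last step, here is a small hand-written recursive-descent parser over Text. Everything in it (JValue, Parser, parseJson and the helper functions) is made up for this sketch and is not Aeson's API. It is deliberately incomplete: numbers are integers only, only the escapes \" \\ \/ \n \t are recognised, and whitespace handling is more lenient than the JSON grammar.

    ```haskell
    import Data.Char (isDigit)
    import qualified Data.Text as T

    -- Hypothetical value type for this sketch (not Aeson's Value).
    data JValue
        = JNull
        | JBool Bool
        | JNumber Integer
        | JString T.Text
        | JArray [JValue]
        | JObject [(T.Text, JValue)]
        deriving (Show, Eq)

    -- A parser consumes a prefix of the input and returns the remainder.
    type Parser a = T.Text -> Maybe (a, T.Text)

    -- Parse a whole document; anything but whitespace after the value fails.
    parseJson :: T.Text -> Maybe JValue
    parseJson input = case value (T.stripStart input) of
        Just (v, rest) | T.null (T.strip rest) -> Just v
        _                                      -> Nothing

    -- Dispatch on the first character of the input.
    value :: Parser JValue
    value t = case T.uncons t of
        Just ('"', _) -> do
            (s, rest) <- string t
            Just (JString s, rest)
        Just ('[', rest) -> do
            (xs, rest') <- items ']' value (T.stripStart rest)
            Just (JArray xs, rest')
        Just ('{', rest) -> do
            (kvs, rest') <- items '}' member (T.stripStart rest)
            Just (JObject kvs, rest')
        _ | Just rest <- T.stripPrefix (T.pack "null") t  -> Just (JNull, rest)
          | Just rest <- T.stripPrefix (T.pack "true") t  -> Just (JBool True, rest)
          | Just rest <- T.stripPrefix (T.pack "false") t -> Just (JBool False, rest)
          | otherwise                                     -> number t

    -- Integers only: no fractions or exponents in this sketch.
    number :: Parser JValue
    number t = case T.uncons t of
        Just ('-', rest) -> do
            (n, rest') <- natural rest
            Just (JNumber (negate n), rest')
        _ -> do
            (n, rest') <- natural t
            Just (JNumber n, rest')

    natural :: Parser Integer
    natural t
        | T.null digits = Nothing
        | otherwise     = Just (read (T.unpack digits), rest)
      where (digits, rest) = T.span isDigit t

    -- A string literal; only a handful of escapes are handled.
    string :: Parser T.Text
    string t = case T.uncons t of
        Just ('"', rest) -> go [] rest
        _                -> Nothing
      where
        go acc s = case T.uncons s of
            Just ('"', rest)  -> Just (T.pack (reverse acc), rest)
            Just ('\\', rest) -> do
                (c, rest') <- T.uncons rest
                c' <- lookup c escapes
                go (c' : acc) rest'
            Just (c, rest)    -> go (c : acc) rest
            Nothing           -> Nothing  -- input ended before the closing quote
        escapes = [('"', '"'), ('\\', '\\'), ('/', '/'), ('n', '\n'), ('t', '\t')]

    -- Comma-separated elements up to the closing bracket; the opening bracket
    -- and the whitespace after it have already been consumed.
    items :: Char -> Parser a -> Parser [a]
    items close p t = case T.uncons t of
        Just (c, rest) | c == close -> Just ([], rest)
        _                           -> loop t
      where
        loop s = do
            (x, rest) <- p s
            case T.uncons (T.stripStart rest) of
                Just (',', rest') -> do
                    (xs, rest'') <- loop (T.stripStart rest')
                    Just (x : xs, rest'')
                Just (c, rest') | c == close -> Just ([x], rest')
                _ -> Nothing

    -- One "key": value pair inside an object.
    member :: Parser (T.Text, JValue)
    member t = do
        (key, rest) <- string t
        (':', rest') <- T.uncons (T.stripStart rest)
        (v, rest'') <- value (T.stripStart rest')
        Just ((key, v), rest'')

    main :: IO ()
    main = print (parseJson (T.pack "{\"name\": \"François\", \"tags\": [1, -2, true, null]}"))
    ```

    Because the parser works on decoded Text, a name such as François is carried through as characters and not as separate bytes. For real projects a library such as Aeson remains the more robust choice.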