I'm learning HXT
at the moment by using it to parse a GPX file. An example is here. I've got the following so far:
import Data.Time
import Text.XML.HXT.Core
data Gpx = Gpx [Trk] deriving (Show)
data Trk = Trk [TrkSeg] deriving (Show)
data TrkSeg = TrkSeg [TrkPt] deriving (Show)
data TrkPt = TrkPt Double Double deriving (Show)
parseGpx =
getChildren >>> isElem >>> hasName "gpx" >>>
getChildren >>> isElem >>> hasName "trk" >>>
parseGpxTrk >>> arr Gpx
parseGpxTrk = undefined
parseGpxTrkSegs = undefined
You can see that it's incomplete, but it should still type-check. Unfortunately, I'm already running into an error:
Couldn't match type ‘Trk’ with ‘[Trk]’
Expected type: Trk -> Gpx
Actual type: [Trk] -> Gpx
In the first argument of ‘arr’, namely ‘Gpx’
In the second argument of ‘(>>>)’, namely ‘arr Gpx’
What this error says is that I'm trying to pass each matched item from the parseGpxTrk
arrow through the arr Gpx
constructor, but what I actually want is to pass the entire list of matches through the arr Gpx
constructor.
So, how do I get HXT
(or arrows in general?) to pass the matches as a list through my arr Gpx
constructor instead of passing each entry in the list through the arr Gpx
constructor?
Here's a solution that seems pretty good to me
{-# LANGUAGE Arrows #-}
import Data.Maybe
import Text.Read
import Text.XML.HXT.Core
import Control.Applicative
data Gpx = Gpx [Trk] deriving (Show)
data Trk = Trk [TrkSeg] deriving (Show)
data TrkSeg = TrkSeg [TrkPt] deriving (Show)
data TrkPt = TrkPt Double Double deriving (Show)
The trickiest one is probably the parseTrkPt
because in order to do it right you have to handle parsing String
s to Double
, which can fail. I've made the decision to have it return a Maybe TrkPt
instead, and then handle that further down the line:
elemsNamed :: ArrowXml cat => String -> cat XmlTree XmlTree
elemsNamed name = isElem >>> hasName name
parseTrkPt :: ArrowXml cat => cat XmlTree (Maybe TrkPt)
parseTrkPt = elemsNamed "trkpt" >>>
proc trkpt -> do
lat <- getAttrValue "lat" -< trkpt
lon <- getAttrValue "lon" -< trkpt
returnA -< TrkPt <$> readMaybe lat <*> readMaybe lon
I've also used the proc
syntax here because I think it comes out a lot cleaner. The TrkPt <$> readMaybe lat <*> readMaybe lon
has the type Maybe TrkPt
and will return Nothing
if either of the readMaybe
s returns Nothing
. We can now aggregate all the successful results:
parseTrkSeg :: (ArrowXml cat, ArrowList cat) => cat XmlTree TrkSeg
parseTrkSeg =
elemsNamed "trkseg" >>>
(getChildren >>> parseTrkPt >>. catMaybes) >. TrkSeg
The parentheses are important here, it took me a while to figure that part out. Depending on where you place the parens you'll get different results, such as [TrkSeg [TrkPt a b], TrkSeg [TrkPt c d]]
instead of [TrkSeg [TrkPt a b, TrkPt c d]]
. The next to parsers are both straightforward following a similar pattern:
parseTrk :: ArrowXml cat => cat XmlTree Trk
parseTrk =
elemsNamed "trk" >>>
(getChildren >>> parseTrkSeg) >. Trk
parseGpx :: ArrowXml cat => cat XmlTree Gpx
parseGpx =
elemsNamed "gpx" >>>
(getChildren >>> parseTrk) >. Gpx
Then you can run it quite simply, although you'll have to still drill passed the root element:
main :: IO ()
main = do
gpxs <- runX $ readDocument [withRemoveWS yes] "ana.gpx"
>>> getChildren
>>> parseGpx
-- Pretty print the document
forM_ gpxs $ \(Gpx trks) -> do
putStrLn "GPX:"
forM_ trks $ \(Trk segs) -> do
putStrLn "\tTRK:"
forM_ segs $ \(TrkSeg pts) -> do
putStrLn "\t\tSEG:"
forM_ pts $ \pt -> do
putStr "\t\t\t"
print pt
The trick is to use the methods in the ArrowList
typeclass, notably >.
which has the type a b c -> ([c] -> d) -> a b d
. It aggregates the elements from the ArrowList
, passes it to a function that converts it to a new type, then outputs a new ArrowList
on that new type d
.
If you want you can even abstract this a bit for the last 3 parsers:
nestedListParser :: ArrowXml cat => String -> cat XmlTree a -> ([a] -> b) -> cat XmlTree b
nestedListParser name subparser constructor
= elemsNamed name
>>> (getChildren >>> subparser)
>. constructor
parseTrkSeg :: (ArrowXml cat, ArrowList cat) => cat XmlTree TrkSeg
parseTrkSeg = nestedListParser "trkseg" (parseTrkPt >>. catMaybes) TrkSeg
parseTrk :: ArrowXml cat => cat XmlTree Trk
parseTrk = nestedListParser "trk" parseTrkSeg Trk
parseGpx :: ArrowXml cat => cat XmlTree Gpx
parseGpx = nestedListParser "gpx" parseTrk Gpx
This might come in handy if you want to complete the rest of the grammar of a GPX file.