Search code examples
xmlhaskellhxtarrow-abstraction

Getting Data from a Simple XML


I am trying to extract some data from an XML input with 6 lines, using HXT. I want to keep HXT, too, because of the Curl integration and because I have other XML files with thousands of lines, later.

My XML looks like this:

<?xml version = "1.0" encoding = "UTF-8"?>
<find>
    <set_number>228461</set_number>
    <no_records>000000008</no_records>
    <no_entries>000000008</no_entries>
</find>

And I've been trying to get together how to parse that. Unfortunately, the Wiki page of HXT has not been a big help (or I just did overlook stuff).

data FindResult = FindResult {
        resultSetNumber :: String,
        resultNoRecords :: Int,
        resultNoEntries :: Int
    } deriving (Eq, Show)

resultParser :: ArrowXml a => a XmlTree FindResult
resultParser = hasName "find" >>> getChildren >>> proc x -> do
    setNumber <- isElem >>> hasName "set_number" >>> getChildren >>> getText -< x
    noRecords <- isElem >>> hasName "no_records" >>> getChildren >>> getText -< x
    noEntries <- isElem >>> hasName "no_entries" >>> getChildren >>> getText -< x
    returnA -< FindResult setNumber (read noRecords) (read noEntries)

find str = return . head =<< (runX $ readDocument [withValidate no, withCurl []] query >>> resultParser)
    where query = "http://" ++ server ++ "/find?request=" ++ str

What I always get is

*** Exception: Prelude.head: empty list

so, I guess, the parsing must go horribly wrong, since I checked and correctly get the XML from the query.


Solution

  • The following works for me (modelled after this example):

    {-# LANGUAGE Arrows #-}
    
    module Main
           where
    
    import Text.XML.HXT.Core
    import System.Environment
    
    data FindResult = FindResult {
            resultSetNumber :: String,
            resultNoRecords :: Int,
            resultNoEntries :: Int
        } deriving (Eq, Show)
    
    resultParser :: ArrowXml a => a XmlTree FindResult
    resultParser =
      deep (isElem >>> hasName "find") >>> proc x -> do
        setNumber <- getText <<< getChildren <<< deep (hasName "set_number") -< x
        noRecords <- getText <<< getChildren <<< deep (hasName "no_records") -< x
        noEntries <- getText <<< getChildren <<< deep (hasName "no_entries") -< x
        returnA -< FindResult setNumber (read noRecords) (read noEntries)
    
    main :: IO ()
    main = do [src] <- getArgs
              res <- runX $ ( readDocument [withValidate no] src >>> resultParser)
              print . head $ res
    

    Testing:

    $ dist/build/test/test INPUT
    FindResult {resultSetNumber = "228461", resultNoRecords = 8, resultNoEntries = 8}