Search code examples
haskellhxt

Parsing an element which may or may not exists with HXT library


I have a problem with HXT. i wanna parse an owl file and i have a problem with my arrow, because he dont wanna parse a tree ! I saw that the problem is that one: Firstly , the code :

import System.Environment  --para uso do getArgs

import Data.List.Split (splitOn)


data Class = Class {
                    name ::String,
                    subClassOf ::String
               } deriving (Show,Eq)


main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser


parseClass = ifA (hasAttr "rdf:about")  (getAttrValue "rdf:about")  (getAttrValue "rdf:ID")

parseSubClass = getAttrValue "rdf:resource"



split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l


atTag tag = deep (isElem >>> hasName tag)

getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    s <- atTag "rdfs:subClassOf" -< l
    subClass <- parseSubClass -< s
    returnA -< Class { name = (split className), subClassOf = (split subClass) }

with that i should be able to parse on the owl file each nodes where it exists this example :

<owl:Class rdf:about="Damien">
    <rdfs:subClassOf rdf:resource="PurchaseableItem"/>
</owl:Class>

But, when i want to parse a tree like that , it simply doesnt compute and throw it away !

<owl:Class rdf:about="&camera;BodyWithNonAdjustableShutterSpeed">
    <owl:equivalentClass>
        <owl:Class>
            <owl:intersectionOf rdf:parseType="Collection">
                <rdf:Description rdf:about="&camera;Body"/>
                <owl:Restriction>
                    <owl:onProperty rdf:resource="&camera;shutter-speed"/>
                    <owl:cardinality rdf:datatype="&xsd;nonNegativeInteger">0</owl:cardinality>
                </owl:Restriction>
            </owl:intersectionOf>
        </owl:Class>
    </owl:equivalentClass>
</owl:Class>

Why ? Because the Subclass node doesnt exist ! But i want the Class available there and put it on my data even if the subclass doesnt exists ! So, how is that possible to do it ?


My newest version:

import System.Environment  --para uso do getArgs
import Data.List.Split (splitOn)

data Class = Class {
                    name ::String,
                    subClassOf :: String
               } deriving (Show,Eq)

main = do
   [src]<- getArgs
   parser <- runX(readDocument [ withValidate no] src  >>> getClass)
   print parser

parseClass = ifA (hasAttr "rdf:about")  (getAttrValue "rdf:about")  (getAttrValue "rdf:ID")
parseSubClass = (getAttrValue "rdf:resource") `orElse` arr (const "" )

--Test  (é preciso rever esta definição) uma falha se o nome tiver o "#"
split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l

atTag tag = deep (isElem >>> hasName tag)
getClass = atTag "owl:Class" >>>
    proc l -> do
    className <- parseClass -< l
    s <- atTag "rdfs:subClassOf" -< l
    subClass <- parseSubClass -< s
    returnA -< Class { name = (split className), subClassOf = split subClass }

Solution

  • You need to decide what you want when the SubClass node doesn't exist. As I see it, you have two choices:

    • A missing SubClass node means that subClass is the empty string. In that case, simply change your parser to fall back to the empty string when the arrow built around atTag "rdfs:subClassOf" fails:

      getClass = atTag "owl:Class" >>>
          proc l -> do
          className <- parseClass -< l
          subClass <- getSubClass -< l
          returnA -< Class { name = split className, subClassOf = split subClass }
          where
            getSubClass =
              (atTag "rdfs:subClassOf" >>> parseSubClass) `orElse` arr (const "")
      
    • A missing SubClass node means that subClass is Nothing. This requires changing your data definition so that subClassOf is of type Maybe String, but after that it's fairly similar to the previous answer:

      getClass = atTag "owl:Class" >>>
          proc l -> do
          className <- parseClass -< l
          subClass <- getSubClass -< l
          returnA -< Class { name = split className, subClassOf = fmap split subClass }
          where
            getSubClass =
              (atTag "rdfs:subClassOf" >>> parseSubClass >>> arr Just)
              `orElse` arr (const Nothing)
      

    Just so we're clear, because you say that this isn't working in the comments, here's exactly the full program I'm running, which works fine for me:

    {-# LANGUAGE Arrows #-}
    import System.Environment  --para uso do getArgs
    import Data.List.Split (splitOn)
    import Text.XML.HXT.Core
    
    data Class = Class {
                        name ::String,
                        subClassOf ::String
                   } deriving (Show,Eq)
    
    main = do
       [src]<- getArgs
       parser <- runX(readDocument [ withValidate no] src  >>> getClass)
       print parser
    
    parseClass = ifA (hasAttr "rdf:about")
                 (getAttrValue "rdf:about")
                 (getAttrValue "rdf:ID")
    
    parseSubClass = getAttrValue "rdf:resource"
    
    split l = if(length (splitOn "#" l) >1) then (splitOn "#" l !! 1) else l
    
    atTag tag = deep (isElem >>> hasName tag)
    
    getClass = atTag "owl:Class" >>>
        proc l -> do
        className <- parseClass -< l
        subClass <- getSubClass -< l
        returnA -< Class { name = split className, subClassOf = split subClass }
        where
          getSubClass =
            (atTag "rdfs:subClassOf" >>> parseSubClass)
            `orElse` arr (const "")
    

    Note that if you really don't want to combine multple arrow steps with >>> or <<<, another possibility is to use an inner proc:

    getClass = atTag "owl:Class" >>>
        proc l -> do
        className <- parseClass -< l
        subClass <- (proc l' -> do
          s <- atTag "rdfs:subClassOf" -< l'
          parseSubClass -< s)
          `orElse` constA "" -< l
        returnA -< Class { name = split className, subClassOf = split subClass}