Search code examples
parsinginverselogic-programmingcurryfunctional-logic-progr

Generating a parser with `inverse`, with constraints on the grammar


I recently followed through A Taste of Curry, and afterwards decided to put the trivial arithmetic parser example to test, by writing a somewhat more substantial parser: a primitive but correct and functional HTML parser.

I ended up with a working node2string function to operate on Node (with attributes and children), which I then inversed to obtain a parse function, as exemplified in the article.

The first naive implementation had the mistake that it parsed anything but e.g. the trivial <input/> HTML snippet into exactly one Node representation; everything else nondeterministically yielded invalid things like

Node { name = "input", attrs = [Attr "type" "submit"] }
Node { name = "input type=\"submit\"", attrs = [] }

and so on.

After some initial naive attempts to fix that from within node2string, I realized the point, which I believe all seasoned logic programmers see instantaneously, that parse = inverse node2string was right more right and insightful about the sitatution than I was: the above 2 parse results of <input type="submit"/> indeed were exactly the 2 valid and constructible values of Node that would lead to HTML representations.

I realized I had to constrain Node to only allow passing in alphabetic — well not really but let's keep it simple — names (and of course same for Attr). In a less fundamental setting than a logic program (such as regular Haskell with much more hand written and "instructional" as opposed to purely declarative programming), I would simply have hidden the Node constructor behind e.g. a mkNode sentinel function, but I have the feeling this wouldn't work well in Curry due to how the inference engine or constraint solver work (I might be wrong on this, and in fact I hope I am).

So I ended up instead with the following. I think Curry metaprogramming (or Template Haskell, if Curry supported it) could be used ot clean up the manual boielrplate, but cosmetically dealing is only one way out of the situation.

data Name = Name [NameChar] -- newtype crashes the compiler
data NameChar = A | B | C | D | E | F | G | H | I | J | K | L | M | N | O | P | Q | R | S | T | U | V | W | X | Y | Z

name2char :: NameChar -> Char
name2char c = case c of A -> 'a'; B -> 'b'; C -> 'c'; D -> 'd'; E -> 'e'; F -> 'f'; G -> 'g'; H -> 'h'; I -> 'i'; J -> 'j'; K -> 'k'; L -> 'l'; M -> 'm'; N -> 'n'; O -> 'o'; P -> 'p'; Q -> 'q'; R -> 'r'; S -> 's'; T -> 't'; U -> 'u'; V -> 'v'; W -> 'w'; X -> 'x'; Y -> 'y'; Z -> 'z'

name2string :: Name -> String
name2string (Name s) = map name2char s

-- for "string literal" support
nameFromString :: String -> Name
nameFromString = inverse name2string

data Node = Node { nodeName :: Name, attrs :: [Attr], children :: [Node] }
data Attr = Attr { attrName :: Name, value :: String }

attr2string :: Attr -> String
attr2string (Attr name value) = name2string name ++ "=\"" ++ escape value ++ "\""
  where escape = concatMap (\c -> if c == '"' then "\\\"" else [c])

node2string :: Node -> String
node2string (Node name attrs children) | null children = "<" ++ name' ++ attrs' ++ "/>"
                                       | otherwise     = "<" ++ name' ++ attrs' ++ ">" ++ children' ++ "</" ++ name' ++ ">"
  where name'     = name2string name
        attrs'    = (concatMap ((" " ++) . attr2string) attrs)
        children' = intercalate "" $ map (node2string) children

inverse :: (a -> b) -> (b -> a)
inverse f y | f x =:= y = x where x free

parse :: String -> Node
parse = inverse node2string

This, in fact, works perfectly (in my judgement):

Parser> parse "<input type=\"submit\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit")] [])

Parser> parse "<input type=\"submit\" name=\"btn1\"/>"
(Node [I,N,P,U,T] [(Attr [T,Y,P,E] "submit"),(Attr [N,A,M,E] "btn1")] [])

(Curry doesn't have type classes so I wouldn't know yet how to make [NameChar] print more nicely)

However, my question is:

is there a way to use something like isAlpha (or a function more true to the actual HTML spec, of course) to achieve a result equivalent to this, without having to go through the verbose boilerplate that NameChar and its "supporting members" are? There seems to be no way to even place the "functional restriction" anywhere within the ADT.

In a Dependently Typed Functional Logic Programming language, I would just express the constraint at the type level and let the inference engine or constraint solver deal with it, but here I seem to be at a loss.


Solution

  • You can achieve the same results using just Char. As you've already pointed out, you can use isAlpha to define name2char as a partial identity. I changed the following lines of your code.

    type NameChar = Char
    
    name2char :: NameChar -> Char
    name2char c | isAlpha c = c
    

    The two exemplary expressions then evaluate as follows.

    test> parse "<input type=\"submit\" name=\"btn1\"/>"
    (Node (Name "input") [(Attr (Name "type") "submit"),(Attr (Name "name") "btn1")] [])
    
    test> parse "<input type=\"submit\"/>"
    (Node (Name "input") [(Attr (Name "type") "submit")] [])
    

    As a side-effect, names with non-alpha characters silently fail with nameFromString.

    test> nameFromString "input "
    

    Edit: Since you seem to be a fan of function patterns, you can define generators for Nodes and Attrs and use them in your conversion function.

    attr :: Name -> String -> Attr
    attr name val
      | name `elem` ["type", "src", "alt", "name"] = Attr name val
    
    node :: String -> [Attr] -> [Node] -> Node
    node name [] nodes
      |  name `elem` ["a", "p"] = Node name [] nodes
    node name attrPairs@(_:_) nodes
      |  name `elem` ["img", "input"] = Node name attrPairs nodes
    
    node2string :: Node -> String
    node2string (node name attrs children)
      | null children = "<" ++ name ++ attrs' ++ "/>"
      | otherwise     = "<" ++ name ++ attrs' ++ ">"
                      ++ children' ++ "</" ++ name' ++ ">"
     where
      name'     = name
      attrs'    = concatMap ((" " ++) . attr2string) attrs
      children' = intercalate "" $ map (node2string) children
    
    attr2string :: Attr -> String
    attr2string (attr name val) = name ++ "=\"" ++ escape val ++ "\""
     where
      escape = concatMap (\c -> if c == '"' then "\\\"" else [c])
    

    This approach has its disadvantages; it works quite well for a specific set of valid names, but fails miserably when you use a predicate like before (e.g., all isAlpha name).

    Edit2: Besides the fact that the solution with the isAlpha condition is quite "prettier" than your verbose solution, it is also defined in a declarative way. Without your comments, it doesn't become clear (that easily) that you are encoding alphabetic characters with your NameChar data type. The isAlpha condition on the other hand is a good example for a declarative specification of the wanted property. Does this answer your question? I'm not sure what you are aiming at.