Search code examples
nodesgrammarsml

SML: Count nodes with Elements in a Grammar


I've created the following simple grammar in SML:

datatype identifier = IdentE of char;

datatype term = ID of identifier | MultE of term * term * identifier;

datatype expression = Term of term | AddE of expression * expression * term;

I want to count the number of IdentE datatypes that exist.

My guess on how to do that was the following, which of course doesn't work, but I'm not sure how to actually accomplish this task:

fun sum(Empty)= 0 | sum(Term) = sum(ID) + | sum(IdentE) = 1|| sum(AddE(AddE, Term) =  sum(AddE) + sum(Term) |  sum(MultE(MultE, IdentE) =  sum(MultE) + sum(IdentE);

I am very new to SML, so I'm probably making a rookie mistake.


Solution

  • It seems that your datatypes model some subset of mathematical expressions and that you've split addition, multiplication and variables into two separate datatypes expression and term. Because terms only refer to terms and not back to expressions, the expression a + (b × c) can be encoded as:

    AddE (Term (ID (IdentE #"a")),
          Term (MultE (ID (IdentE #"b")),
                       ID (IdentE #"c")),
          IdentE #"?")
    

    but you cannot encode the expression a × (b + c).

    I'm not sure why addition has three arguments when, presumably, it's a binary operator.

    Here is your sum function with proper spacing and linebreaks:

    fun sum(Empty) = 0
      | sum(Term) = sum(ID) +
      | sum(IdentE) = 1
     || sum(AddE(AddE, Term) = sum(AddE) + sum(Term)
      | sum(MultE(MultE, IdentE) = sum(MultE) + sum(IdentE)
    

    A few things become apparent:

    • You didn't read your own code very carefully before you posted it.
    • There is no Empty constructor in your datatype. What's an "empty expression"?
    • It seems that you forgot to complete the expression after the + in the second line.
    • You have one | too many in the fourth line and an ) too little in the fourth and fifth line.
    • Your function sum can't simultaneously handle values of multiple types. If it accepts a value that matches AddE (...) in one line, it can't also accept a value that matches MultE (...) in another line, because they're of different types.
    • Every value constructor you're pattern matching against is void of its parameters. Every recursive call is made against the name of the value constructor rather than the parameters of the value that is passed in. If you want a function that accepts expressions, one could look like:

      fun sum (AddE (e1, e2, t)) = sum e1 + sum e2 + ...
        | sum (Term t) = ...
      
    • Since you have multiple, recursive datatypes, your best bet is to have multiple functions that handle recursion for each datatype. (This isn't strictly necessary, but alternatively you have some pretty complex patterns.)

      fun sum (AddE (e1, e2, t)) = sum e1 + sum e2 + sumTerm t
        | sum (Term t) = sumTerm t
      
      and sumTerm (ID id) = 1
        | sumTerm (MultE (t1, t2, id)) = sumTerm t1 + sumTerm t2 + 1
      

      Again, I'm not sure why additions have a third term, and why multiplications have a third identifier, and why you have multiple datatypes to begin with.

    If I were to model a simple arithmetic syntax tree and count the number of variables it has, I'd start off with one datatype for which binary operators have only two parameters each:

    datatype expr = AddE of expr * expr
                  | MultE of expr * expr
                  | ConstE of int
                  | VarE of string
    

    With this datatype definition,

    • a + (b × c) becomes AddE (VarE "a", MultE (VarE "b", VarE "c")), and
    • a × (b + c) becomes MultE (VarE "a", AddE (VarE "b", VarE "c")).

    And I could make a function that counts the number of variables in an expression like this:

    fun countVars (AddE (e1, e2)) = countVars e1 + countVars e2
      | countVars (MultE (e1, e2)) = countVars e1 + countVars e2
      | countVars (ConstE const) = 0
      | countVars (VarE varname) = 1
    

    I might instead be interested in getting a list of those variables:

    fun getVars (AddE (e1, e2)) = getVars e1 @ getVars e2
      | getVars (MultE (e1, e2)) = getVars e1 @ getVars e2
      | getVars (ConstE const) = []
      | getVars (VarE varname) = [varname]