string pattern-matching sml

SML Pattern Matching on char lists


I'm trying to pattern match on char lists in SML. I pass in a char list generated from a string as an argument to the helper function, but I get an error saying "non-constructor applied to argument in pattern". The error goes away if instead of

#"a"::#"b"::#"c"::#"d"::_::nil 

I use:

#"a"::_::nil.

Any explanation of why this happens would be much appreciated, along with any workarounds. I'm guessing I could use the substring function to check for this specific substring in the original string, but I find pattern matching intriguing and wanted to take a shot. I also need specific information located somewhere later in the char list, and I was wondering whether my pattern could be:

#"some useless characters"::#"list of characters I want"::#"newline character"

I checked out "How to do pattern matching on string in SML?" but it didn't help.

fun somefunction(#"a"::#"b"::#"c"::#"d"::_::nil) = print("true\n")
  | somefunction(_) = print("false\n")

Solution

  • If you add parentheses around the characters the problem goes away:

    fun somefunction((#"a")::(#"b")::(#"c")::(#"d")::_::nil) = print("true\n")
      | somefunction(_) = print("false\n")
    

    Then somefunction (explode "abcde") prints true and somefunction (explode "abcdef") prints false.

    I'm not quite sure why the SML parser had difficulties parsing the original definition. The error message suggests that it was interpreting # as a function which is applied to strings. The problem doesn't arise only in pattern matching: SML also has difficulty with an expression like #"a"::#"b"::[]. At first it seems like a precedence problem (of # and ::), but that isn't the issue, since #"a"::explode "bc" works as expected (matching your observation that your definition worked when only one # appeared). I suspect that the problem traces to the fact that characters were added to the language with SML 97. The earlier SML 90 viewed characters as strings of length 1. Perhaps there is some sort of behind-the-scenes kludge in the way the symbol # as a part of character literals was grafted onto the language.
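
A likely explanation, which the answer above does not state and which is offered here as an assumption: in Standard ML a run of symbolic characters such as : and # forms a single identifier, so ::# is read as one token rather than as :: followed by #. That would also account for #"a"::_::nil and #"a"::explode "bc" working, since neither _ nor a letter can continue a symbolic identifier. If that is right, whitespace around :: should fix the problem just as parentheses do. A minimal sketch (matchesAbcd, ab, r1 and r2 are hypothetical names, not from the question):

```sml
(* Sketch: the same pattern as above, but with whitespace around ::
   instead of parentheses. With the spaces, :: and # are lexed as
   separate tokens rather than as one symbolic identifier ::# . *)

(* matchesAbcd is a hypothetical name. It returns a bool instead of
   printing, so the result is easy to inspect. *)
fun matchesAbcd (#"a" :: #"b" :: #"c" :: #"d" :: _ :: nil) = true
  | matchesAbcd _ = false

(* The expression form is written the same way. *)
val ab = #"a" :: #"b" :: []

val r1 = matchesAbcd (explode "abcde")   (* five characters starting with abcd *)
val r2 = matchesAbcd (explode "abcdef")  (* six characters, so the pattern fails *)
```

Here r1 is true and r2 is false, matching the behaviour of the parenthesized version. Note that the pattern still fixes the list length at exactly five characters; it does not match "abcd" followed by an arbitrary remainder.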