Search code examples
haskelltypestypeclass

How (or why) is `Data.Set String` not a single type?


I am studying Haskell the hard way, by trying to write something I find interesting, and right now I'm trying to figure out how to derive a Semiring in Haskell for a specific set of parsing problems:

class Semiring s where
    zero, one :: s
    mul, add  :: s -> s -> s

instance Semiring Bool where
    zero = False
    one = True
    add = (||)
    mul = (&&)

instance Semiring (Set String) where
    zero    = empty 
    one     = singleton ""
    add a b = union a b
    mul a b = Data.Set.map (\(c, d) -> c ++ d) $ cartesianProduct a b

The Bool ({true, false}, ∨, ∧, false, true) version works great. So does an Int version. The last one is called a Parse Forest, and its representation is (E, ∪, ·, ∅, {<>}), where E is a set of strings and the {<>} is the set of the empty string.

When I try to compile this, I get:

Rigge…   115  10 error           • Illegal instance declaration for ‘Semiring (Set String)’
(All instance types must be of the form (T a1 ... an)
where a1 ... an are *distinct type variables*,
and each type variable appears at most once in the instance head.

Which doesn't make that much sense to me. Set String is a distinct type, right, and all of the operations of the class Semiring should be expressible purely in terms of sets of strings.

If you want context, the project is at Rigged Regular Expressions. The Bool version merely reports back that the regex matched; an Int version reports back the number of different ways a regex could have matched (i.e "a" ~ /(a|a*)/ will return 2 because two distinct and unique subexpressions match); the ParseForest should return not the number of ways, but the set of all possible ways— but it can't, because I don't understand why I can't use a concrete data type, Set String, where another concrete data type like Int or Bool worked okay.


Solution

  • The crucial part is

    of the form (T a1 ... an) where a1 ... an are *distinct type variables*,
    

    Your type is Set String, so T = Set and a1 = String (and n=1). But String is a type, not a type variable. A compliant instance would instead have been

    instance (....) => Semiring (Set a) where
       ...
    

    Anyway, this is an ancient restriction of Haskell2010, which you can ignore. In modern GHC Haskell, you can turn on the FlexibleInstances extension, and use your own instance without issues. GHC itself should suggest to turn this on in the error message.

    Note that nowadays almost nobody programs in strict Haskell2010: there are so many extensions which have become too commonly used. Arguably, there should be a revision of the Report, say Haskell2020, where most of the common harmless extensions are included for the great good. Still, until someone actually does that, we will need to turn on extensions frequently.