
How Can One Express the Fact that `()` is a Subset of All Other (Non-`Void`) Types in Haskell?


I recently started learning Haskell, and I wrote the following code as part of a small parsing library:

-- Successful iff the input string has a length of zero
parseEOF :: Parser ()
parseEOF = Parser p
  where
    p [] = Just ((), "")
    p _ = Nothing

This code would benefit from being polymorphic, since the () here only indicates that the output carries no information, and that is possible with any non-Void type. For the integers, for example, this "no relevant information" element could be identified with 0.

A simple fix for this is to make the following type class:

class NonVoid n where
  nil :: n

with the above code rewritten as

parseEOF :: NonVoid n => Parser n
parseEOF = Parser p
  where
    p [] = Just (nil, "")
    p _ = Nothing

The issue with the type class approach is that it can be cumbersome to implement for all types. Is there an alternative approach for expressing the notion of () being a "subset" of all other types in Haskell? I'm not just asking about the code above (my issue can be solved in other ways), but this seems like a very important idea in general. The idea can be generalized further to the 2-type and to any n-type.


Solution

  • There is already a Default class in reasonably wide use (the data-default package provides one). That at least has the advantage over your NonVoid that there are already a bunch of instances for it.

    I find it less useful than you'd imagine as which value you'd want to be the default value of a type is frequently context-specific, but it's probably the closest thing there is to a canonical way of picking an arbitrary value of many types.
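To make the shape of such a class concrete, here is a minimal sketch. It assumes the class has a single method def :: a, as in data-default, but it defines a local stand-in so that it compiles without the package. The instances and the names blankCounter and blankName are illustrative assumptions, not part of any library.

```haskell
-- Local stand-in mirroring the shape of the Default class, so this
-- sketch compiles on its own. With the data-default package installed
-- you would import the class from Data.Default instead of defining it.
class Default a where
  def :: a

-- Illustrative instances; the chosen values are assumptions.
instance Default () where
  def = ()

instance Default Int where
  def = 0

instance Default [a] where
  def = []

instance Default (Maybe a) where
  def = Nothing

-- The requested result type alone selects the instance.
blankCounter :: Int
blankCounter = def

blankName :: String
blankName = def
```

The context-dependence shows up quickly: 0 is a sensible starting value for a running sum but the wrong one for a running product, and a single instance per type cannot express both.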

    Alternatively you could possibly use generics to pick an arbitrary value of any type with a Generic instance. Again, you're still dependent on an instance existing for every type you'd want to use, but lots of them already do and Generic can be derived so it's not terribly burdensome.
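One way the generics route could look is sketched below, using only GHC.Generics from base. The class names Inhabited and GInhabited, the method names, and the example types Colour and Config are hypothetical. The sketch always picks the first constructor of a sum type, which is only one of several possible policies.

```haskell
{-# LANGUAGE DefaultSignatures #-}
{-# LANGUAGE DeriveGeneric #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeOperators #-}

import GHC.Generics

-- Hypothetical class: types for which we can name some inhabitant.
class Inhabited a where
  anyValue :: a
  -- Generic fallback, used when an instance gives no definition.
  default anyValue :: (Generic a, GInhabited (Rep a)) => a
  anyValue = to ganyValue

-- Worker class over the generic representation.
class GInhabited f where
  ganyValue :: f p

-- A constructor with no fields.
instance GInhabited U1 where
  ganyValue = U1

-- A single field: defer to the field type's own instance.
instance Inhabited c => GInhabited (K1 i c) where
  ganyValue = K1 anyValue

-- Metadata wrappers (datatype, constructor, selector) are transparent.
instance GInhabited f => GInhabited (M1 i t f) where
  ganyValue = M1 ganyValue

-- Products: every field needs a value.
instance (GInhabited f, GInhabited g) => GInhabited (f :*: g) where
  ganyValue = ganyValue :*: ganyValue

-- Sums: always take the left branch, i.e. the first constructor.
instance GInhabited f => GInhabited (f :+: g) where
  ganyValue = L1 ganyValue

-- Deliberately no instance for V1, so empty types are rejected.

-- Primitive types need a hand-written instance.
instance Inhabited Int where
  anyValue = 0

-- These use the generic fallback.
instance Inhabited ()
instance Inhabited Bool

data Colour = Red | Green | Blue
  deriving (Show, Eq, Generic)

instance Inhabited Colour

data Config = Config Int Colour Bool
  deriving (Show, Eq, Generic)

instance Inhabited Config
```

Each type still needs a one-line instance declaration, so this reduces the burden rather than removing it. A recursive type whose first constructor refers to itself would make anyValue loop, so the first-constructor policy is not safe for every type.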

    Ultimately, however, I don't think it's actually a good idea to generalise from () to any inhabited type. If I saw parseEOF :: Parser () I would know exactly what it does just from reading the name and type. If I saw parseEOF :: Default a => Parser a I would be fairly surprised and not quite trust its behaviour without reading docs or digging into the source code. I think Parser () is just a better type for expressing the idea of "a parser that succeeds only if there's no input left to consume, producing no information". The fact that in an information theory sense () is a subset of every inhabited type doesn't seem useful here for producing an interface that is easy to use and comprehend.

    In fact the generalised parseEOF :: Default a => Parser a will probably be extremely inconvenient to use, because most of the time the user won't do anything with the result value. That means GHC has nothing from which to infer the type at which to resolve the Default instance (or whatever class you use), so you'll get an ambiguity error. You end up having to add type annotations (or type applications) in the common case, where you want it to produce () because you don't need some other value.
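The ambiguity can be reproduced with the NonVoid version from the question. The sketch below assumes a Parser representation (the question does not show one), and the helper names atEnd and atEnd' are hypothetical. The rejected definition is left commented out so that the rest compiles.

```haskell
{-# LANGUAGE TypeApplications #-}

import Data.Maybe (isJust)

-- Assumed representation; the question does not show its Parser type.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

class NonVoid n where
  nil :: n

instance NonVoid () where
  nil = ()

instance NonVoid Int where
  nil = 0

-- The generalised parser from the question.
parseEOF :: NonVoid n => Parser n
parseEOF = Parser p
  where
    p [] = Just (nil, "")
    p _ = Nothing

-- Rejected by GHC: the result is discarded, so nothing determines n
-- and the NonVoid constraint is ambiguous.
-- atEnd :: String -> Bool
-- atEnd = isJust . runParser parseEOF

-- Accepted: a type application fixes n.
atEnd :: String -> Bool
atEnd = isJust . runParser (parseEOF @())

-- Accepted: a type annotation fixes n.
atEnd' :: String -> Bool
atEnd' = isJust . runParser (parseEOF :: Parser ())
```

With parseEOF :: Parser () neither workaround is needed, which is the common case the paragraph above describes.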

    The same goes for the huge number of existing monadic actions that have () as their result type. Your logic isn't specific to parsers or EOF; the same reasoning could suggest that putStrLn ought to be generalised to something like putStrLn :: Default a => String -> IO a. Most of the time I think it's actually better to have a type that specifically says it produces no information than it is to have a polymorphic type and have to figure out where it's getting the value from and whether it matters. Substituting an "uninformative" value for () is a trivial thing for end-users to do when they want to, but it's not really possible for a library to do so in a way that would be useful in all possible cases.
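As a sketch of how small that caller-side substitution is, the following keeps parseEOF :: Parser () and swaps in another value with <$ from the Prelude. The Parser representation and its Functor instance are assumptions, and eofAsZero and eofAsEmpty are hypothetical names.

```haskell
-- Assumed representation; the question does not show its Parser type.
newtype Parser a = Parser { runParser :: String -> Maybe (a, String) }

-- Map over the parsed value, leaving the remaining input untouched.
instance Functor Parser where
  fmap f (Parser p) = Parser $ \s -> case p s of
    Nothing -> Nothing
    Just (a, rest) -> Just (f a, rest)

-- The original, monomorphic parser from the question.
parseEOF :: Parser ()
parseEOF = Parser p
  where
    p [] = Just ((), "")
    p _ = Nothing

-- The caller chooses the "uninformative" value that suits its context.
eofAsZero :: Parser Int
eofAsZero = 0 <$ parseEOF

eofAsEmpty :: Parser [String]
eofAsEmpty = [] <$ parseEOF
```

The library keeps the precise type, and each call site states which replacement value it wants.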