Search code examples
haskellsyntaxgadt

What does data ... where mean in Haskell?


I saw this snippet at the devlog of omegagb:

data ExecutionAST result where
  Return :: result -> ExecutionAST result
  Bind :: (ExecutionAST oldres) -> (oldres -> ExecutionAST result) ->
          ExecutionAST result
  WriteRegister :: M_Register -> Word8 -> ExecutionAST ()
  ReadRegister :: M_Register -> ExecutionAST Word8
  WriteRegister2 :: M_Register2 -> Word16 -> ExecutionAST ()
  ReadRegister2 :: M_Register2 -> ExecutionAST Word16
  WriteMemory :: Word16 -> Word8 -> ExecutionAST ()
  ReadMemory :: Word16 -> ExecutionAST Word8

What does the data ... where mean? I thought the keyword data is used to define a new type.


Solution

  • It defines a new type, the syntax is called generalized algebraic data type.

    It is more general than the normal syntax. You can write any normal type definition (ADT) using GADTs:

    data E a = A a | B Integer
    

    can be written as:

    data E a where
      A :: a -> E a
      B :: Integer -> E a
    

    But you can also restrict what is on right hand side:

    data E a where
      A :: a -> E a
      B :: Integer -> E a
      C :: Bool -> E Bool
    

    which is not possible with a normal ADT declaration.

    For more, check Haskell wiki or this video.


    The reason is type safety. ExecutionAST t is supposed to be type of statements returning t. If you write a normal ADT

    data ExecutionAST result = Return result 
                             | WriteRegister M_Register Word8
                             | ReadRegister M_Register
                             | ReadMemory Word16
                             | WriteMemory Word16
                             | ...
    

    then ReadMemory 5 will be a polymorphic value of type ExecutionAST t, instead of monomorphic ExecutionAST Word8, and this will type check:

    x :: M_Register2
    x = ...
    
    a = Bind (ReadMemory 1) (WriteRegister2 x)
    

    That statement should read memory from location 1 and write to register x. However, reading from memory gives 8-bit words, and writing to x requires 16-bit words. By using a GADT, you can be sure this won't compile. Compile-time errors are better than run-time errors.

    GADTs also include existential types. If you tried to write bind this way:

    data ExecutionAST result = ... 
                               | Bind (ExecutionAST oldres)
                                      (oldres -> ExecutionAST result)
    

    then it won't compile since "oldres" is not in scope, you have to write:

    data ExecutionAST result = ...
                               | forall oldres. Bind (ExecutionAST oldres)
                                                     (oldres -> ExecutionAST result)
    

    If you are confused, check the linked video for simpler, related example.