Restrictions of unboxed types


I wonder why unboxed types in Haskell have these restrictions:

  1. You cannot define a newtype for an unboxed type:

    newtype Vec = Vec (# Float#, Float# #)
    

    but you can define a type synonym:

    type Vec = (# Float#, Float# #)
    
  2. Type families can't return an unboxed type:

    type family Unbox (a :: *) :: # where
        Unbox Int    = Int#
        Unbox Word   = Word#
        Unbox Float  = Float#
        Unbox Double = Double#
        Unbox Char   = Char#
    

Are there fundamental reasons behind these restrictions, or is it just that no one has asked for these features?


Solution

  • Parametric polymorphism in Haskell relies on the fact that every value of a type t :: * is represented uniformly, as a pointer to a runtime object. Thus, the same machine code works for all instantiations of a polymorphic value.

    Contrast this with polymorphic functions in Rust or C++. The identity function there still has a type analogous to forall a. a -> a, but since values of different types a may have different sizes, the compiler has to generate separate code for each instantiation. This also means that those languages can't pass polymorphic functions around in runtime boxes, as the following Haskell type does:

    data Id = Id (forall a. a -> a)
    

    since such a function would have to work correctly for arbitrary-sized objects. Allowing this would require additional infrastructure. For example, we could require that a runtime forall a. a -> a function take extra implicit arguments that carry information about the size and the constructors/destructors of values of type a.
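    As a loose analogy only, Haskell's own type classes already work this way: a constraint is compiled to an extra implicit dictionary argument. The sketch below uses the Storable class from base to make a boxed polymorphic function carry size information. The names SizedId, sizedId and applySized are hypothetical and exist purely for illustration; this is not how Rust or C++ implement generics.

```haskell
{-# LANGUAGE RankNTypes #-}

import Data.Int (Int32)
import Data.Word (Word8)
import Foreign.Storable (Storable, sizeOf)

-- Hypothetical wrapper: a boxed polymorphic function whose type demands a
-- Storable dictionary, so size information travels with every call.
data SizedId = SizedId (forall a. Storable a => a -> (Int, a))

-- The identity function, additionally reporting the byte size of its argument.
sizedId :: SizedId
sizedId = SizedId (\x -> (sizeOf x, x))

-- Unpack the box and apply it at whatever type the caller chooses.
applySized :: Storable a => SizedId -> a -> (Int, a)
applySized (SizedId f) x = f x

main :: IO ()
main = do
  print (applySized sizedId (1.5 :: Double))
  print (applySized sizedId (7 :: Int32))
  print (applySized sizedId (255 :: Word8))
```

    The three calls report sizes of 8, 4 and 1 bytes respectively, all through the same boxed function value; the dictionary is what lets one piece of code learn the size of each argument type.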

    Now, the problem with newtype Vec = Vec (# Float#, Float# #) is that even though Vec has kind *, runtime code that expects values of some t :: * can't handle it. It's a stack-allocated pair of floats, not a pointer to a Haskell object, and passing it to code expecting Haskell objects would result in segfaults or errors.

    In general (# a, b #) isn't necessarily pointer-sized, so we can't copy it into pointer-sized data fields.
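    To make the distinction concrete, here is a minimal sketch of what does work, assuming the MagicHash and UnboxedTuples extensions. The synonym from the question is accepted, and a data declaration (rather than a newtype) gives a genuinely boxed type that polymorphic code can handle. The names VecU, addU, box, unbox, fromPair and toPair are made up for this example.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples #-}

import GHC.Exts (Float (F#), Float#, plusFloat#)

-- A synonym introduces no new type, so it is accepted, as in the question.
type VecU = (# Float#, Float# #)

-- 'data' (unlike 'newtype') allocates a real heap object, so Vec is an
-- ordinary boxed type whose two fields are stored unboxed inside it.
data Vec = Vec Float# Float#

-- Monomorphic code may take and return unboxed tuples directly.
addU :: VecU -> VecU -> VecU
addU (# a, b #) (# c, d #) = (# plusFloat# a c, plusFloat# b d #)

box :: VecU -> Vec
box (# a, b #) = Vec a b

unbox :: Vec -> VecU
unbox (Vec a b) = (# a, b #)

fromPair :: (Float, Float) -> Vec
fromPair (F# a, F# b) = Vec a b

toPair :: Vec -> (Float, Float)
toPair (Vec a b) = (F# a, F# b)

main :: IO ()
main = do
  let v = fromPair (1, 2)
      w = fromPair (3, 4)
  -- Vec is boxed, so it can live in a polymorphic container such as a list.
  print (map toPair [v, w, box (addU (unbox v) (unbox w))])
```

    The list in main is the point of the example: a list of Vec is fine because each element is a pointer to a heap object, whereas a list of VecU would be rejected by the kind checker.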

    Type families returning # types are disallowed for related reasons. Consider the following:

    type family Foo (a :: *) :: # where
      Foo Int = Int#
      Foo a   = (# Int#, Int# #)
    
    data Box = forall (a :: *). Box (Foo a)
    

    Our Box is not representable at runtime, since Foo a has different sizes for different choices of a. In general, polymorphism over # would require generating different code for different instantiations, as in Rust, but this interacts badly with regular parametric polymorphism and makes the runtime representation of polymorphic values difficult, so GHC doesn't attempt any of this.

    (That is not to say that a usable implementation couldn't be devised.)
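    For what it's worth, GHC 8.10 and later ship an UnliftedNewtypes extension that accepts the newtype from the question. The sketch below assumes such a compiler; the helper names addVec, fromPair and toPair are hypothetical. Note that the resulting Vec does not have kind *: its kind records the two-float representation, so it still cannot be used where ordinary polymorphic code expects a boxed value.

```haskell
{-# LANGUAGE MagicHash, UnboxedTuples, UnliftedNewtypes #-}

import GHC.Exts (Float (F#), Float#, plusFloat#)

-- Accepted with UnliftedNewtypes; Vec inherits the unlifted kind of its field.
newtype Vec = Vec (# Float#, Float# #)

-- Monomorphic functions over Vec compile without any boxing.
addVec :: Vec -> Vec -> Vec
addVec (Vec (# a, b #)) (Vec (# c, d #)) =
  Vec (# plusFloat# a c, plusFloat# b d #)

-- Conversions to and from an ordinary boxed pair, for printing.
fromPair :: (Float, Float) -> Vec
fromPair (F# a, F# b) = Vec (# a, b #)

toPair :: Vec -> (Float, Float)
toPair (Vec (# a, b #)) = (F# a, F# b)

main :: IO ()
main = print (toPair (addVec (fromPair (1, 2)) (fromPair (3, 4))))
```

    Something like [Vec] or Maybe Vec is still rejected under this extension, which matches the representation argument given above.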