Search code examples
ocamlgadtphantom-types

Polymorphic function for phantom GADT type


I am new to OCaml. And I am writing a simple assembler compiler. I have the following types which I use in my AST.

type asmreg8
type asmreg16
type asmreg32
type asmreg64
type asmreg128

type _ reg =
  | Reg8 : string -> asmreg8 reg
  | Reg16 : string -> asmreg16 reg
  | Reg32 : string -> asmreg32 reg
  | Reg64 : string -> asmreg64 reg
  | Reg128 : string -> asmreg128 reg

type asmlabel
type _ const = ASMConst of string const
type _ var = ASMVar of string var
type _ label = ASMLabel of asmlabel label

type double_arg =
  | RegToReg : 'a reg * 'a reg -> double_arg
  | RegToConst : 'a reg * 'b const -> double_arg

type mnemonic = Mov | Add | Inc | Ret

type single_arg =
  | Var : _ var -> single_arg
  | Reg : _ reg -> single_arg
  | Label : _ label -> single_arg

type command =
  | Args2 of mnemonic * double_arg
  | Args1 of mnemonic * single_arg
  | Args0 of mnemonic

type code_section = Command of command | Id of asmlabel label
type data_type = DB | DW | DD | DQ | DT
type var = string * data_type * string list
type dir = Code of code_section list | Data of var list

I think types are good because they forbid operations on different (by bits) registers. But now I have a problem. I do not know how to write a function like the following

let string_to_reg = function
  | "something" -> Reg8 "something"
  | "Reg16" -> Reg16 "16"
  | _ -> Reg32 "123"

I just do not know how to get a function that takes a string (or another type that does not explicitly contain 'a) and returns values of type 'a reg where 'a can be asmreg8, asmreg16 ets. I have read several articles about GADT and phantom types, but have not found an example that suits me. I'm starting to think I'm missing something about types and that things should be a lot simpler.

I tried this code `

let string_to_reg : type a. string -> a reg = function
  | "something" -> Reg8 "something"
  | "Reg16" -> Reg16 "16"
  | _ -> Reg32 "123"

but I get

This expression has type asmreg8 reg but an expression was expected of type
  a reg
Type asmreg8 is not compatible with type a

I thought I can just show to compiler that the function returns a polymorphic type. I tried this

let string_to_reg : string -> 'a reg = function
  | "something" -> Reg8 "something"
  | "Reg16" -> Reg16 "16"
  | _ -> Reg32 "123"

but I get a string -> asmreg8 reg function and it can not return value of asmreg16 reg type.

I tried pack my _ ret types by another type

type asmreg =
  | ASM8 : asmreg8 reg -> asmreg
  | ASM16 : asmreg16 reg -> asmreg
  | ASM32 : asmreg32 reg -> asmreg
  | ASM64 : asmreg64 reg -> asmreg
  | ASM128 : asmreg128 reg -> asmreg

let reg_to_asmreg : type a. asmreg -> a reg = function
  | ASM8 x -> Reg8 x
  | ASM16 x -> Reg16 x
  | _ -> Reg32 "123"

but I have a same problem. I can not write a function that returns values of different reg types.


Solution

  • It is not possible to write the function string_to_reg directly

    let string_to_reg s x = match x with
      | "something" -> Reg8 "something"
      | "Reg16" -> Reg16 "16"
      | _ -> Reg32 "123"
    

    because the type of the function would not be 'a. string -> 'a reg. Indeed this type means that the function returns a register of all possible type simultaneously which is not what you meant. The correct type of the result of the function would be something like (x:string) -> reg_type(x) in a dependently typed language. In other words, you want the type of the result to depend on the value of its argument. This is not possible in OCaml (in the core language).

    This is common issue when interfacing a strongly typed GADT DSLs with the exterior world. The solution is to use another layer of GADTs to convey the information that the type of the parsed register cannot be known in advance:

    type dyn_reg = Dyn: 'any reg -> dyn_reg [@@unboxed]
    let string_to_dyn_reg s = function
      | "something" -> Dyn (Reg8 "something")
      | "Reg16" -> Dyn(Reg16 "16")
      | _ -> Dyn(Reg32 "123")
    

    Here, if I have a value Dyn x, I know that there is some type 'e such that the type of x is x:'e reg, but I don't know anything more about this type 'e (this kind of type variable 'e is said to be existentially quantified type). Thus I can only apply function on x that work for any 'e reg.

    Since reg is a GADTs, the possible values for 'e reg are well-known, so there are still many useful function of type 'e. 'e reg -> ... and it should still possible to use the parsed register.

    An important point to notice however is that this dyn_reg type has mostly erased all fine-grained type information that was available on the GADTs definition, and whenever we are using dyn_reg, we are in fact using an encoding of the normal variants. Thus, if you end up using only dyn_reg, it is a sign that GADTs were not really useful in this case.

    EDIT:

    A good example of the consequence of erasing the types is to try to write a move function. Just writing

    let mov (Dyn a) (Dyn b) =
      Args2(Mov, RegToReg(a,b)
    

    fails because the type of a and b might not be compatible. However, once we check that the types are compatible, the function compiles as expected

    exception Type_error
    let mov (Dyn a) (Dyn b) = 
      let matching x y = Args2(Mov,RegToReg(x,y)) in 
      match a, b with
      | Reg8 _,   Reg8 _   -> matching a b
      | Reg16 _,  Reg16 _  -> matching a b
      | Reg32 _,  Reg32 _  -> matching a b
      | Reg64 _,  Reg64 _  -> matching a b
      | Reg128 _, Reg128 _ -> matching a b
      | _ -> raise Type_error