Search code examples
algorithmtype-theorysystem-f

How to systematically compute the number of inhabitants of a given type?


How to systematically compute the number of inhabitants of a given type in System F?

Assuming the following restrictions:

  • All inhabitants terminate, i.e. no bottom.
  • All inhabitants lack side-effects.

For example (using Haskell syntax):

  • Bool has two inhabitants.
  • (Bool, Bool) has four inhabitants.
  • Bool -> (Bool, Bool) has sixteen inhabitants.
  • forall a. a -> a has one inhabitant.
  • forall a. (a, a) -> (a, a) has four inhabitants.
  • forall a b. a -> b -> a has one inhabitant.
  • forall a. a has zero inhabitants.

Implementing an algorithm for the first three is trivial, but I can't figure out how to do it for the other ones.


Solution

  • I wanted to solve the same problem. The following discussion certainly helped me a lot:

    Abusing the algebra of algebraic data types - why does this work?

    At first, I too was troubled by types such as forall a. a -> a. Then, I had an epiphany. I realized that the type forall a. a -> a was the Mogensen-Scott encoding of the unit type. Hence, it had only one inhabitant. Similarly, forall a. a is the Mogensen-Scott encoding of the bottom type. Hence, it has zero inhabitants. Consider the following algebraic data types:

    data Bottom                         -- forall a. a
    
    data Unit = Unit                    -- forall a. a -> a
    
    data Bool = False | True            -- forall a. a -> a -> a
    
    data Nat = Succ Nat | Zero          -- forall a. (a -> a) -> a -> a
    
    data List a = Cons a (List a) | Nil -- forall a b. (a -> b -> b) -> b -> b
    

    An algebraic data type is a sum of products. I'll use the syntax ⟦τ⟧ to denote the number of inhabitants of the type τ. There are two kinds of types I'll use in this article:

    1. System F data types, given by the following BNF:

      τ = α
        | τ -> τ
        | ∀ α. τ
      
    2. Algebraic data types, given by the following BNF:

      τ = 𝕟
        | α
        | τ + τ
        | τ * τ
        | μ α. τ
      

    Calculating the number of inhabitants of an algebraic data type is quite straightforward:

    ⟦𝕟⟧       = 𝕟
    ⟦τ¹ + τ²⟧ = ⟦τ¹⟧ + ⟦τ²⟧
    ⟦τ¹ * τ²⟧ = ⟦τ¹⟧ * ⟦τ²⟧
    ⟦μ α. τ⟧  = ⟦τ [μ α. τ / α]⟧
    

    For example, consider the list data type μ β. α * β + 1:

    ⟦μ β. α * β + 1⟧ = ⟦(α * β + 1) [μ β. α * β + 1 / β]⟧
                     = ⟦α * (μ β. α * β + 1) + 1⟧
                     = ⟦α * (μ β. α * β + 1)⟧ + ⟦1⟧
                     = ⟦α⟧ * ⟦μ β. α * β + 1⟧ + ⟦1⟧
                     = ⟦α⟧ * ⟦μ β. α * β + 1⟧ +  1
    

    However, calculating the number of inhabitants of a System F data type is not so straightforward. Nevertheless, it can be done. To do so, we need to convert the System F data type into an equivalent algebraic data type. For example, the System F data type ∀ α. ∀ β. (α -> β -> β) -> β -> β is equivalent to the algebraic list data type μ β. α * β + 1.

    The first thing to notice is that although the System F type ∀ α. ∀ β. (α -> β -> β) -> β -> β has two universal quantifiers yet the algebraic list data type μ β. α * β + 1 has only one (fixed point) quantifier (i.e. the algebraic list data type is monomorphic).

    Although we could make the algebraic list data type polymorphic (i.e. ∀ α. μ β. α * β + 1) and add the rule ⟦∀ α. τ⟧ = ∀ α. ⟦τ⟧ yet we don't do so because it unnecessarily complicates matters. We assume that the polymorphic type has been specialized to some monomorphic type.

    Thus, the first step is to drop all the universal quantifiers except for the one that represents the "fixed point" quantifier. For example, the type ∀ α. ∀ β. α -> β -> α becomes ∀ α. α -> β -> α.

    Most of the conversions are straightforward due to Mogensen-Scott encoding. For example:

    ∀ α. α                       = μ α. 0                   -- bottom type
    
    ∀ α. α -> α                  = μ α. 1 + 0               -- unit type
    
    ∀ α. α -> α -> α             = μ α. 1 + 1 + 0           -- boolean type
    
    ∀ α. (α -> α) -> α -> α      = μ α. (α * 1) + 1 + 0     -- natural number type
    
    ∀ β. (α -> β -> β) -> β -> β = μ β. (α * β * 1) + 1 + 0 -- list type
    

    However, some conversions are not so straightforward. For example, ∀ α. α -> β -> α doesn't represent a valid Mogensen-Scott encoded data type. Nevertheless, we can get the correct answer by juggling the types a bit:

    ⟦∀ α. α -> β -> α⟧ = ⟦β -> ∀ α. α -> α⟧
                       = ⟦∀ α. α -> α⟧ ^ ⟦β⟧ 
                       = ⟦μ α. 1 + 0⟧ ^ ⟦β⟧ 
                       = ⟦μ α. 1⟧ ^ ⟦β⟧ 
                       = ⟦1⟧ ^ ⟦β⟧ 
                       =  1 ^ ⟦β⟧
                       =  1
    

    For other types we need to use some trickery:

    ∀ α. (α, α) -> (α, α) = (∀ α. (α, α) -> α, ∀ α. (α, α) -> α)
                          = (∀ α. α -> α -> α, ∀ α. α -> α -> α)
    
    ⟦∀ α. α -> α -> α⟧ = ⟦μ α. 1 + 1 + 0⟧
                       = ⟦μ α. 2⟧
                       = ⟦2⟧
                       =  2
    
    ⟦∀ α. (α, α) -> (α, α)⟧ = ⟦∀ α. α -> α -> α⟧ * ⟦∀ α. α -> α -> α⟧
                            = 2 * 2
                            = 4
    

    Although there's a simple algorithm which will give us the number of inhabitants of a Mogensen-Scott encoded type yet I can't think of any general algorithm which will give us the number of inhabitants of any polymorphic type.

    In fact, I have a very strong gut feeling that calculating the number of inhabitants of any polymorphic type in general is an undecidable problem. Hence, I believe that there's no algorithm which will give us the number of inhabitants of any polymorphic type in general.

    Nevertheless, I believe that using Mogensen-Scott encoded types is a great start. Hope this helps.