Search code examples
haskellfunctional-programmingalgebraic-data-types

Why sum of products can be viewed as normal form in algebraic data types?


I am reading a haskell book (page 412). In this book, there is an explanation about normal form for sum of products:

All the existing algebraic rules for products and sums apply in type systems, and that includes the distributive property. Let’s take a look at how that works in arithmetic:

2 * (3 + 4)
2 * (7)
14

We can rewrite this with the multiplication distributed over the addition and obtain the same result:

2 * 3 + 2 * 4
(6) + (8)
14

This is known as a “sum of products.” In normal arithmetic, the expression is in normal form when it’s been reduced to a final result. However, if you think of the numerals in the above expressions as representations of set cardinality, then the sum of products expression is in normal form, as there is no computation to perform.

I've known that a normal form indiciates that an expression is fully reduced. In the above description, author of the book explains that sum of products can be seen as in normal form when we think of expression as representations of set cardinality. I don't understand that.

Cardinality of types means how many different values can be included in that type (like set). For example, Bool type in haskell has cardinality of 2, which is addition of 1 for False and 1 for True each.

Is sum of products (2 * 3 + 2 * 4) a normal form? This expression can be reduced furthre more because fully reduced expression would be 14. I don't understand why sum of products and cardinality of it are related to be normal form.


Solution

  • Let's declare ourselves some types to represent the numbers:

    data Two = OneOfTwo | TwoOfTwo
    data Three = OneOfThree | TwoOfThree | ThreeOfThree
    data Four = ... (similar)
    

    Now we can see that the number of possible values of type Two is, in fact, 2. Same for Three, Four, and Seven.

    Now if we construct a sum type:

    data A = A Two
    

    This type just straight up wraps a value of Two, so the number of possible values of A is also 2. So far so good?

    Now let's construct a more complex one:

    data B = B1 Three | B2 Four
    

    Now, this type wraps either a value of type Three or a value of type Four (but not both at the same time!) This means that the number of possible values would be 3 + 4. Following so far?

    Now, going further:

    data C = C Two B
    

    This type wraps two values at the same time - one value of type Two and one value of type B. This means that the number of possible values of C is the number of possible combinations of Two and B, which, as we know from middle-school mathematics, would be their product, or 2 * (3 + 4) = 2 * (7) = 14.

    But here's the trick: we can write down an equivalent type in a different way:

    data CNew = C1 Two Three | C2 Two Four
    

    See what I did there? For CNew, the set of all possible combinations between values of Two, Three, and Four is the same as for C. Look: in both cases it's either a value of Two combined with a value of Three, or it's a value of Two combined with a value of Four. Except in CNew they're combined directly, but in C they're combined via B.

    But the formula for CNew would be different: 2 * 3 + 2 * 4 = (6) + (8) = 14. This is what the book means.

    Now to answer this bit more directly:

    Is sum of products (2 * 3 + 2 * 4) a normal form? This expression can be reduced further more because fully reduced expression would be 14

    This would be true if we were dealing with integer numbers, but we're not. We can rewrite C in the form of CNew, because that gives us all the same possible combinations of values. But we cannot rewrite them as a type that has straight up 14 possible values without combining 2, 3, and 4. That would be a completely new, unrelated type, as opposed to a combination of Two, Three, and Four.

    And a possible terminology misunderstanding:

    Is sum of products (2 * 3 + 2 * 4) a normal form?

    The term "normal form" doesn't mean "the shortest". This term is usually used to denote a form that is very regular, and therefore easier to work with, and, crucially, that can represent all possible cases in the domain. In this case, normal form is defined as a "sum of products".

    Could it be a "product of sums" instead? No, it couldn't, because, while a product of sums can always be converted to a sum of products, the reverse is not always possible, and this means that not every possible type would be representable in the normal form defined as "product of sums".

    Could it be "just the number of possible values", like 14? Again no, because converting to such form loses some information (see above).