Given two standard definitions of binary trees and arbitrarily branching trees (K-trees) in Coq (nat
hardcoded for simplicity):
Inductive bintree : Type :=
| Leaf : bintree
| Node : bintree -> nat -> bintree -> bintree.
Inductive tree :=
| TNode : nat -> list tree -> tree.
I'd like to define a function encode : tree -> bintree
that converts a K-tree to a binary tree with the same nodes, with the ultimate goal being a definition of decode : bintree -> tree
such that forall t, decode (encode t) = t
and a proof of this and some other minor properties. A simple idea goes as follows: we encode a tree of node x
and children ts
as a binary tree where left subtree is encoded tree rooted at its leftmost child, and right subtree is encoded tree rooted at immediate neighbor of x
to the right. In Haskell:
data Tree a = TNode a [Tree a]
data BinTree a = Leaf | BNode (BinTree a) a (BinTree a)
encode :: Tree a -> BinTree a
encode t = go [t]
where
go [] = Leaf
go ((TNode x ts) : ts') = BNode (go ts) x (go ts')
An analogous definition in Coq is ill-formed:
Fixpoint encode_list (l: list ntree) :=
match l with
| [] => Leaf
| (TNode x ts) :: ts' => Node (encode_list ts) x (encode_list ts')
end.
Error: Recursive definition of encode_list is ill-formed.
(* ... *)
Recursive call to encode_list has principal argument equal to "ts" instead of "ts'".
Recursive definition is:
"fun l : list tree =>
match l with
| [] => Leaf
| t :: ts' =>
match t with
| TNode x ts => Node (encode_list ts) x (encode_list ts')
end
end".
It's easy to tell that both ts
and ts'
are strict subterms of l
, but Coq's heuristics are to weak to assert that. I tried to apply a similar trick to the one described here, but encoding a tree is done in the context of a node's right sibling, so it's difficult (impossible?) to write a function that given a node and its sibling(s) produces a binary tree but recurses structurally down the structure of only one of its arguments, i.e. this definition is invalid because we iterate down both arguments:
(* ns : right siblings of t *)
Fixpoint encode' (t: tree) (ns: list tree) :=
let '(TNode x cs) := t in
match cs, ns with
| [], [] => Node Leaf x Leaf
| [], n :: ns' => Node Leaf x (encode' n ns')
| c :: cs', [] => Node (encode' c cs') x Leaf
| c :: cs', n :: ns' => Node (encode' c cs') x (encode' n ns')
end.
(* Error: Cannot guess decreasing argument of fix. *)
(* Definition encode (t: tree) = encode' t []. *)
I eventually managed to get my first definition admitted via Program Fixpoint
with measure
being total number of nodes in a list of trees. But I wonder if this can be achieved without resorting to well-founded recursion? One would think that we should be able to apply the trick with nested fix
function in this situation, but I haven't succeeded at doing so. Full snippet below:
Require Coq.Program.Wf.
Require Import Lia.
Require Import List.
Import ListNotations.
Set Implicit Arguments.
Inductive bintree (A: Type) : Type :=
| Leaf : bintree A
| Node : bintree A -> A -> bintree A -> bintree A.
Inductive tree (A: Type) :=
| TNode : A -> list (tree A) -> tree A.
Definition ntree := tree nat.
Arguments Leaf {A}.
Arguments TNode [A].
Fixpoint n_nodes (t: ntree) :=
let '(TNode n ns) := t in
let fix n_nodes_list (l: list ntree) :=
match l with
| [] => 0
| x :: xs => n_nodes x + n_nodes_list xs
end
in 1 + n_nodes_list ns.
Fixpoint sum_n_nodes (l: list ntree) :=
match l with
| [] => 0
| t :: ts => n_nodes t + sum_n_nodes ts
end.
Program Fixpoint encode_list (l: list ntree) {measure (sum_n_nodes l)} :=
match l with
| [] => Leaf
| (TNode x ts) :: ts' => Node (encode_list ts') x (encode_list ts)
end.
Next Obligation.
destruct ts'; simpl; lia.
Defined.
Next Obligation.
destruct ts.
simpl; lia.
simpl sum_n_nodes at 1.
replace (sum_n_nodes (TNode x (t :: ts) :: ts')) with (1 + n_nodes t + sum_n_nodes ts + sum_n_nodes ts') at 1 by reflexivity.
lia.
Defined.
Definition encode (l: ntree) := encode_list [l].
You can't define encode
using only a recursive function with a decreasing argument of type list tree
, because it cannot recurse on the head. Instead you must have a function with decreasing argument of type tree
, which can call itself on the head of the list.
When we say "recursive calls must be made on subterms of the decreasing argument", the exact definition of "subterm" is somewhat sophisticated and counterintuitive. It's not as naive as "t is a subterm of C t, where C is a constructor". There is this odd situation where t
is a subterm of TNode n (t :: ts)
, but not of t :: ts
(somehow because list
does not occur in the first argument of the constructor _ :: _
in the definition of list
).
I faintly remember there is a good reason for that subtlety but I can't recall it off the top of my head.
You must somehow turn the recursive call of encode_list
on (the contents of) the head of the list into a recursive call of encode
. You will also need a nested recursive function to loop through the list, so encode_list
will still remain.
Unfold what encode
should be in the interesting case, a rose tree with a nonempty forest TNode n (t :: ts)
. The goal is to rewrite the right-hand side in terms of encode t
. (In general, you may have to find a more sophisticated auxiliary function, and get the function you want as a special case of it, but in this case it luckily works simply enough.)
encode (TNode n (t :: ts)) = Node (encode_list (t :: ts)) n Leaf
encode t
is encode_list (t :: [])
, which looks similar but not identical to encode_list (t :: ts)
.
encode_list (t :: []) = Node _ _ Leaf
encode_list (t :: ts) = Node _ _ (encode_list ts)
where the underscores are some functions of t
that I've omitted for brevity.
We can rewrite encode_list (t :: ts)
in terms of encode t
by replacing its right subtree with encode_list ts
, using an auxiliary append
function below:
encode_list (t :: ts) = append (encode t) (encode_list ts)
where
append (Node l n r) rr = Node l n rr
The Leaf
case of append
is left unspecified because it will never happen in the execution of encode
, so we can fill it out arbitrarily:
append Leaf rr = Leaf
In Coq:
Definition append (t rr : bintree) : bintree :=
match t with
| Leaf => Leaf (* should not happen *)
| Node l n _ => Node l n rr
end.
Fixpoint encode (t : tree) : bintree :=
let fix encode_list ts' :=
match ts' with
| [] => Leaf
| t' :: ts' => append (encode t') (encode_list ts')
end in
match t with
| TNode n ts => Node (encode_list ts) n Leaf
end.
One technical aspect of that definition is that encode_list
is "nested" in encode
(fix
inside a Fixpoint
/fix
), instead of defined mutually with encode
(fix ... with ...
). This is not obvious from the "Haskelly" presentation above and relies on more technical details on the definition of "subterm" that the Coq typechecker uses.
Alternatively, you can avoid the useless and inelegant branch in append
by defining the composition of encode
and append
first, and obtain encode
as a special case. Indeed, in the previous definition of encode
, its recursive calls are all passed to append
, and we know the right subtree is always a Leaf
.
(* append_encode t w = append (encode t) w *)
Fixpoint append_encode (t : tree) (w : bintree) : bintree :=
let fix encode_list ts' :=
match ts' with
| [] => Leaf
| t' :: ts' => append_encode t' (encode_list ts')
end in
match t with
| TNode n ts => Node (encode_list ts) n w
end.
Definition encode (t : tree) : bintree := append_encode t Leaf.
Note that there is no tree
that can be encoded to a Leaf
so decode
must be partial, or you'll have to reconsider the problem statement somehow.