Non-trivial Fixpoint on nested recursive types

Given two standard definitions of binary trees and arbitrarily branching trees (K-trees) in Coq (nat hardcoded for simplicity):

Inductive bintree : Type :=
  | Leaf : bintree
  | Node : bintree -> nat -> bintree -> bintree.

Inductive tree :=
  | TNode : nat -> list tree -> tree.

I'd like to define a function encode : tree -> bintree that converts a K-tree to a binary tree with the same nodes, with the ultimate goal being a definition of decode : bintree -> tree such that forall t, decode (encode t) = t and a proof of this and some other minor properties. A simple idea goes as follows: we encode a tree of node x and children ts as a binary tree where left subtree is encoded tree rooted at its leftmost child, and right subtree is encoded tree rooted at immediate neighbor of x to the right. In Haskell:

data Tree a = TNode a [Tree a]
data BinTree a = Leaf | BNode (BinTree a) a (BinTree a)

encode :: Tree a -> BinTree a
encode t = go [t]
 where
  go [] = Leaf
  go ((TNode x ts) : ts') = BNode (go ts) x (go ts')

An analogous definition in Coq is ill-formed:

Fixpoint encode_list (l: list ntree) :=
  match l with
    | [] => Leaf
    | (TNode x ts) :: ts' => Node (encode_list ts) x (encode_list ts')
  end.

Error: Recursive definition of encode_list is ill-formed.
(* ... *)
Recursive call to encode_list has principal argument equal to "ts" instead of "ts'".
Recursive definition is:
"fun l : list tree =>
 match l with
 | [] => Leaf
 | t :: ts' =>
     match t with
     | TNode x ts => Node (encode_list ts) x (encode_list ts')
     end
 end".

It's easy to tell that both ts and ts' are strict subterms of l, but Coq's heuristics are to weak to assert that. I tried to apply a similar trick to the one described here, but encoding a tree is done in the context of a node's right sibling, so it's difficult (impossible?) to write a function that given a node and its sibling(s) produces a binary tree but recurses structurally down the structure of only one of its arguments, i.e. this definition is invalid because we iterate down both arguments:

(* ns : right siblings of t *)
Fixpoint encode' (t: tree) (ns: list tree) :=
  let '(TNode x cs) := t in
    match cs, ns with
      | [], [] => Node Leaf x Leaf
      | [], n :: ns' => Node Leaf x (encode' n ns')
      | c :: cs', [] => Node (encode' c cs') x Leaf
      | c :: cs', n :: ns' => Node (encode' c cs') x (encode' n ns')
    end.

(* Error: Cannot guess decreasing argument of fix. *)
(* Definition encode (t: tree) = encode' t []. *)

I eventually managed to get my first definition admitted via Program Fixpoint with measure being total number of nodes in a list of trees. But I wonder if this can be achieved without resorting to well-founded recursion? One would think that we should be able to apply the trick with nested fix function in this situation, but I haven't succeeded at doing so. Full snippet below:

Require Coq.Program.Wf.
Require Import Lia.

Require Import List.
Import ListNotations.

Set Implicit Arguments.

Inductive bintree (A: Type) : Type :=
  | Leaf : bintree A
  | Node : bintree A -> A -> bintree A -> bintree A.

Inductive tree (A: Type) :=
  | TNode : A -> list (tree A) -> tree A.

Definition ntree := tree nat.

Arguments Leaf {A}.
Arguments TNode [A].

Fixpoint n_nodes (t: ntree) :=
  let '(TNode n ns) := t in
  let fix n_nodes_list (l: list ntree) :=
    match l with
      | [] => 0
      | x :: xs => n_nodes x + n_nodes_list xs
    end
  in 1 + n_nodes_list ns.

Fixpoint sum_n_nodes (l: list ntree) :=
  match l with
    | [] => 0
    | t :: ts => n_nodes t + sum_n_nodes ts
  end.

Program Fixpoint encode_list (l: list ntree) {measure (sum_n_nodes l)} :=
  match l with
    | [] => Leaf
    | (TNode x ts) :: ts' => Node (encode_list ts') x (encode_list ts)
  end.

Next Obligation.
destruct ts'; simpl; lia.
Defined.

Next Obligation.
destruct ts.
simpl; lia.
simpl sum_n_nodes at 1.
replace (sum_n_nodes (TNode x (t :: ts) :: ts')) with (1 + n_nodes t + sum_n_nodes ts + sum_n_nodes ts') at 1 by reflexivity.
lia.
Defined.

Definition encode (l: ntree) := encode_list [l].

Solution

You can't define encode using only a recursive function with a decreasing argument of type list tree, because it cannot recurse on the head. Instead you must have a function with decreasing argument of type tree, which can call itself on the head of the list.

When we say "recursive calls must be made on subterms of the decreasing argument", the exact definition of "subterm" is somewhat sophisticated and counterintuitive. It's not as naive as "t is a subterm of C t, where C is a constructor". There is this odd situation where t is a subterm of TNode n (t :: ts), but not of t :: ts (somehow because list does not occur in the first argument of the constructor _ :: _ in the definition of list). I faintly remember there is a good reason for that subtlety but I can't recall it off the top of my head.

You must somehow turn the recursive call of encode_list on (the contents of) the head of the list into a recursive call of encode. You will also need a nested recursive function to loop through the list, so encode_list will still remain.

Unfold what encode should be in the interesting case, a rose tree with a nonempty forest TNode n (t :: ts). The goal is to rewrite the right-hand side in terms of encode t. (In general, you may have to find a more sophisticated auxiliary function, and get the function you want as a special case of it, but in this case it luckily works simply enough.)

encode (TNode n (t :: ts)) = Node (encode_list (t :: ts)) n Leaf

encode t is encode_list (t :: []), which looks similar but not identical to encode_list (t :: ts).

encode_list (t :: []) = Node _ _ Leaf
encode_list (t :: ts) = Node _ _ (encode_list ts)

where the underscores are some functions of t that I've omitted for brevity. We can rewrite encode_list (t :: ts) in terms of encode t by replacing its right subtree with encode_list ts, using an auxiliary append function below:

encode_list (t :: ts) = append (encode t) (encode_list ts)

where

append (Node l n r) rr = Node l n rr

The Leaf case of append is left unspecified because it will never happen in the execution of encode, so we can fill it out arbitrarily:

append Leaf rr = Leaf

In Coq:

Definition append (t rr : bintree) : bintree :=
  match t with
  | Leaf => Leaf (* should not happen *)
  | Node l n _ => Node l n rr
  end.

Fixpoint encode (t : tree) : bintree :=
  let fix encode_list ts' :=
    match ts' with
    | [] => Leaf
    | t' :: ts' => append (encode t') (encode_list ts')
    end in
  match t with
  | TNode n ts => Node (encode_list ts) n Leaf
  end.

One technical aspect of that definition is that encode_list is "nested" in encode (fix inside a Fixpoint/fix), instead of defined mutually with encode (fix ... with ...). This is not obvious from the "Haskelly" presentation above and relies on more technical details on the definition of "subterm" that the Coq typechecker uses.

Alternatively, you can avoid the useless and inelegant branch in append by defining the composition of encode and append first, and obtain encode as a special case. Indeed, in the previous definition of encode, its recursive calls are all passed to append, and we know the right subtree is always a Leaf.

(* append_encode t w = append (encode t) w *)
Fixpoint append_encode (t : tree) (w : bintree) : bintree :=
  let fix encode_list ts' :=
    match ts' with
    | [] => Leaf
    | t' :: ts' => append_encode t' (encode_list ts')
    end in
  match t with
  | TNode n ts => Node (encode_list ts) n w
  end.

Definition encode (t : tree) : bintree := append_encode t Leaf.

Full gist

Note that there is no tree that can be encoded to a Leaf so decode must be partial, or you'll have to reconsider the problem statement somehow.