Datatype transformation function takes very long to proof

Isabelle needs much time for proving the correctness of (in my eyes) rather simple datatype transformation functions. As example I have created datatypes to represent mathematical and boolean expressions and a function which simplifies such an expression.

datatype 'a math_expr =
  Num int |
  Add "'a math_expr" "'a math_expr" |
  Mul "'a math_expr" "'a math_expr" |
  Sub "'a math_expr" "'a math_expr" |
  Div "'a math_expr" "'a math_expr"

datatype 'a expr =
  True |
  False |
  And "'a expr" "'a expr" |
  Or "'a expr" "'a expr" |
  Eq "'a math_expr" "'a math_expr" |
  Ne "'a math_expr" "'a math_expr" |
  Lt "'a math_expr" "'a math_expr" |
  Le "'a math_expr" "'a math_expr" |
  Gt "'a math_expr" "'a math_expr" |
  Ge "'a math_expr" "'a math_expr" |
  If "'a expr" "'a expr" "'a expr"

function (sequential) simplify :: "'a expr ⇒ 'a expr" where
"simplify (And a True) = a" |
"simplify (And True b) = b" |
"simplify (Or a True) = True" |
"simplify (Or True b) = True" |
"simplify e = e"
by pat_completeness auto
termination by lexicographic_order

On my notebook Isabelle takes quite some time to proof the function (signature and body highlighted) and even much more time to proof its completeness (by pat_completeness auto highlighted). The needed computation time highly depends on the complexity of the expr datatype and the number of pattern matching rules in simplify. The more constructors in the datatype and the more pattern matching rules, the longer it takes.

What is the reason for this behavior? Is there a way to make such a function easier provable?

Solution

The sequential option causes the function command to specialize overlapping equations such that they do not overlap any more. However, this is only a preprocessor to the actual internal construction, which actually supports overlapping patterns (provided that one can prove that the right hand sides denote the same HOL value for overlapping instances, i.e., they are consistent). This consistency proof is expressed as a individual goals (which auto solves essentially always if the sequential option is used because it suffices to prove that they cannot overlap). Yet, there are quadratically many goals in the number of disambiguated equations. So if you add more constructors, the overlapping equations will be split into more cases and those cases translate into a quadratic number of goals.

There are two workarounds when the function is not recursive:

For non-recursive functions, I recommend to use definition with a case expression on the right. Then, you can use simps_of_case from HOL-Library.Simps_Case_Conv to obtain the simp rules. You don't get a nice case distinction rule, though.

definition simplify :: "'a expr ⇒ 'a expr" where
  "simplify e = (case e of And a True => a | And True b => b | ... | _ => e)"

simps_of_case simplify_simps [simp]: simplify_def

If you want to have nice case distinction theorems, you can split the function definition into several helper functions:

fun simplify_add :: "'a expr => 'a expr => 'a expr" where
  "simplify_add a True = a"
| "simplify_add True b = b"
| "simplify_add a b = Add a b"

fun simplify_or (* similarly *)

fun simplify :: "'a expr => 'a expr" where
  "simplify (And a b) = simplify_and a b"
| "simplify (Or a b) = simplify_or a b"
| "simplify e = e"

For recursive functions, you can avoid the explosion by moving some of the case distinctions to the right-hand side. For example:

fun simplify :: "'a expr ⇒ 'a expr" where
  "simplify (And a b) = (case b of True => a | _ => case a of True => b | _ => And a b)"
| ...

Again, this greatly reduces the number of equations after making them non-overlapping, but you don't get the same case distinction rules (and induction rules) any more.