How to prove remove_copy from ACSL by example

I tried to prove the algorithm remove copy (the first version) from "ACSL by Example" version 11.1.0.

I used Alt-Ergo (0.99.1), CVC3 (2.4.1), Z3 (4.3.2), CVC4 (1.4) and Why3 (0.85) The time limit in why3 was 50 sec and to start frama-c, I used the command:

frama-c-gui -wp -wp-model Typed+ref -wp-rte -wp-split remove_copy_11.c

Only one proof obligation was not solved (timeout).

Its body is:

Goal Preservation of Invariant 'kept' (file remove_copy_11.c, line 73) (1/2):
Tags: Then Then.
Let x_0 = (L_Count Mint_2 a_0 i_1 v_0).
Let x_1 = -x_0.
Let x_2 = i_1-x_0.
Let a_1 = (shift_sint32 b_0 x_2).
Let a_2 = (shift_sint32 a_0 i_1).
Let x_3 = Mint_2[a_2].
Let x_4 = 1+i_1.
Let x_5 = 1+i_1-x_0.
Let a_3 = (shift_sint32 b_0 0).
Let a_4 = (shift_sint32 a_0 0).
Assume {
  Type: (is_sint32 v_0) /\ (is_uint32 i_1) /\ (is_uint32 n_0)
        /\ (is_uint32 x_4) /\ (is_sint32 x_3) /\ (is_uint32 x_2)
        /\ (is_uint32 x_5).
  Have: (linked Malloc_0) /\ ((region (base a_0))<=0)
        /\ ((region (base b_0))<=0).
  Have: (valid_rd Malloc_0 a_4 n_0).
  Have: (valid_rw Malloc_0 a_3 n_0).
  Have: (separated a_4 n_0 a_3 n_0).
  Have: (havoc Mint_1 Mint_2 a_3 n_0).
  Have: (P_Unchanged Mint_1 Mint_2 b_0 x_2 n_0).
  Have: (P_PreserveCount Mint_2 a_0 i_1 b_0 x_2 v_0).
  Have: not (P_HasValue Mint_2 b_0 x_2 v_0).
  Have: (i_1<=n_0) /\ (0<=x_0) /\ (x_0<=i_1).
  Have: i_1<n_0.
  Have: (P_EqualRanges_2_ Mint_1 Mint_2 a_0 n_0).
  Have: (valid_rd Malloc_0 a_2 1).
  Have: v_0!=x_3.
  Have: i_1<=(4294967294+x_0).
  Have: (valid_rw Malloc_0 a_1 1).
  Have: i_1<=4294967294.
}
Prove: (P_PreserveCount Mint_2[a_1->x_3] a_0 x_4 b_0 x_5 v_0).
e

Since the document indicates that two proofs of this algorithm were validated using Coq, I guess this was one of them. I'm beginner with Coq, so my question is, how this proof obligation was proved with Coq, if it is the case.

Solution

This question is already a bit old, but I hope this answer will still be useful.

From Frama-C, you can select a goal and convert it in a format that Coq understands.

You should get something like this:

(* ---------------------------------------------------------- *)
(* --- Preservation of Invariant 'kept' (file remove_copy.c, line 95) (1/2) --- *)
(* ---------------------------------------------------------- *)
Require Import ZArith.
Require Import Reals.
Require Import BuiltIn.
Require Import bool.Bool.
Require Import int.Int.
Require Import int.Abs.
Require Import int.ComputerDivision.
Require Import real.Real.
Require Import real.RealInfix.
Require Import real.FromInt.
Require Import map.Map.
Require Import Qedlib.
Require Import Qed.
Require Import Memory.
Require Import Cint.

Require Import Compound.
Require Import Axiomatic.
Require Import A_CountAxiomatic.
Require Import Axiomatic1.

Goal
  forall (i_2 i_1 i : Z),
  forall (t : array Z),
  forall (t_2 t_1 : farray addr Z),
  forall (a_1 a : addr),
  let a_2 := (shift_sint32 a_1 i%Z) in
  let x := (t_1.[ a_2 ])%Z in
  let x_1 := (1%Z + i%Z)%Z in
  let x_2 := ((L_Count t_1 a_1 i%Z i_1%Z))%Z in
  let a_3 := (shift_sint32 a_1 0%Z) in
  let a_4 := (shift_sint32 a 0%Z) in
  let x_3 := (-x_2)%Z in
  let x_4 := (i%Z - x_2)%Z in
  let x_5 := (1%Z + i%Z - x_2)%Z in
  let a_5 := (shift_sint32 a x_4) in
  ((i < i_2)%Z) ->
  ((i <= 4294967294)%Z) ->
  ((i <= i_2)%Z) ->
  ((linked t)) ->
  ((is_sint32 i_1%Z)) ->
  ((is_uint32 i%Z)) ->
  ((is_uint32 i_2%Z)) ->
  ((i_1 <> x)%Z)%Z ->
  ((((region ((base a))%Z)) <= 0)%Z) ->
  ((((region ((base a_1))%Z)) <= 0)%Z) ->
  ((is_uint32 x_1)) ->
  ((P_EqualRanges t_2 t_1 a_1 i_2%Z)) ->
  ((0 <= x_2)%Z) ->
  ((x_2 <= i)%Z) ->
  ((is_sint32 x)) ->
  ((valid_rd t a_3 i_2%Z)) ->
  ((valid_rd t a_2 1%Z)) ->
  ((valid_rw t a_4 i_2%Z)) ->
  ((separated a_3 i_2%Z a_4 i_2%Z)) ->
  ((havoc t_2 t_1 a_4 i_2%Z)) ->
  ((i <= (4294967294 + x_2))%Z) ->
  ((is_uint32 x_4)) ->
  ((is_uint32 x_5)) ->
  ((P_Unchanged t_2 t_1 a x_4 i_2%Z)) ->
  (~((P_HasValue t_1 a x_4 i_1%Z))) ->
  ((valid_rw t a_5 1%Z)) ->
  ((P_PreserveCount t_1 a_1 i%Z a x_4 i_1%Z)) ->
  ((P_PreserveCount (t_1.[ a_5 <- x ]) a_1 x_1 a x_5 i_1%Z)).

There are three difficulties here:

The translation renames all variables, therefore the Coq version is more difficult to understand than the Frama-C one, and you often have to compare the hypotheses to understand what a variable in Coq represents
A lot of hypotheses are added automatically and are useless to prove this particular goal
You must be a bit familiar with the representation of the memory by Coq (shift_sint32, separated, havoc, ...)

Here is a proof that works, but I have no idea if it is a good proof. I tried to explain the main ideas in the comments.

Proof.
  (* Some preliminary steps in order to clean the hypotheses and the goal *)
  intros.
  clear H0 H1 H2 H3 H4 H5 H7 H8 H9 H10 H13 H14 H15 H16 H18 H19 H20 H21 H22
    H23 H24 t_2 x_3.
  assert(a_4=a) as Ha1.
  { unfold a_4. unfold shift_sint32. unfold shift.
    replace (offset a + 0) with (offset a) by omega.
    destruct a; reflexivity.
  }
  assert(a_3=a_1) as Ha2.
  { unfold a_3. unfold shift_sint32. unfold shift.
    replace (offset a_1 + 0) with (offset a_1) by omega.
    destruct a_1; reflexivity.
  }
  rewrite Ha1, Ha2 in *. clear dependent a_3. clear dependent a_4.
  unfold x_1, x_5. replace (1+i-x_2) with (1+(i-x_2)) by omega.
  remember (i-x_2) as j.
  unfold x_4 in *.

  (* Here we see what the goal of this proof really means: we have
     the property P_PreserveCount for the indices i and j (hypothesis H25),
     and we want to prove that it is preserved after one iteration
     of the loop for the indices (i+1) and (j+1). *)

  (* The idea is to prove the following intermediary result:
     after the iteration, P_PreserveCount holds for i and j.
     This is actually trivial, since the arrays a and b (named here a_1 and a)
     are not modified until i and j. *)
  assert (P_PreserveCount (t_1 .[ a_5 <- x]) a_1 i a j i_1) as Ha.
  { unfold P_PreserveCount; intros.
    rewrite 2!Q_CountRead with (t:=t_1 .[a_5 <- x]) (t_1:=t_1); try assumption.
    apply H25; try assumption.
    - unfold P_EqualRanges. intros.
      rewrite access_update_neq. reflexivity.
      unfold a_5, shift_sint32, shift.
      injection; intros. omega.
    - unfold P_EqualRanges. intros.
      rewrite access_update_neq. reflexivity.
      unfold a_5, shift_sint32, shift.
      apply separated_1 with (a:=i_2) (b:=i_2).
      apply separated_sym. assumption. omega. omega.
  }

  (* a[i] and b[j] are equal (to x) *)
  assert((t_1 .[ a_5 <- x]) .[ shift_sint32 a j] = x) as Ha1.
  { apply access_update. }
  assert((t_1 .[ a_5 <- x]) .[ shift_sint32 a_1 i] = x) as Ha2.
  { rewrite access_update_neq. reflexivity.
    unfold a_5, shift_sint32, shift.
    apply separated_1 with (a:=i_2) (b:=i_2).
    apply separated_sym. assumption. omega. omega.
  }

  (* here, we know that P_PreserveCount holds for the indices i and j,
     and that a[i]=b[j], so the conclusion is easy. We just have to distinguish
     the different cases *)
  unfold P_PreserveCount; intros.
  destruct(why_decidable_eq i0 x).
  - subst i0.
    rewrite 2!Q_CountOneHit; try assumption.
    rewrite Ha by assumption. reflexivity.
    rewrite Ha1; reflexivity. rewrite Ha2; reflexivity. 
  - rewrite <- 2!Q_CountOneMiss; try assumption.
    apply Ha; assumption.
    rewrite Ha1. assumption. rewrite Ha2. omega.
Qed.

To conclude, an advantage of doing the proof in Coq is understanding what the SMT solvers lack to conclude automatically. After several tries, I found that replacing the loop by:

for (size_type i = 0; i < n; ++i)
{
  //@ assert EqualRanges{Here,Pre}(a, n);
  L:if (a[i] != v)
  {
    b[j++] = a[i];
  }
  //@ assert EqualRanges{Here,L}(a, i);
  //@ assert EqualRanges{Here,L}(b, \at(j, L));
  //@ assert PreserveCount(a, i, b, \at(j, L), v);
}

allows the provers of why3 to solve everything.