Search code examples
covarianceassertionprooftheorem-provingdafny

Cannot prove in Dafny that variance(x)>=0


I would like to prove in Dafny that variance(x) (i.e., covariance(x,x)) is >=0. Note that we define covariance(x,y) as: 1/N*Summation{i=1 to N}{(x_i-x_mean)*(y_i-y_mean)}, where N is the number of elements of x and y.

Let us define covariance in Dafny as in Proving a covariance inequality in Dafny, use contradiction?:

//calculates the sum of all elements of a sequence
function method sum_seq(s:seq<real>): (res:real)
  ensures (forall i :: 0 <= i < |s| ==> s[i] >= 0.0) ==> res>=0.0
  decreases s;
{
  if s == [] then 0.0 else s[0] + sum_seq(s[1..])
}

//calculates the mean of a sequence
function method mean_fun(s:seq<real>): (res:real) 
  requires |s| >= 1;
  decreases |s|;
  ensures (forall i :: 0 <= i < |s| ==> s[i] >= 0.0) ==> res>=0.0

{
    sum_seq(s) / (|s| as real)
}
   
//from a sequence x, it constructs a=[x[0]-x_mean, x[1]-x_mean...]
function construct_list (x:seq<real>, m:real) : (a:seq<real>)
  requires |x| >= 1
  ensures |x|==|a|
  {
    if |x| == 1 then [x[0]-m]
    else [x[0]-m] + construct_list(x[1..],m)
  }

//it performs the Summation of the covariance
//note that a=[x[0]-x_mean, x[1]-x_mean...] and b=[y[0]-y_mean, y[1]-y_mean...]
function {:fuel 2} product(a: seq<real>, b: seq<real>) : real
  requires |a| == |b|
{
  if |a| == 0 then 0.0
  else a[0] * b[0] + product(a[1..], b[1..])
}

//covariance is the Summation divided by the number of elements
function cov(x: seq<real>, y: seq<real>) : (res:real)
  requires |x| == |y|
  requires |x| >= 1
  ensures cov(x,y) == product(construct_list(x, mean_fun(x)),construct_list(y, mean_fun(y))) / (|x| as real)
  //i.e., ensures cov(x,y) == product(a,b) / (|x| as real)
{
  var x_mean := mean_fun(x);
  var y_mean := mean_fun(y);

  var a := construct_list(x, x_mean);
  var b := construct_list(y, y_mean);

  product(a,b) / (|a| as real)
} 

Thus, to prove cov(x,y)>=0, I directly try the following:

//variance (i.e., Cov(x,x)) is always positive
lemma covarianceItself_positive(a:seq<real>) 
   requires |a| >= 1
   requires forall i :: 0 <= i < |a| ==> a[i] >= 0.0
   ensures cov(a,a) >= 0.0
   {}

Which does not verify on its own. So I verify its base case and then realize that is suffices to prove that product(a,a)>=0.0, which is stated in Lemma productItself_positive(a):

lemma covarianceItself_positive(a:seq<real>) 
   requires |a| >= 1
   requires forall i :: 0 <= i < |a| ==> a[i] >= 0.0
   ensures cov(a,a) >= 0.0
   {
       if (|a|==1){
           assert cov(a,a) >= 0.0;
       }
       else{
           productItself_positive(a);
       }
   }

Where productItself_positive is the following:

lemma productItself_positive(a:seq<real>)
   requires |a| >= 1
   requires forall i :: 0 <= i < |a| ==> a[i] >= 0.0
   ensures product(construct_list(a, mean_fun(a)),construct_list(a, mean_fun(a))) >= 0.0  
   {} 

I am trying to make a proof for this Lemma, so I started a calculation. My problems are that (1) Dafny reports that a calculation step might not hold; and (2) Dafny is easily getting stuck while verifying, so I do not know whether I am following the a right idea. My advances so far are as follows:

lemma productItself_positive(a:seq<real>)
   requires |a| >= 1
   requires forall i :: 0 <= i < |a| ==> a[i] >= 0.0
   ensures product(construct_list(a, mean_fun(a)),construct_list(a, mean_fun(a))) >= 0.0  
   {
       if (|a|==1){
           assert product(construct_list(a, mean_fun(a)),construct_list(a, mean_fun(a))) >= 0.0;
       } 
       else {
           calc >= {
               product(construct_list(a, mean_fun(a)),construct_list(a, mean_fun(a)));
               {
                 assert forall x:real :: forall y:real :: (x>=0.0 && y>=0.0) ==> (x*y>=0.0);
                 assert forall x:real :: forall y:real :: (x>=0.0 && y>=0.0) ==> (x+y>=0.0);

                 assert construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] >= 0.0;
                 productItself_positive(a[1..]);
                 //assert product(a[1..],a[1..]) >= 0.0; //Loops forever

               }
               //construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] + product(a[1..],a[1..]); //This should hold, but says that previous calculation does not hold
               //{assume construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] + product(a[1..],a[1..]) >= 0.0;}
               //construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] + product(a[1..],a[1..]) >= 0.0;
               //Knowing that construct_list(a, mean_fun(a))[0] * construct_list(a, mean_fun(a))[0] >= 0.0 and that product(a[1..],a[1..]), their sum should be >=0.0
               0.0;
           }
       } 
   } 

Any help?


Solution

  • There is straight forward way to do this. First add postcondition in product that it is positive if both array are same, dafny able to verify it without any help

    function {:fuel 2} product(a: seq<real>, b: seq<real>) : real
      requires |a| == |b|
      ensures a == b ==> product(a, b) >= 0.0
    {
      if |a| == 0 then 0.0
      else a[0] * b[0] + product(a[1..], b[1..])
    }
    

    Then use the fact that covariance is product of two equal array

    lemma self_cov_is_positive(a: seq<real>)
     requires |a| >= 1
     ensures cov(a, a) >= 0.0
     {
        var a_list := construct_list(a, mean_fun(a));
        assert cov(a, a) == product(a_list, a_list) / (|a| as real);
     }