Search code examples
coq

Stuck on a simple proof about regular expressions


I'm trying to formalize some properties on regular expressions (REs) using Coq. But, I've got some troubles to prove a rather simple property:

For all strings s, if s is in the language of (epsilon)* RE, then s = "", where epsilon and * denotes the empty string RE and Kleene star operation.

This seems to be an obvious application of induction / inversion tactics, but I couldn't make it work.

The minimal working code with the problematic lemma is in the following gist. Any tip on how should I proceed will be appreciated.

EDIT:

One of my tries was something like:

Lemma star_lemma : forall s, s <<- (#1 ^*) -> s = "".
Proof.  
  intros s H.
  inverts* H.
  inverts* H2.
  inverts* H1.
  inverts* H1.
  inverts* H2.
  simpl in *.
  -- stuck here

that leave me with the following goal:

s' : string
H4 : s' <<- (#1 ^*)
============================
s' = ""

At least to me, it appears that using induction would finish the proof, since I could use H4 in induction hypothesis to finish the proof, but when I start the proof using

induction H

instead of

inverts* H

I got some (at least for me) senseless goals. In Idris / Agda, such proof just follows by pattern matching and recursion over the structure of s <<- (#1 ^*). My point is how to do such recursion in Coq.


Solution

  • Here is one possible solution of the original problem:

    Lemma star_lemma : forall s,
        s <<- (#1 ^*) -> s = "".
    Proof.
      refine (fix star_lemma s prf {struct prf} : s = "" := _).
      inversion_clear prf; subst.
      inversion_clear H; subst.
      - now inversion H0.
      - inversion_clear H0; subst. inversion_clear H; subst.
        rewrite (star_lemma s' H1).
        reflexivity.
    Qed.
    

    The main idea is to introduce a term in the context which will resemble the recursive call in a typical Idris proof. The approaches with remember and dependent induction don't work well (without modifications of in_regex) because they introduce impossible to satisfy equations as induction hypotheses' premises.

    Note: it can take a while to check this lemma (around 40 seconds on my machine under Coq 8.5pl3). I think it's due to the fact that the inversion tactic tends to generate big proof terms.