Search code examples
coqfinite-automatalanguage-theory

Defining a finite automata Coq


I am learning Coq and I'd like to use it to formalize Regular languages theory, specially finite automata. Let's say I have a structure for an automata as follows:

Record automata : Type := {
dfa_set_states : list state;
init_state : state;
end_state : state;
dfa_func: state -> terminal -> state;
}.

Where state is an inductive type as:

Inductive state:Type :=
S.

And the type terminal terminal is

Inductive terminal:Type :=
a | b.

I am trying to define it so later I'll be able to generalize the definition for any regular language. For now, I'd want to construct an automata which recognizes the language (a * b *), which is all words over the {a,b} alphabet. Does anyone have an idea on how to build some kind of fixpoint function that will run the word (which I see as a list of terminal) and tell me if that automata recgonizes that word or not? Any idea/help will be greatly apreciated.

Thanks in advance, Erick.


Solution

  • Because you're restricting yourself to regular languages, this is quite simple: you just have to use a fold. Here is a sample:

    Require Import Coq.Lists.List.
    Import ListNotations.
    
    Set Implicit Arguments.
    Unset Strict Implicit.
    Unset Printing Implicit Defensive.
    
    Record dfa (S A : Type) := DFA {
      initial_state : S;
      is_final : S -> bool;
      next : S -> A -> S
    }.
    
    Definition run_dfa S A (m : dfa S A) (l : list A) : bool :=
      is_final m (fold_left (next m) l (initial_state m)).
    

    This snippet is a little bit different from your original definition in that the state and alphabet components are now type parameters of the DFA, and in that I have replaced the end state with a predicate that answers whether we are in an accepting state or not. The run_dfa function simply iterates the transition function of the DFA starting from the initial state, and then tests whether the last state is an accepting state.

    You can use this infrastructure to describe pretty much any regular language. For instance, here is an automaton for recognizing a*b*:

    Inductive ab := A | B.
    
    Inductive ab_state : Type :=
      ReadA | ReadB | Fail.
    
    Definition ab_dfa : dfa ab_state ab := {|
      initial_state := ReadA;
      is_final s := match s with Fail => false | _ => true end;
      next s x :=
        match s, x with
        | ReadB, A => Fail
        | ReadA, B => ReadB
        | _, _ => s
        end
    |}.
    

    We can prove that this automaton does what we expect. Here is a theorem that says that it accepts strings of the sought language:

    Lemma ab_dfa_complete n m : run_dfa ab_dfa (repeat A n ++ repeat B m) = true.
    Proof.
      unfold run_dfa. rewrite fold_left_app.
      assert (fold_left (next ab_dfa) (repeat A n) (initial_state ab_dfa) = ReadA) as ->.
      { now simpl; induction n as [| n IH]; simpl; trivial. }
      destruct m as [|m]; simpl; trivial.
      induction m as [|m IH]; simpl; trivial.
    Qed.
    

    We can also state a converse, that says that it accepts only strings of that language, and nothing else. I have left the proof out; it shouldn't be hard to figure it out.

    Lemma ab_dfa_sound l :
      run_dfa ab_dfa l = true ->
      exists n m, l = repeat A n ++ repeat B m.
    

    Unfortunately, there is not much we can do with this representation besides running the automaton. In particular, we cannot minimize an automaton, test whether two automata are equivalent, etc. These functions also need to take as arguments lists that enumerate all elements of the state and alphabet types, S and A.