IndProp: Inductively Defined Propositions

Set Warnings "-notation-overridden,-parsing,-deprecated-hint-without-locality".
From LF Require Export Logic.
From Coq Require Import Lia.

Inductively Defined Propositions

In the Logic chapter, we looked at several ways of writing propositions, including conjunction, disjunction, and existential quantification.
In this chapter, we bring yet another new tool into the mix: inductively defined propositions.
To begin, some examples...

The Collatz Conjecture

The Collatz Conjecture is a famous open problem in number theory.
Its statement is surprisingly simple. First, we define a function f on numbers, as follows:
Fixpoint div2 (n : nat) :=
  match n with
    0 ⇒ 0
  | 1 ⇒ 0
  | S (S n) ⇒ S (div2 n)
  end.

Definition f (n : nat) :=
  if even n then div2 n
  else (3 × n) + 1.
Next, we look at what happens when we repeatedly apply f to some given starting number. For example, f 12 is 6, and f 6 is 3, so by repeatedly applying f we get the sequence 12, 6, 3, 10, 5, 16, 8, 4, 2, 1.
Similarly, if we start with 19, we get the longer sequence 19, 58, 29, 88, 44, 22, 11, 34, 17, 52, 26, 13, 40, 20, 10, 5, 16, 8, 4, 2, 1.
Both of these sequences eventually reach 1. The question posed by Collatz was: Does the sequence starting from any natural number eventually reach 1?
To formalize this question in Coq, we might try to define a recursive function that computes the total number of steps that it takes for such a sequence to reach 1.
Fail Fixpoint reaches_1_in (n : nat) :=
  if n =? 1 then 0
  else 1 + reaches_1_in (f n).
This definition is rejected by Coq's termination checker, since the argument to the recursive call, f n, is not "obviously smaller" than n.
Indeed, this isn't just a silly limitation of the termination checker. Functions in Coq are required to be total, and checking that this particular function is total would be equivalent to settling the Collatz conjecture!
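A common workaround, which this chapter does not pursue, is to bound the recursion with an explicit "fuel" argument so that the function becomes structurally recursive. The following is a minimal sketch under that assumption: steps_to_1 and its fuel parameter are hypothetical names, and the standard library's Nat.even and Nat.eqb stand in for the course's even and =?.

```coq
From Coq Require Import Arith.

Fixpoint div2 (n : nat) : nat :=
  match n with
  | 0 => 0
  | 1 => 0
  | S (S n') => S (div2 n')
  end.

Definition f (n : nat) : nat :=
  if Nat.even n then div2 n else 3 * n + 1.

(* Hypothetical helper: count the steps needed to reach 1, giving up
   (returning None) when the fuel runs out. Recursion is on fuel. *)
Fixpoint steps_to_1 (fuel n : nat) : option nat :=
  match fuel with
  | 0 => None
  | S fuel' =>
      if Nat.eqb n 1 then Some 0
      else match steps_to_1 fuel' (f n) with
           | Some k => Some (S k)
           | None => None
           end
  end.

(* The sequence 12, 6, 3, 10, 5, 16, 8, 4, 2, 1 takes nine steps. *)
Example steps_12 : steps_to_1 20 12 = Some 9.
Proof. reflexivity. Qed.
```

Such a function only reports an answer within the chosen fuel, so it says nothing about whether every starting number eventually reaches 1.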
Fortunately, there is another way to do it: We can express the concept "reaches 1 eventually" as an inductively defined property of numbers:
Inductive reaches_1 : nat → Prop :=
  | term_done : reaches_1 1
  | term_more (n : nat) : reaches_1 (f n) → reaches_1 n.
The details of how such definitions are written will be explained below; for the moment, the way to read this one is: "The number 1 reaches 1, and any number n reaches 1 if f n does."
The Collatz conjecture then states that the sequence beginning from any number reaches 1:
Conjecture collatz : ∀ n, reaches_1 n.
If you succeed in proving this conjecture, you've got a bright future as a number theorist. But don't spend too long on it -- it's been open since 1937!
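Although the general statement is open, evidence for a particular number is easy to build by following its sequence. The sketch below restates the definitions in plain Coq syntax so that it stands alone; it assumes the standard library's Nat.even in place of the course's even, and the example name reaches_1_4 is ours.

```coq
From Coq Require Import Arith.

Fixpoint div2 (n : nat) : nat :=
  match n with
  | 0 => 0
  | 1 => 0
  | S (S n') => S (div2 n')
  end.

Definition f (n : nat) : nat :=
  if Nat.even n then div2 n else 3 * n + 1.

Inductive reaches_1 : nat -> Prop :=
  | term_done : reaches_1 1
  | term_more (n : nat) : reaches_1 (f n) -> reaches_1 n.

(* 4 goes to 2, which goes to 1: two uses of term_more, then term_done. *)
Example reaches_1_4 : reaches_1 4.
Proof.
  apply term_more. compute.   (* goal: reaches_1 2 *)
  apply term_more. compute.   (* goal: reaches_1 1 *)
  apply term_done.
Qed.
```

Each use of term_more corresponds to one step of the sequence, so the size of the evidence grows with the number of steps.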

Transitive Closure

A binary relation on a set X is a family of propositions parameterized by two elements of X -- i.e., a proposition about pairs of elements of X.
For example, a familiar binary relation on nat is le, the less-than-or-equal-to relation.
Module LePlayground.
The following definition says that there are two ways to show that one number is less than or equal to another: either observe that they are the same number, or, if the second has the form S m, give evidence that the first is less than or equal to m.
Inductive le : nat → nat → Prop :=
  | le_n (n : nat) : le n n
  | le_S (n m : nat) : le n m → le n (S m).

End LePlayground.
The transitive closure of a relation R is the smallest relation that contains R and that is transitive.
Inductive clos_trans {X: Type} (R: X→X→Prop) : X→X→Prop :=
  | t_step (x y : X) :
      R x y →
      clos_trans R x y
  | t_trans (x y z : X) :
      clos_trans R x y →
      clos_trans R y z →
      clos_trans R x z.
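To see the two constructors at work, here is a small sketch. The relation next (relating each number to its successor) and the example name are our own illustrations, not part of the chapter; the definition of clos_trans is repeated in plain Coq syntax so the block stands alone.

```coq
Inductive clos_trans {X : Type} (R : X -> X -> Prop) : X -> X -> Prop :=
  | t_step (x y : X) :
      R x y ->
      clos_trans R x y
  | t_trans (x y z : X) :
      clos_trans R x y ->
      clos_trans R y z ->
      clos_trans R x z.

(* Hypothetical example relation: m is the immediate successor of n. *)
Definition next (n m : nat) : Prop := m = S n.

(* 0 and 2 are not related by next itself, but they are related by its
   transitive closure, going through 1. *)
Example next_0_2 : clos_trans next 0 2.
Proof.
  apply t_trans with (y := 1).
  - apply t_step. unfold next. reflexivity.
  - apply t_step. unfold next. reflexivity.
Qed.
```

The intermediate point y has to be supplied by hand, since it does not appear in the goal.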

Evenness (yet again)

We've already seen two ways of stating a proposition that a number n is even: We can say
(1) even n = true, or
(2) ∃ k, n = double k.
A third possibility, which we'll use as a running example for the rest of this chapter, is to say that n is even if we can establish its evenness from the following rules:
  • The number 0 is even.
  • If n is even, then S (S n) is even.
(Defining evenness in this way may seem a bit confusing, since we have already seen another perfectly good way of doing it -- "n is even if it is equal to the result of doubling some number". But it makes a convenient running example because it is simple and compact.)
To illustrate how this new definition of evenness works, let's imagine using it to show that 4 is even. First, we give the rules names for easy reference:
  • Rule ev_0: The number 0 is even.
  • Rule ev_SS: If n is even, then S (S n) is even.
Now, by rule ev_SS, it suffices to show that 2 is even. This, in turn, is again guaranteed by rule ev_SS, as long as we can show that 0 is even. But this last fact follows directly from the ev_0 rule.
We can translate the informal definition of evenness from above into a formal Inductive declaration, where each "way that a number can be even" corresponds to a separate constructor:
Inductive ev : nat → Prop :=
  | ev_0 : ev 0
  | ev_SS (n : nat) (H : ev n) : ev (S (S n)).
This definition is interestingly different from previous uses of Inductive. For one thing, we are defining not a Type (like nat) or a function yielding a Type (like list), but rather a function from nat to Prop -- that is, a property of numbers. But what is really new is that, because the nat argument of ev appears to the right of the colon on the first line, it is allowed to take different values in the types of different constructors: 0 in the type of ev_0 and S (S n) in the type of ev_SS. Accordingly, the type of each constructor must be specified explicitly (after a colon), and each constructor's type must have the form ev n for some natural number n.
In contrast, recall the definition of list:
    Inductive list (X:Type) : Type :=
      | nil
      | cons (x : X) (l : list X).
or equivalently:
    Inductive list (X:Type) : Type :=
      | nil : list X
      | cons (x : X) (l : list X) : list X.
This definition introduces the X parameter globally, to the left of the colon, forcing the result of nil and cons to be the same type (i.e., list X). But if we had tried to bring nat to the left of the colon in defining ev, we would have seen an error:
Fail Inductive wrong_ev (n : nat) : Prop :=
  | wrong_ev_0 : wrong_ev 0
  | wrong_ev_SS (H: wrong_ev n) : wrong_ev (S (S n)).
(* ===> Error: Last occurrence of "wrong_ev" must have "n" as 1st
        argument in "wrong_ev 0". *)

In an Inductive definition, an argument to the type constructor on the left of the colon is called a "parameter", whereas an argument on the right is called an "index" or "annotation."
For example, in Inductive list (X : Type) := ..., the X is a parameter, while in Inductive ev : nat → Prop := ..., the unnamed nat argument is an index.
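A single definition can mix the two. As an illustration (the names le', le'_n, and le'_S are ours), the following sketch makes the first argument of a less-than-or-equal relation a parameter, because it is the same in every constructor's result, and the second an index, because it varies. The standard library's le is declared in this style.

```coq
Inductive le' (n : nat) : nat -> Prop :=
  | le'_n : le' n n
  | le'_S (m : nat) (H : le' n m) : le' n (S m).

(* The parameter 2 stays fixed while the index shrinks from 4 to 2. *)
Example le'_2_4 : le' 2 4.
Proof. apply le'_S. apply le'_S. apply le'_n. Qed.
```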
We can think of the declaration of ev as defining a Coq property ev : nat → Prop, together with "evidence constructors" ev_0 : ev 0 and ev_SS : ∀ n, ev n → ev (S (S n)).
These evidence constructors can be thought of as "primitive evidence of evenness", and they can be used just like proven theorems. In particular, we can use Coq's apply tactic with the constructor names to obtain evidence for ev of particular numbers...
Theorem ev_4 : ev 4.
Proof. apply ev_SS. apply ev_SS. apply ev_0. Qed.
... or we can use function application syntax to combine several constructors:
Theorem ev_4' : ev 4.
Proof. apply (ev_SS 2 (ev_SS 0 ev_0)). Qed.
In this way, we can also prove theorems that have hypotheses involving ev.
Theorem ev_plus4 : ∀ n, ev n → ev (4 + n).
Proof.
  intros n. simpl. intros Hn. apply ev_SS. apply ev_SS. apply Hn.
Qed.

Exercise: 1 star, standard (ev_double)

Theorem ev_double : ∀ n,
  ev (double n).
Proof.
  (* FILL IN HERE *) Admitted.

Using Evidence in Proofs

Besides constructing evidence that numbers are even, we can also destruct such evidence, reasoning about how it could have been built.
Introducing ev with an Inductive declaration tells Coq not only that the constructors ev_0 and ev_SS are valid ways to build evidence that some number is ev, but also that these two constructors are the only ways to build evidence that numbers are ev.
In other words, if someone gives us evidence E for the assertion ev n, then we know that E must be one of two things:
  • E is ev_0 (and n is O), or
  • E is ev_SS n' E' (and n is S (S n'), where E' is evidence for ev n').
This suggests that it should be possible to analyze a hypothesis of the form ev n much as we do inductively defined data structures; in particular, it should be possible to argue by case analysis or by induction on such evidence. Let's look at a few examples to see what this means in practice.

Inversion on Evidence

Suppose we are proving some fact involving a number n, and we are given ev n as a hypothesis. We already know how to perform case analysis on n using destruct or induction, generating separate subgoals for the case where n = O and the case where n = S n' for some n'. But for some proofs we may instead want to analyze the evidence for ev n directly.
As a tool for such proofs, we can formalize the intuitive characterization that we gave above for evidence of ev n, using destruct.
Theorem ev_inversion : ∀ (n : nat),
    ev n →
    (n = 0) ∨ (∃ n', n = S (S n') ∧ ev n').
Proof.
  intros n E. destruct E as [ | n' E'] eqn:EE.
  - (* E = ev_0 : ev 0 *)
    left. reflexivity.
  - (* E = ev_SS n' E' : ev (S (S n')) *)
    right. ∃ n'. split. reflexivity. apply E'.
Qed.
Facts like this are often called "inversion lemmas" because they allow us to "invert" some given information to reason about all the different ways it could have been derived.
Here, there are two ways to prove ev n, and the inversion lemma makes this explicit.
We can use the inversion lemma that we proved above to help structure proofs:
Theorem evSS_ev : ∀ n, ev (S (S n)) → ev n.
Proof.
  intros n H. apply ev_inversion in H. destruct H as [H0|H1].
  - discriminate H0.
  - destruct H1 as [n' [Hnm Hev]]. injection Hnm as Heq.
    rewrite Heq. apply Hev.
Qed.
Note how the inversion lemma produces two subgoals, which correspond to the two ways of proving ev. The first subgoal is a contradiction that is discharged with discriminate. The second subgoal makes use of injection and rewrite.
Coq provides a handy tactic called inversion that factors out this common pattern, saving us the trouble of explicitly stating and proving an inversion lemma for every Inductive definition we make.
Here, the inversion tactic can detect (1) that the first case, where n = 0, does not apply and (2) that the n' that appears in the ev_SS case must be the same as n. It includes an "as" annotation similar to destruct, allowing us to assign names rather than have Coq choose them.
Theorem evSS_ev' : ∀ n,
  ev (S (S n)) → ev n.
Proof.
  intros n E. inversion E as [| n' E' Heq].
  (* We are in the E = ev_SS n' E' case now. *)
  apply E'.
Qed.
The inversion tactic can apply the principle of explosion to "obviously contradictory" hypotheses involving inductively defined properties, something that takes a bit more work using our inversion lemma. Compare:
Theorem one_not_even : ¬ ev 1.
Proof.
  intros H. apply ev_inversion in H. destruct H as [ | [m [Hm _]]].
  - discriminate H.
  - discriminate Hm.
Qed.

Theorem one_not_even' : ¬ ev 1.
Proof.
  intros H. inversion H. Qed.

Exercise: 1 star, standard (inversion_practice)

Prove the following result using inversion. (For extra practice, you can also prove it using the inversion lemma.)
Theorem SSSSev__even : ∀ n,
  ev (S (S (S (S n)))) → ev n.
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 1 star, standard (ev5_nonsense)

Prove the following result using inversion.
Theorem ev5_nonsense :
  ev 5 → 2 + 2 = 9.
Proof.
  (* FILL IN HERE *) Admitted.
The inversion tactic does quite a bit of work. For example, when applied to an equality assumption, it does the work of both discriminate and injection. In addition, it carries out the intros and rewrites that are typically necessary in the case of injection. It can also be applied to analyze evidence for arbitrary inductively defined propositions, not just equality. As examples, we'll use it to re-prove some theorems from chapter Tactics. (Here we are being a bit lazy by omitting the as clause from inversion, thereby asking Coq to choose names for the variables and hypotheses that it introduces.)
Theorem inversion_ex1 : ∀ (n m o : nat),
  [n; m] = [o; o] → [n] = [m].
Proof.
  intros n m o H. inversion H. reflexivity. Qed.

Theorem inversion_ex2 : ∀ (n : nat),
  S n = O → 2 + 2 = 5.
Proof.
  intros n contra. inversion contra. Qed.
Here's how inversion works in general.
  • Suppose the name H refers to an assumption P in the current context, where P has been defined by an Inductive declaration.
  • Then, for each of the constructors of P, inversion H generates a subgoal in which H has been replaced by the specific conditions under which this constructor could have been used to prove P.
  • Some of these subgoals will be self-contradictory; inversion throws these away.
  • The ones that are left represent the cases that must be proved to establish the original goal. For those, inversion adds to the proof context all equations that must hold of the arguments given to P (e.g., S (S n') = n in the proof of evSS_ev).
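The same recipe applies to any inductively defined proposition, not just ev. As a small sketch, the example below uses the standard library's ≤ (an assumption on our part, rather than the le of LePlayground); the example name is ours. Neither constructor of ≤ can produce evidence of 1 ≤ 0, so inversion discards both cases and no subgoals remain.

```coq
(* le_n would need 1 = 0, and le_S would need 0 = S m for some m;
   inversion rules out both. *)
Example not_1_le_0 : ~ (1 <= 0).
Proof. intros H. inversion H. Qed.
```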
The ev_double exercise above shows that our new notion of evenness is implied by the two earlier ones (since, by even_bool_prop in chapter Logic, we already know that those are equivalent to each other). To show that all three coincide, we just need the following lemma.
Lemma ev_Even_firsttry : ∀ n,
  ev n → Even n.
Proof.
  (* WORKED IN CLASS *) unfold Even.
We could try to proceed by case analysis or induction on n. But since ev is mentioned in a premise, this strategy seems unpromising, because (as we've noted before) the induction hypothesis will talk about n-1 (which is not even!). Thus, it seems better to first try inversion on the evidence for ev. Indeed, the first case can be solved trivially. And we can seemingly make progress on the second case with a helper lemma.
  intros n E. inversion E as [EQ' | n' E' EQ'].
  - (* E = ev_0 *) ∃ 0. reflexivity.
  - (* E = ev_SS n' E' *)
Unfortunately, the second case is harder. We need to show ∃ n0, S (S n') = double n0, but the only available assumption is E', which states that ev n' holds. Since this isn't directly useful, it seems that we are stuck and that performing case analysis on E was a waste of time.
If we look more closely at our second goal, however, we can see that something interesting happened: By performing case analysis on E, we were able to reduce the original result to a similar one that involves a different piece of evidence for ev: namely E'. More formally, we could finish our proof if we could show that
         ∃ k', n' = double k',
which is the same as the original statement, but with n' instead of n. Indeed, it is not difficult to convince Coq that this intermediate result would suffice.
    assert (H: (∃ k', n' = double k')
               → (∃ n0, S (S n') = double n0)).
        { intros [k' EQ'']. ∃ (S k'). simpl.
          rewrite <- EQ''. reflexivity. }
    apply H.
Unfortunately, now we are stuck. To see this clearly, let's move E' back into the goal from the hypotheses.
    generalize dependent E'.
Now it is obvious that we are trying to prove another instance of the same theorem we set out to prove -- only here we are talking about n' instead of n.
Abort.

Induction on Evidence

If this story feels familiar, it is no coincidence: We've encountered similar problems in the Induction chapter, when trying to use case analysis to prove results that required induction. And once again the solution is... induction!
The behavior of induction on evidence is the same as its behavior on data: It causes Coq to generate one subgoal for each constructor that could have been used to build that evidence, while providing an induction hypothesis for each recursive occurrence of the property in question.
To prove that a property of n holds for all even numbers (i.e., those for which ev n holds), we can use induction on ev n. This requires us to prove two things, corresponding to the two ways in which ev n could have been constructed. If it was constructed by ev_0, then n=0 and the property must hold of 0. If it was constructed by ev_SS, then the evidence of ev n is of the form ev_SS n' E', where n = S (S n') and E' is evidence for ev n'. In this case, the inductive hypothesis says that the property we are trying to prove holds for n'.
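Before returning to the lemma, here is a minimal sketch of this pattern on a simpler property. It assumes the standard library's boolean test Nat.even in place of the course's even, and the lemma name ev_even_true is ours; the definition of ev is repeated in plain Coq syntax so the block stands alone.

```coq
Inductive ev : nat -> Prop :=
  | ev_0 : ev 0
  | ev_SS (n : nat) (H : ev n) : ev (S (S n)).

Lemma ev_even_true : forall n, ev n -> Nat.even n = true.
Proof.
  intros n E.
  induction E as [| n' E' IH].
  - (* E = ev_0, so n is 0 *)
    reflexivity.
  - (* E = ev_SS n' E', so n is S (S n'),
       with IH : Nat.even n' = true *)
    simpl. apply IH.
Qed.
```

The induction hypothesis is stated for n', the number whose evenness E' witnesses, which is exactly what the second case needs.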
Let's try proving that lemma again:
Lemma ev_Even : ∀ n,
  ev n → Even n.
Proof.
  intros n E.
  induction E as [|n' E' IH].
  - (* E = ev_0 *)
    unfold Even. ∃ 0. reflexivity.
  - (* E = ev_SS n' E'
       with IH : Even n' *)

    unfold Even in IH.
    destruct IH as [k Hk].
    rewrite Hk.
    unfold Even. ∃ (S k). simpl. reflexivity.
Qed.
Here, we can see that Coq produced an IH that corresponds to E', the single recursive occurrence of ev in its own definition. Since E' mentions n', the induction hypothesis talks about n', as opposed to n or some other number.
The equivalence between the second and third definitions of evenness now follows.
Theorem ev_Even_iff : ∀ n,
  ev n ↔ Even n.
Proof.
  intros n. split.
  - (* -> *) apply ev_Even.
  - (* <- *) unfold Even. intros [k Hk]. rewrite Hk. apply ev_double.
Qed.
As we will see in later chapters, induction on evidence is a recurring technique across many areas -- in particular for formalizing the semantics of programming languages.
The following exercises provide simple examples of this technique, to help you familiarize yourself with it.

Exercise: 2 stars, standard (ev_sum)

Theorem ev_sum : ∀ n m, ev n → ev m → ev (n + m).
Proof.
  (* FILL IN HERE *) Admitted.

Exercise: 3 stars, standard, especially useful (ev_ev__ev)

Theorem ev_ev__ev : ∀ n m,
  ev (n+m) → ev n → ev m.
  (* Hint: There are two pieces of evidence you could attempt to induct upon
      here. If one doesn't work, try the other. *)

Proof.
  (* FILL IN HERE *) Admitted.
From the definition of le, we can sketch the behaviors of destruct, inversion, and induction on a hypothesis H providing evidence of the form le e1 e2. Doing destruct H will generate two cases. In the first case, e1 = e2, and it will replace instances of e2 with e1 in the goal and context. In the second case, e2 = S n' for some n' for which le e1 n' holds, and it will replace instances of e2 with S n'. Doing inversion H will remove impossible cases and add generated equalities to the context for further use. Doing induction H will, in the second case, add the induction hypothesis that the goal holds when e2 is replaced with n'.
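As a sketch of the induction case, the following lemma (not one of the exercises; its name is ours) is proved by induction on evidence for ≤. It assumes the standard library's ≤, whose constructors are also called le_n and le_S, and uses lia for the arithmetic.

```coq
From Coq Require Import Lia.

Lemma le_exists_diff : forall n m, n <= m -> exists k, m = n + k.
Proof.
  intros n m H.
  induction H as [| m' H' IH].
  - (* le_n: m has been replaced by n *)
    exists 0. lia.
  - (* le_S: m has been replaced by S m', and IH talks about m' *)
    destruct IH as [k Hk].
    exists (S k). lia.
Qed.
```

In the second case the induction hypothesis is the original goal with m' in place of S m', matching the description above.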

Exercise: 5 stars, standard (le_exercises)

Here are a number of facts about the ≤ and < relations that we are going to need later in the course. The proofs make good practice exercises.
Lemma le_trans : ∀ m n o, m ≤ n → n ≤ o → m ≤ o.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem O_le_n : ∀ n,
  0 ≤ n.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem n_le_m__Sn_le_Sm : ∀ n m,
  n ≤ m → S n ≤ S m.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem Sn_le_Sm__n_le_m : ∀ n m,
  S n ≤ S m → n ≤ m.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem lt_ge_cases : ∀ n m,
  n < m ∨ n ≥ m.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem le_plus_l : ∀ a b,
  a ≤ a + b.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem plus_le : ∀ n1 n2 m,
  n1 + n2 ≤ m →
  n1 ≤ m ∧ n2 ≤ m.
Proof.
 (* FILL IN HERE *) Admitted.

Theorem add_le_cases : ∀ n m p q,
  n + m ≤ p + q → n ≤ p ∨ m ≤ q.
Hint: May be easiest to prove by induction on n.
Proof.
(* FILL IN HERE *) Admitted.

Theorem plus_le_compat_l : ∀ n m p,
  n ≤ m →
  p + n ≤ p + m.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem plus_le_compat_r : ∀ n m p,
  n ≤ m →
  n + p ≤ m + p.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem le_plus_trans : ∀ n m p,
  n ≤ m →
  n ≤ m + p.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem n_lt_m__n_le_m : ∀ n m,
  n < m →
  n ≤ m.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem plus_lt : ∀ n1 n2 m,
  n1 + n2 < m →
  n1 < m ∧ n2 < m.
Proof.
(* FILL IN HERE *) Admitted.

Exercise: 4 stars, standard (more_le_exercises)

Theorem leb_complete : ∀ n m,
  n <=? m = true → n ≤ m.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem leb_correct : ∀ n m,
  n ≤ m →
  n <=? m = true.
Hint: May be easiest to prove by induction on m.
Proof.
  (* FILL IN HERE *) Admitted.
Hint: The next two can easily be proved without using induction.
Theorem leb_iff : ∀ n m,
  n <=? m = true ↔ n ≤ m.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem leb_true_trans : ∀ n m o,
  n <=? m = true → m <=? o = true → n <=? o = true.
Proof.
  (* FILL IN HERE *) Admitted.
A list is a subsequence of another list if all of the elements in the first list occur in the same order in the second list, possibly with some extra elements in between. For example,
      [1;2;3] is a subsequence of each of the lists
      [1;2;3]
      [1;1;1;2;2;3]
      [1;2;7;3]
      [5;6;1;9;9;2;7;3;8]
but it is not a subsequence of any of the lists
      [1;2]
      [1;3]
      [5;6;2;1;7;3;8].
  • Define an inductive proposition subseq on list nat that captures what it means to be a subsequence. (Hint: You'll need three cases.)
  • Prove subseq_refl that subsequence is reflexive, that is, any list is a subsequence of itself.
  • Prove subseq_app that for any lists l1, l2, and l3, if l1 is a subsequence of l2, then l1 is also a subsequence of l2 ++ l3.
  • (Harder) Prove subseq_trans that subsequence is transitive -- that is, if l1 is a subsequence of l2 and l2 is a subsequence of l3, then l1 is a subsequence of l3.
Inductive subseq : list nat → list nat → Prop :=
  | sub_nil l : subseq [] l
  | sub_take x l1 l2 (H : subseq l1 l2) : subseq (x :: l1) (x :: l2)
  | sub_skip x l1 l2 (H : subseq l1 l2) : subseq l1 (x :: l2)
.

Theorem subseq_refl : ∀ (l : list nat), subseq l l.
Proof.
  (* FILL IN HERE *) Admitted.

Theorem subseq_app : ∀ (l1 l2 l3 : list nat),
  subseq l1 l2 →
  subseq l1 (l2 ++ l3).
Proof.
  (* FILL IN HERE *) Admitted.

Theorem subseq_trans : ∀ (l1 l2 l3 : list nat),
  subseq l1 l2 →
  subseq l2 l3 →
  subseq l1 l3.
Proof.
  (* Hint: be careful about what you are doing induction on and which
     other things need to be generalized... *)

  (* FILL IN HERE *) Admitted.

A Digression on Notation

There are several equivalent ways of writing inductive types.  We've mostly seen this style...
Module bin1.
Inductive bin : Type :=
  | Z
  | B0 (n : bin)
  | B1 (n : bin).
End bin1.
... which omits the result types because they are all bin.
It is completely equivalent to this...
Module bin2.
Inductive bin : Type :=
  | Z : bin
  | B0 (n : bin) : bin
  | B1 (n : bin) : bin.
End bin2.
... where we fill them in, and this...
Module bin3.
Inductive bin : Type :=
  | Z : bin
  | B0 : bin → bin
  | B1 : bin → bin.
End bin3.
... where we put everything on the right of the colon.
For inductively defined propositions, we need to explicitly give the result type for each constructor (because they are not all the same), so the first style doesn't make sense, but we can use either the second or the third interchangeably.

Case Study: Regular Expressions

The ev property provides a simple example for illustrating inductive definitions and the basic techniques for reasoning about them, but it is not terribly exciting -- after all, it is equivalent to the two non-inductive definitions of evenness that we had already seen, and does not seem to offer any concrete benefit over them.
To give a better sense of the power of inductive definitions, we now show how to use them to model a classic concept in computer science: regular expressions.
Regular expressions are a simple language for describing sets of strings. Their syntax is defined as follows:
Inductive reg_exp (T : Type) : Type :=
  | EmptySet
  | EmptyStr
  | Char (t : T)
  | App (r1 r2 : reg_exp T)
  | Union (r1 r2 : reg_exp T)
  | Star (r : reg_exp T).

Arguments EmptySet {T}.
Arguments EmptyStr {T}.
Arguments Char {T} _.
Arguments App {T} _ _.
Arguments Union {T} _ _.
Arguments Star {T} _.
Note that this definition is polymorphic: Regular expressions in reg_exp T describe strings with characters drawn from T -- that is, lists of elements of T.
(Technical aside: We depart slightly from standard practice in that we do not require the type T to be finite. This results in a somewhat different theory of regular expressions, but the difference is not significant for present purposes.)
We connect regular expressions and strings via the following rules, which define when a regular expression matches some string:
  • The expression EmptySet does not match any string.
  • The expression EmptyStr matches the empty string [].
  • The expression Char x matches the one-character string [x].
  • If re1 matches s1, and re2 matches s2, then App re1 re2 matches s1 ++ s2.
  • If at least one of re1 and re2 matches s, then Union re1 re2 matches s.
  • Finally, if we can write some string s as the concatenation of a sequence of strings s = s_1 ++ ... ++ s_k, and the expression re matches each one of the strings s_i, then Star re matches s.
    In particular, the sequence of strings may be empty, so Star re always matches the empty string [] no matter what re is.
We can easily translate this informal definition into an Inductive one as follows. We use the notation s =~ re in place of exp_match s re. (By "reserving" the notation before defining the Inductive, we can use it in the definition.)
Reserved Notation "s =~ re" (at level 80).

Inductive exp_match {T} : list T → reg_exp T → Prop :=
  | MEmpty : [] =~ EmptyStr
  | MChar x : [x] =~ (Char x)
  | MApp s1 re1 s2 re2
             (H1 : s1 =~ re1)
             (H2 : s2 =~ re2)
           : (s1 ++ s2) =~ (App re1 re2)
  | MUnionL s1 re1 re2
                (H1 : s1 =~ re1)
              : s1 =~ (Union re1 re2)
  | MUnionR re1 s2 re2
                (H2 : s2 =~ re2)
              : s2 =~ (Union re1 re2)
  | MStar0 re : [] =~ (Star re)
  | MStarApp s1 s2 re
                 (H1 : s1 =~ re)
                 (H2 : s2 =~ (Star re))
               : (s1 ++ s2) =~ (Star re)

  where "s =~ re" := (exp_match s re).
Notice that these rules are not quite the same as the informal ones that we gave at the beginning of the section. First, we don't need to include a rule explicitly stating that no string matches EmptySet; we just don't happen to include any rule that would have the effect of some string matching EmptySet. (Indeed, the syntax of inductive definitions doesn't even allow us to give such a "negative rule.")
Second, the informal rules for Union and Star correspond to two constructors each: MUnionL / MUnionR, and MStar0 / MStarApp. The result is logically equivalent to the original rules but more convenient to use in Coq, since the recursive occurrences of exp_match are given as direct arguments to the constructors, making it easier to perform induction on evidence. (The exp_match_ex1 and exp_match_ex2 exercises below ask you to prove that the constructors given in the inductive declaration and the ones that would arise from a more literal transcription of the informal rules are indeed equivalent.)
Let's illustrate these rules with a few examples.
Example reg_exp_ex1 : [1] =~ Char 1.
Proof.
  apply MChar.
Qed.

Example reg_exp_ex2 : [1; 2] =~ App (Char 1) (Char 2).
Proof.
  apply (MApp [1]).
  - apply MChar.
  - apply MChar.
Qed.
(Notice how the last example applies MApp to the string [1] directly. Since the goal mentions [1; 2] instead of [1] ++ [2], Coq wouldn't be able to figure out how to split the string on its own.)
Using inversion, we can also show that certain strings do not match a regular expression:
Example reg_exp_ex3 : ¬ ([1; 2] =~ Char 1).
Proof.
  intros H. inversion H.
Qed.
We can define helper functions for writing down regular expressions. The reg_exp_of_list function constructs a regular expression that matches exactly the list that it receives as an argument:
Fixpoint reg_exp_of_list {T} (l : list T) :=
  match l with
  | [] ⇒ EmptyStr
  | x :: l' ⇒ App (Char x) (reg_exp_of_list l')
  end.

Example reg_exp_ex4 : [1; 2; 3] =~ reg_exp_of_list [1; 2; 3].
Proof.
  simpl. apply (MApp [1]).
  { apply MChar. }
  apply (MApp [2]).
  { apply MChar. }
  apply (MApp [3]).
  { apply MChar. }
  apply MEmpty.
Qed.
We can also prove general facts about exp_match. For instance, the following lemma shows that every string s that matches re also matches Star re.
Lemma MStar1 :
  ∀ T s (re : reg_exp T),
    s =~ re →
    s =~ Star re.
Proof.
  intros T s re H.
  rewrite <- (app_nil_r _ s).
  apply MStarApp.
  - apply H.
  - apply MStar0.
Qed.
(Note the use of app_nil_r to change the goal of the theorem to exactly the same shape expected by MStarApp.)
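For a concrete string, MStarApp can be applied once per repetition and closed off with MStar0. The sketch below repeats the definitions from this section in plain Coq syntax, on top of the standard library's lists (an assumption on our part, in place of the course's own list definitions), so that it stands alone; the example name star_two is ours.

```coq
From Coq Require Import List.
Import ListNotations.

Inductive reg_exp (T : Type) : Type :=
  | EmptySet
  | EmptyStr
  | Char (t : T)
  | App (r1 r2 : reg_exp T)
  | Union (r1 r2 : reg_exp T)
  | Star (r : reg_exp T).

Arguments EmptySet {T}.
Arguments EmptyStr {T}.
Arguments Char {T} _.
Arguments App {T} _ _.
Arguments Union {T} _ _.
Arguments Star {T} _.

Reserved Notation "s =~ re" (at level 80).

Inductive exp_match {T} : list T -> reg_exp T -> Prop :=
  | MEmpty : [] =~ EmptyStr
  | MChar x : [x] =~ (Char x)
  | MApp s1 re1 s2 re2
         (H1 : s1 =~ re1)
         (H2 : s2 =~ re2)
       : (s1 ++ s2) =~ (App re1 re2)
  | MUnionL s1 re1 re2
            (H1 : s1 =~ re1)
          : s1 =~ (Union re1 re2)
  | MUnionR re1 s2 re2
            (H2 : s2 =~ re2)
          : s2 =~ (Union re1 re2)
  | MStar0 re : [] =~ (Star re)
  | MStarApp s1 s2 re
             (H1 : s1 =~ re)
             (H2 : s2 =~ (Star re))
           : (s1 ++ s2) =~ (Star re)

  where "s =~ re" := (exp_match s re).

(* Split [1; 1] as [1] ++ [1], then split the remaining [1] as [1] ++ []. *)
Example star_two : [1; 1] =~ Star (Char 1).
Proof.
  apply (MStarApp [1] [1]).
  - apply MChar.
  - apply (MStarApp [1] []).
    + apply MChar.
    + apply MStar0.
Qed.
```

As with MApp, the pieces of the string are given explicitly, because the goal mentions [1; 1] rather than [1] ++ [1].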

Exercise: 3 stars, standard (exp_match_ex1)

The following lemmas show that the informal matching rules given at the beginning of the chapter can be obtained from the formal inductive definition.
Lemma empty_is_empty : ∀ T (s : list T),
  ¬ (s =~ EmptySet).
Proof.
  (* FILL IN HERE *) Admitted.

Lemma MUnion' : ∀ T (s : list T) (re1 re2 : reg_exp T),
  s =~ re1 ∨ s =~ re2 →
  s =~ Union re1 re2.
Proof.
  (* FILL IN HERE *) Admitted.
The next lemma is stated in terms of the fold function from the Poly chapter: If ss : list (list T) represents a sequence of strings s1, ..., sn, then fold app ss [] is the result of concatenating them all together.
Lemma MStar' : ∀ T (ss : list (list T)) (re : reg_exp T),
  (∀ s, In s ss → s =~ re) →
  fold app ss [] =~ Star re.
Proof.
  (* FILL IN HERE *) Admitted.
Since the definition of exp_match has a recursive structure, we might expect that proofs involving regular expressions will often require induction on evidence.
For example, suppose we want to prove the following intuitive result: If a regular expression re matches some string s, then all elements of s must occur as character literals somewhere in re.
To state this as a theorem, we first define a function re_chars that lists all characters that occur in a regular expression:
Fixpoint re_chars {T} (re : reg_exp T) : list T :=
  match re with
  | EmptySet ⇒ []
  | EmptyStr ⇒ []
  | Char x ⇒ [x]
  | App re1 re2 ⇒ re_chars re1 ++ re_chars re2
  | Union re1 re2 ⇒ re_chars re1 ++ re_chars re2
  | Star re ⇒ re_chars re
  end.
The main theorem:
Theorem in_re_match : ∀ T (s : list T) (re : reg_exp T) (x : T),
  s =~ re →
  In x s →
  In x (re_chars re).
Proof.
  intros T s re x Hmatch Hin.
  induction Hmatch
    as [| x'
        | s1 re1 s2 re2 Hmatch1 IH1 Hmatch2 IH2
        | s1 re1 re2 Hmatch IH | re1 s2 re2 Hmatch IH
        | re | s1 s2 re Hmatch1 IH1 Hmatch2 IH2].
  (* WORKED IN CLASS *)
  - (* MEmpty *)
    simpl in Hin. destruct Hin.
  - (* MChar *)
    simpl. simpl in Hin.
    apply Hin.
  - (* MApp *)
    simpl.
Something interesting happens in the MApp case. We obtain two induction hypotheses: One that applies when x occurs in s1 (which matches re1), and a second one that applies when x occurs in s2 (which matches re2).
    rewrite In_app_iff in *.
    destruct Hin as [Hin | Hin].
    + (* In x s1 *)
      left. apply (IH1 Hin).
    + (* In x s2 *)
      right. apply (IH2 Hin).
  - (* MUnionL *)
    simpl. rewrite In_app_iff.
    left. apply (IH Hin).
  - (* MUnionR *)
    simpl. rewrite In_app_iff.
    right. apply (IH Hin).
  - (* MStar0 *)
    destruct Hin.
  - (* MStarApp *)
    simpl.
Here again we get two induction hypotheses, and they illustrate why we need induction on evidence for exp_match, rather than induction on the regular expression re: The latter would only provide an induction hypothesis for strings that match re, which would not allow us to reason about the case In x s2.
    rewrite In_app_iff in Hin.
    destruct Hin as [Hin | Hin].
    + (* In x s1 *)
      apply (IH1 Hin).
    + (* In x s2 *)
      apply (IH2 Hin).
Qed.

Exercise: 3 stars, standard (re_not_empty_correct)

Prove that re_not_empty is correct.
Fixpoint re_not_empty {T : Type} (re : reg_exp T) : bool :=
  match re with
  | EmptySet ⇒ false
  | EmptyStr ⇒ true
  | Char _ ⇒ true
  | App re1 re2 ⇒ (re_not_empty re1) && (re_not_empty re2)
  | Union re1 re2 ⇒ (re_not_empty re1) || (re_not_empty re2)
  | Star re ⇒ true
  end.

Lemma re_not_empty_correct : ∀ T (re : reg_exp T),
  (∃ s, s =~ re) ↔ re_not_empty re = true.
Proof.
  (* FILL IN HERE *) Admitted.

The remember Tactic

One potentially confusing feature of the induction tactic is that it will let you try to perform an induction over a term that isn't sufficiently general. The effect of this is to lose information (much as destruct without an eqn: clause can do), and leave you unable to complete the proof. Here's an example:
Lemma star_app: ∀ T (s1 s2 : list T) (re : reg_exp T),
  s1 =~ Star re →
  s2 =~ Star re →
  s1 ++ s2 =~ Star re.
Proof.
  intros T s1 s2 re H1.
Now, just doing an inversion on H1 won't get us very far in the recursive cases. (Try it!). So we need induction (on evidence!). Here is a naive first attempt. (We can begin by generalizing s2, since it's pretty clear that we are going to have to walk over both s1 and s2 in parallel.)
  generalize dependent s2.
  induction H1
    as [|x'|s1 re1 s2' re2 Hmatch1 IH1 Hmatch2 IH2
        |s1 re1 re2 Hmatch IH|re1 s2' re2 Hmatch IH
        |re''|s1 s2' re'' Hmatch1 IH1 Hmatch2 IH2].
But now, although we get seven cases (as we would expect from the definition of exp_match), we have lost a very important bit of information from H1: the fact that s1 matched something of the form Star re. This means that we have to give proofs for all seven constructors of this definition, even though all but two of them (MStar0 and MStarApp) are contradictory. We can still get the proof to go through for a few constructors, such as MEmpty...
  - (* MEmpty *)
    simpl. intros s2 H. apply H.
... but most cases get stuck. For MChar, for instance, we must show
      s2 =~ Char x' →
      x'::s2 =~ Char x'
which is clearly impossible.
  - (* MChar. *) intros s2 H. simpl. (* Stuck... *)
Abort.
The problem here is that induction over a Prop hypothesis only works properly with hypotheses that are "completely general," i.e., ones in which all the arguments are variables, as opposed to more complex expressions like Star re.
(In this respect, induction on evidence behaves more like destruct-without-eqn: than like inversion.)
A possible, but awkward, way to solve this problem is "manually generalizing" over the problematic expressions by adding explicit equality hypotheses to the lemma:
Lemma star_app: ∀ T (s1 s2 : list T) (re re' : reg_exp T),
  re' = Star re →
  s1 =~ re' →
  s2 =~ Star re →
  s1 ++ s2 =~ Star re.
We can now proceed by performing induction over evidence directly, because the argument to the first hypothesis is sufficiently general, which means that we can discharge most cases by inverting the re' = Star re equality in the context. This works, but it makes the statement of the lemma a bit ugly. Fortunately, there is a better way...
Abort.
The tactic remember e as x causes Coq to (1) replace all occurrences of the expression e by the variable x, and (2) add an equation x = e to the context. Here's how we can use it to show the above result:
Lemma star_app: ∀ T (s1 s2 : list T) (re : reg_exp T),
  s1 =~ Star re →
  s2 =~ Star re →
  s1 ++ s2 =~ Star re.
Proof.
  intros T s1 s2 re H1.
  remember (Star re) as re'.
We now have Heqre' : re' = Star re.
  generalize dependent s2.
  induction H1
    as [|x'|s1 re1 s2' re2 Hmatch1 IH1 Hmatch2 IH2
        |s1 re1 re2 Hmatch IH|re1 s2' re2 Hmatch IH
        |re''|s1 s2' re'' Hmatch1 IH1 Hmatch2 IH2].
The Heqre' is contradictory in most cases, allowing us to conclude immediately.
  - (* MEmpty *) discriminate.
  - (* MChar *) discriminate.
  - (* MApp *) discriminate.
  - (* MUnionL *) discriminate.
  - (* MUnionR *) discriminate.
The interesting cases are those that correspond to Star. Note that the induction hypothesis IH2 on the MStarApp case mentions an additional premise Star re'' = Star re, which results from the equality generated by remember.
  - (* MStar0 *)
    injection Heqre' as Heqre''. intros s H. apply H.

  - (* MStarApp *)
    injection Heqre' as Heqre''.
    intros s2 H1. rewrite <- app_assoc.
    apply MStarApp.
    + apply Hmatch1.
    + apply IH2.
      * rewrite Heqre''. reflexivity.
      * apply H1.
Qed.
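The same pattern works for any inductively defined proposition whose arguments are not all variables. As a minimal sketch, assuming the ev predicate (with constructors ev_0 and ev_SS) defined earlier in this chapter, here is a proof by induction on evidence that 1 is not even; the name one_not_ev_remember is ours. Without the remember, induction would forget that the number in question is 1, leaving nothing in the context to contradict.
Example one_not_ev_remember : ¬ ev 1.
Proof.
  intros H.
  remember 1 as n eqn:Hn.
  (* Now H : ev n and Hn : n = 1. *)
  induction H as [| n' H' IH].
  - (* ev_0: Hn : 0 = 1 *)
    discriminate Hn.
  - (* ev_SS: Hn : S (S n') = 1 *)
    discriminate Hn.
Qed.
In both cases the equation recorded by remember is what makes the case contradictory, just as Heqre' does in the proof of star_app.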

Exercise: 5 stars, advanced (weak_pumping)

One of the first really interesting theorems in the theory of regular expressions is the so-called pumping lemma, which states, informally, that any sufficiently long string s matching a regular expression re can be "pumped" by repeating some middle section of s an arbitrary number of times to produce a new string also matching re. (For the sake of simplicity in this exercise, we consider a slightly weaker theorem than is usually stated in courses on automata theory -- hence the name weak_pumping.)
To get started, we need to define "sufficiently long." Since we are working in a constructive logic, we actually need to be able to calculate, for each regular expression re, the minimum length for strings s to guarantee "pumpability."
Module Pumping.

Fixpoint pumping_constant {T} (re : reg_exp T) : nat :=
  match re with
  | EmptySet ⇒ 1
  | EmptyStr ⇒ 1
  | Char _ ⇒ 2
  | App re1 re2 ⇒
      pumping_constant re1 + pumping_constant re2
  | Union re1 re2 ⇒
      pumping_constant re1 + pumping_constant re2
  | Star r ⇒ pumping_constant r
  end.
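For instance (a quick sanity check; the example name is ours), the expression App (Char 1) (Star (Char 2)) has pumping constant 2 + 2 = 4: each Char contributes 2, Star passes its argument's constant through unchanged, and App adds the two.
Example pumping_constant_example :
  pumping_constant (App (Char 1) (Star (Char 2))) = 4.
Proof. reflexivity. Qed.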
You may find these lemmas about the pumping constant useful when proving the pumping lemma below.
Lemma pumping_constant_ge_1 :
  ∀ T (re : reg_exp T),
    pumping_constant re ≥ 1.
Proof.
  intros T re. induction re.
  - (* EmptySet *)
    apply le_n.
  - (* EmptyStr *)
    apply le_n.
  - (* Char *)
    apply le_S. apply le_n.
  - (* App *)
    simpl.
    apply le_trans with (n:=pumping_constant re1).
    apply IHre1. apply le_plus_l.
  - (* Union *)
    simpl.
    apply le_trans with (n:=pumping_constant re1).
    apply IHre1. apply le_plus_l.
  - (* Star *)
    simpl. apply IHre.
Qed.

Lemma pumping_constant_0_false :
  ∀ T (re : reg_exp T),
    pumping_constant re = 0 → False.
Proof.
  intros T re H.
  assert (Hp1 : pumping_constant re ≥ 1).
  { apply pumping_constant_ge_1. }
  inversion Hp1 as [Hp1'| p Hp1' Hp1''].
  - rewrite H in Hp1'. discriminate Hp1'.
  - rewrite H in Hp1''. discriminate Hp1''.
Qed.
Next, it is useful to define an auxiliary function that repeats a string (appends it to itself) some number of times.
Fixpoint napp {T} (n : nat) (l : list T) : list T :=
  match n with
  | 0 ⇒ []
  | S n' ⇒ l ++ napp n' l
  end.
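For example, repeating [1; 2] three times, or zero times (a small check; the example names are ours):
Example napp_example1 : napp 3 [1; 2] = [1; 2; 1; 2; 1; 2].
Proof. reflexivity. Qed.

Example napp_example2 : napp 0 [1; 2] = [].
Proof. reflexivity. Qed.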
This auxiliary lemma might also be useful in your proof of the pumping lemma.
Lemma napp_plus: ∀ T (n m : nat) (l : list T),
  napp (n + m) l = napp n l ++ napp m l.
Proof.
  intros T n m l.
  induction n as [|n IHn].
  - reflexivity.
  - simpl. rewrite IHn, app_assoc. reflexivity.
Qed.

Lemma napp_star :
  ∀ T m s1 s2 (re : reg_exp T),
    s1 =~ re → s2 =~ Star re →
    napp m s1 ++ s2 =~ Star re.
Proof.
  intros T m s1 s2 re Hs1 Hs2.
  induction m.
  - simpl. apply Hs2.
  - simpl. rewrite <- app_assoc.
    apply MStarApp.
    + apply Hs1.
    + apply IHm.
Qed.
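To see napp_star at work (a sketch; the example name is ours, and MChar, MStar0, and MStarApp are the exp_match constructors from earlier in the chapter), we can repeat the first character of [1; 1] any number of times and still match Star (Char 1):
Example napp_star_example : ∀ m : nat,
  napp m [1] ++ [1] =~ Star (Char 1).
Proof.
  intros m. apply napp_star.
  - (* [1] =~ Char 1 *)
    apply MChar.
  - (* [1] =~ Star (Char 1), viewing [1] as [1] ++ [] *)
    apply (MStarApp [1] []).
    + apply MChar.
    + apply MStar0.
Qed.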
The (weak) pumping lemma itself says that, if s =~ re and if the length of s is at least the pumping constant of re, then s can be split into three substrings s1 ++ s2 ++ s3 in such a way that s2 can be repeated any number of times and the result, when combined with s1 and s3, will still match re. Since s2 is also guaranteed not to be the empty string, this gives us a (constructive!) way to generate strings matching re that are as long as we like.
Lemma weak_pumping : ∀ T (re : reg_exp T) s,
  s =~ re →
  pumping_constant re ≤ length s →
  ∃ s1 s2 s3,
    s = s1 ++ s2 ++ s3 ∧
    s2 ≠ [] ∧
    ∀ m, s1 ++ napp m s2 ++ s3 =~ re.
Complete the proof below. Several of the lemmas about le that were in an optional exercise earlier in this chapter may also be useful.
Proof.
  intros T re s Hmatch.
  induction Hmatch
    as [ | x | s1 re1 s2 re2 Hmatch1 IH1 Hmatch2 IH2
       | s1 re1 re2 Hmatch IH | re1 s2 re2 Hmatch IH
       | re | s1 s2 re Hmatch1 IH1 Hmatch2 IH2 ].
  - (* MEmpty *)
    simpl. intros contra. inversion contra.
  (* FILL IN HERE *) Admitted.
End Pumping.

Case Study: Improving Reflection

We've seen in the Logic chapter that we often need to relate boolean computations to statements in Prop. But performing this conversion as we did there can result in tedious proof scripts. Consider the proof of the following theorem:
Theorem filter_not_empty_In : ∀ n l,
  filter (fun x ⇒ n =? x) l ≠ [] →
  In n l.
Proof.
  intros n l. induction l as [|m l' IHl'].
  - (* l = [] *)
    simpl. intros H. apply H. reflexivity.
  - (* l = m :: l' *)
    simpl. destruct (n =? m) eqn:H.
    + (* n =? m = true *)
      intros _. rewrite eqb_eq in H. rewrite H.
      left. reflexivity.
    + (* n =? m = false *)
      intros H'. right. apply IHl'. apply H'.
Qed.
In the first branch after destruct, we explicitly apply the eqb_eq lemma to the equation generated by destructing n =? m, to convert the assumption n =? m = true into the assumption n = m; then we had to rewrite using this assumption to complete the case.
We can streamline this sort of reasoning by defining an inductive proposition that yields a better case-analysis principle for n =? m. Instead of generating the assumption (n =? m) = true, which usually requires some massaging before we can use it, this principle gives us right away the assumption we really need: n = m.
Following the terminology introduced in Logic, we call this the "reflection principle for equality on numbers," and we say that the boolean n =? m is reflected in the proposition n = m.
Inductive reflect (P : Prop) : bool → Prop :=
  | ReflectT (H : P) : reflect P true
  | ReflectF (H : ¬ P) : reflect P false.
The reflect property takes two arguments: a proposition P and a boolean b. It states that the property P reflects (intuitively, is equivalent to) the boolean b: that is, P holds if and only if b = true.
To see this, notice that, by definition, the only way we can produce evidence for reflect P true is by showing P and then using the ReflectT constructor. If we invert this statement, this means that we can extract evidence for P from a proof of reflect P true.
Similarly, the only way to show reflect P false is by tagging evidence for ¬ P with the ReflectF constructor.
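As a minimal sketch of this observation (the example names are ours), inverting a proof of reflect P true yields evidence for P, and inverting a proof of reflect P false yields evidence for ¬ P. In each case the other constructor is ruled out because its boolean index does not match.
Example reflect_true_inv : ∀ P : Prop, reflect P true → P.
Proof.
  intros P H. inversion H as [HP | HnP]. apply HP.
Qed.

Example reflect_false_inv : ∀ P : Prop, reflect P false → ¬ P.
Proof.
  intros P H. inversion H as [HP | HnP]. apply HnP.
Qed.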
To put this observation to work, we first prove that the statements P ↔ b = true and reflect P b are indeed equivalent. First, the left-to-right implication:
Theorem iff_reflect : ∀ P b, (P ↔ b = true) → reflect P b.
Proof.
  (* WORKED IN CLASS *)
  intros P b H. destruct b eqn:Eb.
  - apply ReflectT. rewrite H. reflexivity.
  - apply ReflectF. rewrite H. intros H'. discriminate.
Qed.
Now you prove the right-to-left implication:

Exercise: 2 stars, standard, especially useful (reflect_iff)

Theorem reflect_iff : ∀ P b, reflect P b → (P ↔ b = true).
Proof.
  (* FILL IN HERE *) Admitted.
We can think of reflect as a kind of variant of the usual "if and only if" connective; the advantage of reflect is that, by destructing a hypothesis or lemma of the form reflect P b, we can perform case analysis on b while at the same time generating appropriate hypotheses in the two branches (P in the first subgoal and ¬ P in the second).
Let's use reflect to produce a smoother proof of filter_not_empty_In.
We begin by recasting the eqb_eq lemma in terms of reflect:
Lemma eqbP : ∀ n m, reflect (n = m) (n =? m).
Proof.
  intros n m. apply iff_reflect. rewrite eqb_eq. reflexivity.
Qed.
The proof of filter_not_empty_In now goes as follows. Notice how the calls to destruct and rewrite in the earlier proof of this theorem are combined here into a single call to destruct.
(To see this clearly, execute the two proofs of filter_not_empty_In with Coq and observe the differences in proof state at the beginning of the first case of the destruct.)
Theorem filter_not_empty_In' : ∀ n l,
  filter (fun x ⇒ n =? x) l ≠ [] →
  In n l.
Proof.
  intros n l. induction l as [|m l' IHl'].
  - (* l = [] *)
    simpl. intros H. apply H. reflexivity.
  - (* l = m :: l' *)
    simpl. destruct (eqbP n m) as [H | H].
    + (* n = m *)
      intros _. rewrite H. left. reflexivity.
    + (* n <> m *)
      intros H'. right. apply IHl'. apply H'.
Qed.
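The same destruct (eqbP n m) step works whenever a goal branches on n =? m. As a further small sketch (the example name is ours), both branches of the conditional below are equal to m, and destructing eqbP hands us n = m in exactly the branch where we need it:
Example eqbP_if_example : ∀ n m : nat,
  (if n =? m then n else m) = m.
Proof.
  intros n m. destruct (eqbP n m) as [H | H].
  - (* H : n = m; goal: (if true then n else m) = m *)
    simpl. apply H.
  - (* H : n ≠ m; goal: (if false then n else m) = m *)
    reflexivity.
Qed.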

Exercise: 3 stars, standard, especially useful (eqbP_practice)

Use eqbP as above to prove the following:
Fixpoint count n l :=
  match l with
  | [] ⇒ 0
  | m :: l' ⇒ (if n =? m then 1 else 0) + count n l'
  end.

Theorem eqbP_practice : ∀ n l,
  count n l = 0 → ~(In n l).
Proof.
  intros n l Hcount. induction l as [| m l' IHl'].
  (* FILL IN HERE *) Admitted.
This small example shows reflection giving us a small gain in convenience; in larger developments, using reflect consistently can often lead to noticeably shorter and clearer proof scripts. We'll see many more examples in later chapters and in Programming Language Foundations.
This use of reflect was popularized by SSReflect, a Coq library that has been used to formalize important results in mathematics, including the 4-color theorem and the Feit-Thompson theorem. The name SSReflect stands for small-scale reflection, i.e., the pervasive use of reflection to simplify small proof steps by turning them into boolean computations.
(* 2022-08-22 10:32 *)