Before Monday, Theorem 1.47 + 1.48, Theorem 1.39 “Proof Idea”, Example 1.41, Example 1.56.
Before Wednesday, read Introduction to Section 1.4 (page 77) which introduces nonregularity.
Before Friday, read Example 1.75, Example 1.77.
For Week 4 Monday: read Definition 2.13 (page 111-112) introducing Pushdown Automata.
Clearly and unambiguously communicate computational ideas using appropriate formalism. Translate across levels of abstraction.
Give examples of sets that are regular (and prove that they are).
State the definition of the class of regular languages
Give examples of regular languages, using each of the three equivalent models of computation for proving regularity.
Choose between multiple models to prove that a language is regular
Explain the limits of the class of regular languages
Describe and use models of computation that don’t involve state machines.
Given a DFA or NFA, find a regular expression that describes its language.
Given a regular expression, find a DFA or NFA that recognizes its language.
Know, select and apply appropriate computing knowledge and problem-solving techniques.
Apply classical techniques including pumping lemma, determinization, diagonalization, and reduction to analyze the complexity of languages and problems.
Justify why the Pumping Lemma is true.
Use the pumping lemma to prove that a given language is not regular.
Understand, guide, shape impact of computing on society/the world. Connect the role of Theory CS classes to other applications (in undergraduate CS curriculum and beyond). Model problems using appropriate mathematical concepts.
Explain nondeterminism and describe tools for simulating it with deterministic computation.
Given a NFA, find a DFA that recognizes its language.
Convert between regular expressions and automata
Schedule your Test 1 Attempt 1, Test 2 Attempt 1, Test 1 Attempt 2, and Test 2 Attempt 2 times at PrairieTest (http://us.prairietest.com)
Homework 2 submitted via Gradescope (https://www.gradescope.com/), due Tuesday 10/15/2024
Review Quiz 3 on PrairieLearn (http://us.prairielearn.com), complete by Sunday 10/20/2024
In Computer Science, we operationalize “hardest” as “requires most resources”, where resources might be memory, time, parallelism, randomness, power, etc. To be able to compare “hardness” of problems, we use a consistent description of problems
Input: String
Output: Yes/ No, where Yes means that the input string matches the pattern or property described by the problem.
So far: we saw that regular expressions are convenient ways of describring patterns in strings. Finite automata give a model of computation for processing strings and and classifying them into Yes (accepted) or No (rejected). We will see that each set of strings is described by a regular expression if and only if there is a FA that recognizes it. Another way of thinking about it: properties described by regular expressions require exactly the computational power of these finite automata.
So far we have that:
If there is a DFA recognizing a language, there is a DFA recognizing its complement.
If there are NFA recognizing two languages, there is a NFA recognizing their union.
If there are DFA recognizing two languages, there is a DFA recognizing their union.
If there are DFA recognizing two languages, there is a DFA recognizing their intersection.
Our goals for today are (1) prove similar results about other set operations, (2) prove that NFA and DFA are equally expressive, and therefore (3) define an important class of languages.
Suppose \(A_1, A_2\) are languages over an alphabet \(\Sigma\). Claim: if there is a NFA \(N_1\) such that \(L(N_1) = A_1\) and NFA \(N_2\) such that \(L(N_2) = A_2\), then there is another NFA, let’s call it \(N\), such that \(L(N) = A_1 \circ A_2\).
Proof idea: Allow computation to move between \(N_1\) and \(N_2\) “spontaneously" when reach an accepting state of \(N_1\), guessing that we’ve reached the point where the two parts of the string in the set-wise concatenation are glued together.
Formal construction: Let \(N_1 = (Q_1, \Sigma, \delta_1, q_1, F_1)\) and \(N_2 = (Q_2, \Sigma, \delta_2,q_2, F_2)\) and assume \(Q_1 \cap Q_2 = \emptyset\). Construct \(N = (Q, \Sigma, \delta, q_0, F)\) where
\(Q =\)
\(q_0 =\)
\(F =\)
\(\delta: Q \times \Sigma_\varepsilon \to \mathcal{P}(Q)\) is defined by, for \(q \in Q\) and \(a \in \Sigma_{\varepsilon}\): \[\delta((q,a))=\begin{cases} \delta_1 ((q,a)) &\qquad\text{if } q\in Q_1 \textrm{ and } q \notin F_1\\ \delta_1 ((q,a)) &\qquad\text{if } q\in F_1 \textrm{ and } a \in \Sigma\\ \delta_1 ((q,a)) \cup \{q_2\} &\qquad\text{if } q\in F_1 \textrm{ and } a = \varepsilon\\ \delta_2 ((q,a)) &\qquad\text{if } q\in Q_2 \end{cases}\]
Proof of correctness would prove that \(L(N) = A_1 \circ A_2\) by considering an arbitrary string accepted by \(N\), tracing an accepting computation of \(N\) on it, and using that trace to prove the string can be written as the result of concatenating two strings, the first in \(A_1\) and the second in \(A_2\); then, taking an arbitrary string in \(A_1 \circ A_2\) and proving that it is accepted by \(N\). Details left for extra practice.
Suppose \(A\) is a language over an alphabet \(\Sigma\). Claim: if there is a NFA \(N\) such that \(L(N) = A\), then there is another NFA, let’s call it \(N'\), such that \(L(N') = A^*\).
Proof idea: Add a fresh start state, which is an accept state. Add spontaneous moves from each (old) accept state to the old start state.
Formal construction: Let \(N = (Q, \Sigma, \delta, q_1, F)\) and assume \(q_0 \notin Q\). Construct \(N' = (Q', \Sigma, \delta', q_0, F')\) where
\(Q' = Q \cup \{q_0\}\)
\(F' = F \cup \{q_0\}\)
\(\delta': Q' \times \Sigma_\varepsilon \to \mathcal{P}(Q')\) is defined by, for \(q \in Q'\) and \(a \in \Sigma_{\varepsilon}\): \[\delta'((q,a))=\begin{cases} \delta ((q,a)) &\qquad\text{if } q\in Q \textrm{ and } q \notin F\\ \delta ((q,a)) &\qquad\text{if } q\in F \textrm{ and } a \in \Sigma\\ \delta ((q,a)) \cup \{q_1\} &\qquad\text{if } q\in F \textrm{ and } a = \varepsilon\\ \{q_1\} &\qquad\text{if } q = q_0 \textrm{ and } a = \varepsilon \\ \emptyset &\qquad\text{if } q = q_0 \textrm { and } a \in \Sigma \end{cases}\]
Proof of correctness would prove that \(L(N') = A^*\) by considering an arbitrary string accepted by \(N'\), tracing an accepting computation of \(N'\) on it, and using that trace to prove the string can be written as the result of concatenating some number of strings, each of which is in \(A\); then, taking an arbitrary string in \(A^*\) and proving that it is accepted by \(N'\). Details left for extra practice.
Application: A state diagram for a NFA over \(\Sigma = \{a,b\}\) that recognizes \(L (( a^*b)^* )\):
Suppose \(A\) is a language over an alphabet \(\Sigma\). Claim: if there is a NFA \(N\) such that \(L(N) = A\) then there is a DFA \(M\) such that \(L(M) = A\).
Proof idea: States in \(M\) are “macro-states" – collections of states from \(N\) – that represent the set of possible states a computation of \(N\) might be in.
Formal construction: Let \(N = (Q, \Sigma, \delta, q_0, F)\). Define \[M = (~ \mathcal{P}(Q), \Sigma, \delta', q', \{ X \subseteq Q \mid X \cap F \neq \emptyset \}~ )\] where \(q' = \{ q \in Q \mid \text{$q = q_0$ or is accessible from $q_0$ by spontaneous moves in $N$} \}\) and \[\delta' (~(X, x)~) = \{ q \in Q \mid q \in \delta( ~(r,x)~) ~\text{for some $r \in X$ or is accessible from such an $r$ by spontaneous moves in $N$} \}\]
Consider the state diagram of an NFA over \(\{a,b\}\). Use the “macro-state” construction to find an equivalent DFA.
Consider the state diagram of an NFA over \(\{0,1\}\). Use the “macro-state” construction to find an equivalent DFA.
Note: We can often prune the DFAs that result from the “macro-state” constructions to get an equivalent DFA with fewer states (e.g. only the “macro-states" reachable from the start state).
The class of regular languages
Fix an alphabet \(\Sigma\). For each language \(L\) over \(\Sigma\):
There is a DFA over \(\Sigma\) that recognizes \(L\) | \(\exists M ~(M \textrm{ is a DFA and } L(M) = A)\) |
if and only if | |
There is a NFA over \(\Sigma\) that recognizes \(L\) | \(\exists N ~(N \textrm{ is a NFA and } L(N) = A)\) |
if and only if | |
There is a regular expression over \(\Sigma\) that describes \(L\) | \(\exists R ~(R \textrm{ is a regular expression and } L(R) = A)\) |
A language is called regular when any (hence all) of the above three conditions are met.
We already proved that DFAs and NFAs are equally expressive. It remains to prove that regular expressions are too.
Part 1: Suppose \(A\) is a language over an alphabet \(\Sigma\). If there is a regular expression \(R\) such that \(L(R) = A\), then there is a NFA, let’s call it \(N\), such that \(L(N) = A\).
Structural induction: Regular expression is built from basis regular expressions using inductive steps (union, concatenation, Kleene star symbols). Use constructions to mirror these in NFAs.
Application: A state diagram for a NFA over \(\{a,b\}\) that recognizes \(L(a^* (ab)^*)\):
Part 2: Suppose \(A\) is a language over an alphabet \(\Sigma\). If there is a DFA \(M\) such that \(L(M) = A\), then there is a regular expression, let’s call it \(R\), such that \(L(R) = A\).
Proof idea: Trace all possible paths from start state to accept state. Express labels of these paths as regular expressions, and union them all.
Add new start state with \(\varepsilon\) arrow to old start state.
Add new accept state with \(\varepsilon\) arrow from old accept states. Make old accept states non-accept.
Remove one (of the old) states at a time: modify regular expressions on arrows that went through removed state to restore language recognized by machine.
Application: Find a regular expression describing the language recognized by the DFA with state diagram
Definition and Theorem: For an alphabet \(\Sigma\), a language \(L\) over \(\Sigma\) is called regular exactly when \(L\) is recognized by some DFA, which happens exactly when \(L\) is recognized by some NFA, and happens exactly when \(L\) is described by some regular expression
We saw that: The class of regular languages is closed under complementation, union, intersection, set-wise concatenation, and Kleene star.
Prove or Disprove: There is some alphabet \(\Sigma\) for which there is some language recognized by an NFA but not by any DFA.
Prove or Disprove: There is some alphabet \(\Sigma\) for which there is some finite language not described by any regular expression over \(\Sigma\).
Prove or Disprove: If a language is recognized by an NFA then the complement of this language is not recognized by any DFA.
Fix alphabet \(\Sigma\). Is every language \(L\) over \(\Sigma\) regular?
Set | Cardinality |
---|---|
\(\{0,1\}\) | |
\(\{0,1\}^*\) | |
\(\mathcal{P}( \{0,1\})\) | |
The set of all languages over \(\{0,1\}\) | |
The set of all regular expressions over \(\{0,1\}\) | |
The set of all regular languages over \(\{0,1\}\) | |
Strategy: Find an invariant property that is true of all regular languages. When analyzing a given language, if the invariant is not true about it, then the language is not regular.
Pumping Lemma (Sipser Theorem 1.70): If \(A\) is a regular language, then there is a number \(p\) (a pumping length) where, if \(s\) is any string in \(A\) of length at least \(p\), then \(s\) may be divided into three pieces, \(s = xyz\) such that
\(|y| > 0\)
for each \(i \geq 0\), \(xy^i z \in A\)
\(|xy| \leq p\).
Proof illustration
True or False: A pumping length for \(A = \{ 0,1 \}^*\) is \(p = 5\).
Recap so far: In DFA, the only memory available is in the states. Automata can only “remember” finitely far in the past and finitely much information, because they can have only finitely many states. If a computation path of a DFA visits the same state more than once, the machine can’t tell the difference between the first time and future times it visits this state. Thus, if a DFA accepts one long string, then it must accept (infinitely) many similar strings.
Definition A positive integer \(p\) is a pumping length of a language \(L\) over \(\Sigma\) means that, for each string \(s \in \Sigma^*\), if \(|s| \geq p\) and \(s \in L\), then there are strings \(x,y,z\) such that \[s = xyz\] and \[|y| > 0, \qquad \qquad \text{ for each $i \geq 0$, $xy^i z \in L$}, \qquad \text{and} \qquad \qquad |xy| \leq p.\]
Negation: A positive integer \(p\) is not a pumping length of a language \(L\) over \(\Sigma\) iff \[\exists s \left(~ |s| \geq p \wedge s \in L \wedge \forall x \forall y \forall z \left( ~\left( s = xyz \wedge |y| > 0 \wedge |xy| \leq p~ \right) \to \exists i ( i \geq 0 \wedge xy^iz \notin L ) \right) ~\right)\] Informally:
Restating Pumping Lemma: If \(L\) is a regular language, then it has a pumping length.
Contrapositive: If \(L\) has no pumping length, then it is nonregular.
The Pumping Lemma cannot be used to prove that a language is regular.
The Pumping Lemma can be used to prove that a language is not regular.
Extra practice: Exercise 1.49 in the book.
Proof strategy: To prove that a language \(L\) is not regular,
Consider an arbitrary positive integer \(p\)
Prove that \(p\) is not a pumping length for \(L\)
Conclude that \(L\) does not have any pumping length, and therefore it is not regular.
Example: \(\Sigma = \{0,1\}\), \(L = \{ 0^n 1^n \mid n \geq 0\}\).
Fix \(p\) an arbitrary positive integer. List strings that are in \(L\) and have length greater than or equal to \(p\):
Pick \(s =\)
Suppose \(s = xyz\) with \(|xy| \leq p\) and \(|y| > 0\).
Then when \(i = \hspace{1in}\), \(xy^i z = \hspace{1in}\)
Example: \(\Sigma = \{0,1\}\), \(L = \{w w^{\mathcal{R}} \mid w \in \{0,1\}^*\}\). Remember that the reverse of a string \(w\) is denoted \(w^\mathcal{R}\) and means to write \(w\) in the opposite order, if \(w = w_1 \cdots w_n\) then \(w^\mathcal{R} = w_n \cdots w_1\). Note: \(\varepsilon^\mathcal{R} = \varepsilon\).
Fix \(p\) an arbitrary positive integer. List strings that are in \(L\) and have length greater than or equal to \(p\):
Pick \(s =\)
Suppose \(s = xyz\) with \(|xy| \leq p\) and \(|y| > 0\).
Then when \(i = \hspace{1in}\), \(xy^i z = \hspace{1in}\)
Example: \(\Sigma = \{0,1\}\), \(L = \{0^j1^k \mid j \geq k \geq 0\}\).
Fix \(p\) an arbitrary positive integer. List strings that are in \(L\) and have length greater than or equal to \(p\):
Pick \(s =\)
Suppose \(s = xyz\) with \(|xy| \leq p\) and \(|y| > 0\).
Then when \(i = \hspace{1in}\), \(xy^i z = \hspace{1in}\)
Example: \(\Sigma = \{0,1\}\), \(L = \{0^n1^m0^n \mid m,n \geq 0\}\).
Fix \(p\) an arbitrary positive integer. List strings that are in \(L\) and have length greater than or equal to \(p\):
Pick \(s =\)
Suppose \(s = xyz\) with \(|xy| \leq p\) and \(|y| > 0\).
Then when \(i = \hspace{1in}\), \(xy^i z = \hspace{1in}\)
Extra practice:
Language | \(s \in L\) | \(s \notin L\) | Is the language regular or nonregular? |
---|---|---|---|
\(\{a^nb^n \mid 0 \leq n \leq 5 \}\) | |||
\(\{b^n a^n \mid n \geq 2\}\) | |||
\(\{a^m b^n \mid 0 \leq m\leq n\}\) | |||
\(\{a^m b^n \mid m \geq n+3, n \geq 0\}\) | |||
\(\{b^m a^n \mid m \geq 1, n \geq 3\}\) | |||
\(\{ w \in \{a,b\}^* \mid w = w^\mathcal{R} \}\) | |||
\(\{ ww^\mathcal{R} \mid w\in \{a,b\}^* \}\) | |||