Computational Linguistics — Finite Automata
DFA [Wikipedia]
A deterministic finite automaton is a 5-tuple,
M = (Q,A,δ,q0,F)
Q = set of states
A = input alphabet
δ = transition function Q x A -> Q
q0 = initial or start state
F = set of final states
Matt Foley Pictorial Example:
                      ______
      ----     a     | ---- |     b     ----
     |    |--------->||    ||--------->|    |
---->| s0 |          || s1 ||          | s2 |
     |    |          ||    ||<---------|    |
      ----           | ---- |     a     ----
      ^  |            ------             |
     a|  |b             |                |b
      |  V              |a               V
      ----              |               ----
     |    |--+          +------------->|    |--+
     | s4 |  | b                       | s3 |  | a,b
     |    |<-+                         |    |<-+
      ----                              ----
Q = {s0, s1, s2, s3, s4}
A = {a,b}
q0 = s0
F = {s1}
δ: Q x A -> Q
   Q   |   a    |    b
-------|--------|----------
  s0   |   s1   |    s4
-------|--------|----------
  s1   |   s3   |    s2
-------|--------|----------
  s2   |   s1   |    s3
-------|--------|----------
  s3   |   s3   |    s3
-------|--------|----------
  s4   |   s0   |    s4
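The 5-tuple for this example can be written down directly as data. The following is a minimal Python sketch, not part of the original notes; the names delta and is_well_formed are illustrative choices. It checks that δ is defined on every pair in Q x A and that q0 and F lie inside Q.

```python
# Example dfa from the notes, encoded as plain Python data.
Q = {"s0", "s1", "s2", "s3", "s4"}
A = {"a", "b"}
q0 = "s0"
F = {"s1"}

# delta maps (state, symbol) -> state, one entry per table cell.
delta = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}


def is_well_formed(Q, A, delta, q0, F):
    """Check that (Q, A, delta, q0, F) fits the 5-tuple definition."""
    # delta must be defined on every pair in Q x A, and on nothing else.
    total = set(delta) == {(q, a) for q in Q for a in A}
    # every transition must land in Q.
    closed = all(target in Q for target in delta.values())
    return total and closed and q0 in Q and F <= Q


print(is_well_formed(Q, A, delta, q0, F))
```

Storing δ as a dictionary keyed by (state, symbol) mirrors the table row by row, which makes a missing or mistyped cell easy to spot.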
Extending δ to "operate" on strings.
Given dfa M with transition function δ, define δ' as:
δ' : Q x A* -> Q
δ'(q,λ) = q - on null input stays in original state
δ'(q,a) = δ(q,a) if a in A
δ'(q,w) = δ'(δ(q,a),x) if w = ax for a in A, x in A*
This is called the extended transition function.
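The recursive definition translates almost line for line into code. This is a sketch under the assumption that strings are Python str values and that the empty string "" plays the role of λ; DELTA and delta_hat are illustrative names, and DELTA repeats the transition table of the example above.

```python
# Transition table of the example dfa (state, symbol) -> state.
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}


def delta_hat(q, w):
    """Extended transition function, Q x A* -> Q."""
    if w == "":
        # delta'(q, lambda) = q
        return q
    # w = ax with a in A and x in A*
    a, x = w[0], w[1:]
    # delta'(q, w) = delta'(delta(q, a), x)
    return delta_hat(DELTA[(q, a)], x)


print(delta_hat("s0", "ab"))  # s2
```

The single-symbol case δ'(q,a) = δ(q,a) needs no branch of its own here: it falls out of the last case with x = λ.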
The language accepted by M, or the language of M is denoted L(M) and is defined to be:
The set of all strings from A* such that δ' applied to (the start state,the string) is an element of the set of final states.
That is, given dfa M = (Q,A,δ,q0,F) ,
L(M) = { s | s in A* and δ'(q0,s) in F }
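Membership in L(M) can be tested by running δ one symbol at a time and checking the final state. The sketch below does this iteratively for the example dfa and lists the short members of L(M); accepts and language_up_to are hypothetical helper names, not notation from the notes.

```python
from itertools import product

# Transition table, start state and final states of the example dfa.
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}
START, FINAL = "s0", {"s1"}


def accepts(s):
    """True if delta'(q0, s) is in F, i.e. s is in L(M)."""
    q = START
    for symbol in s:
        q = DELTA[(q, symbol)]
    return q in FINAL


def language_up_to(n, alphabet="ab"):
    """List the members of L(M) of length at most n, shortest first."""
    return [
        "".join(t)
        for k in range(n + 1)
        for t in product(alphabet, repeat=k)
        if accepts("".join(t))
    ]


print(language_up_to(3))  # ['a', 'aba', 'baa']
```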
Test the following strings on the above example to see whether each is in L(M).
baab
baabb
baba
babaaba
Note that a finite automaton has "no memory" of how it got into a certain state. Once a symbol has been read, and the machine advances to a new state, it can't know what put it in that state.
This brings up the concept of "instantaneous machine description" (or machine configuration):
[q,w] which denotes the current state q and the remaining input w
Process the previous example, showing each i.d.
baab
baabb
baba
babaaba
Aside: there is typically defined a "function", '|-', from Q x A+ to Q x A*, given by:
[qi,aw] |- [δ(qi,a),w].
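A chain of '|-' steps can be produced mechanically. The following sketch records each i.d. while the example dfa consumes its input; configurations and show are illustrative names, and the string aba is chosen only as a demonstration.

```python
# Transition table of the example dfa (state, symbol) -> state.
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}


def configurations(q, w):
    """Return every i.d. [q, w] visited while consuming w from state q."""
    trace = [(q, w)]
    while w:
        # one step: [q, aw] |- [delta(q, a), w]
        q, w = DELTA[(q, w[0])], w[1:]
        trace.append((q, w))
    return trace


def show(trace):
    """Format a trace in the [q,w] |- [q',w'] notation, with λ for empty input."""
    return " |- ".join(f"[{q},{w or 'λ'}]" for q, w in trace)


print(show(configurations("s0", "aba")))
# [s0,aba] |- [s1,ba] |- [s2,a] |- [s1,λ]
```

Each i.d. carries everything the machine needs to continue, which is another way of seeing that it keeps no memory of earlier states.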
The significance of this "notational" convenience is that it bears resemblance to derivations used in grammars, and to the "models" relationship used in logic.
More Definitions:
State diagram:
Labeled digraph, G, such that:
there is an arc (p,q) labeled a if δ(p,a) = q for some a in A
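The arcs of the state diagram can be read off δ by grouping transitions that share the same pair (p,q). This sketch does so for the example dfa above; state_diagram is a hypothetical helper name.

```python
from collections import defaultdict

# Transition table of the example dfa (state, symbol) -> state.
DELTA = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}


def state_diagram(delta):
    """Group transitions into arcs: (p, q) -> sorted list of labels."""
    arcs = defaultdict(list)
    for (p, a), q in delta.items():
        arcs[(p, q)].append(a)
    return {arc: sorted(labels) for arc, labels in arcs.items()}


# Print one line per arc, e.g. "s3 --a,b--> s3" for the self-loop on s3.
for (p, q), labels in sorted(state_diagram(DELTA).items()):
    print(f"{p} --{','.join(labels)}--> {q}")
```

Ten transitions collapse to nine arcs here because both symbols take s3 back to itself.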
It can be shown that for any regular expression one can construct a dfa to recognize the same language. It is also true that the language of any dfa can be represented by a regular expression.
Thus regular expressions and dfa are equivalent ways of specifying languages.
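The equivalence can be spot-checked on a small case. The sketch below uses a hypothetical two-state dfa that is not from these notes (states e0 and e1, accepting strings that end in b) and compares it with the regular expression (a+b)*b, written [ab]*b in Python's re syntax, on every string up to a given length. Agreement on finitely many strings is evidence, not a proof.

```python
import re
from itertools import product

# Hypothetical dfa (an assumption for illustration): strings over {a,b} ending in b.
DELTA = {
    ("e0", "a"): "e0", ("e0", "b"): "e1",
    ("e1", "a"): "e0", ("e1", "b"): "e1",
}
START, FINAL = "e0", {"e1"}

# Python spelling of the regular expression (a+b)*b.
PATTERN = re.compile(r"[ab]*b")


def dfa_accepts(s):
    """Run the dfa on s and report whether it ends in a final state."""
    q = START
    for c in s:
        q = DELTA[(q, c)]
    return q in FINAL


def agree_up_to(n):
    """True if dfa and regex agree on every string over {a,b} of length <= n."""
    for k in range(n + 1):
        for t in product("ab", repeat=k):
            s = "".join(t)
            if dfa_accepts(s) != (PATTERN.fullmatch(s) is not None):
                return False
    return True


print(agree_up_to(8))  # True
```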
Match the automaton with the language it recognizes.
A. (a+b)a*
B. (a+b)(a*(bb*a)*)*
C. b(a*)
D. (b+a(b*)a)(a+b)*
1)
                      ______
      ----    a,b    | ---- |     b     ----
     |    |--------->||    ||--------->|    |--+
---->| s0 |          || s1 ||          | s2 |  | b
     |    |          ||    ||<---------|    |<-+
      ----           | ---- |     a     ----
                      ------
                      |    ^
                      +----+
                        a
2)
        +----------------a----------------+
        |                                 |
        |             ______              V
      ----     b     | ---- |     a     ----
     |    |--------->||    ||<---------|    |--+
---->| s0 |          || s1 ||          | s2 |  | b
     |    |          ||    ||          |    |<-+
      ----           | ---- |           ----
                      ------
                      |    ^
                      +----+
                       a,b
3)
                      ______
      ----    a,b    | ---- |     b     ----
     |    |--------->||    ||--------->|    |--+
---->| s0 |          || s1 ||          | s2 |  | a,b
     |    |          ||    ||          |    |<-+
      ----           | ---- |           ----
                      ------
                      |    ^
                      +----+
                        a
4)
        +----------------a----------------+
        |                                 |
        |             ______              V
      ----     b     | ---- |     b     ----
     |    |--------->||    ||--------->|    |--+
---->| s0 |          || s1 ||          | s2 |  | a,b
     |    |          ||    ||          |    |<-+
      ----           | ---- |           ----
                      ------
                      |    ^
                      +----+
                        a