Computational Linguistics — Finite Automata

DFA [Wikipedia]

A deterministic finite automaton is a 5-tuple,

M = (Q,A,δ,q0,F)

where
Q = finite set of states
A = input alphabet (a finite set of symbols)
δ = transition function, δ: Q x A -> Q
q0 = initial or start state, q0 in Q
F = set of final (accepting) states, F a subset of Q

Matt Foley Pictorial Example:

                             ____ 
            ----   a       | ---- |    b      ---- 
           |    |--------->||    ||--------->|    |
      ---->| s0 |          || s1 ||          | s2 |
            ----           | ---- |<--------- ----
            ^ |              ----       a       |
           a| |b              |                b|
            | V               |                 |
            ----              |a                V
           |    |_____        |               ---- _______
           | s4 |     | b      ------------->|    |       | a,b
            ---- <----                       | s3 |<------
                                              ----

Q = {s0, s1, s2, s3, s4}
A = {a,b}
q0 = s0
F = {s1}
δ: Q x A -> Q
   Q   |   a    |     b
-------|--------|----------
   s0  |   s1   |     s4
-------|--------|----------
   s1  |   s3   |     s2
-------|--------|----------
   s2  |   s1   |     s3
-------|--------|----------
   s3  |   s3   |     s3
-------|--------|----------
   s4  |   s0   |     s4
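
As a rough sketch, the 5-tuple above can be written down directly as Python data. The variable names and the helper is_total are illustrative choices, not part of the original notes; the table entries are copied from the transition table.

```python
# The five components of the example DFA, copied from the definition above.
Q = {"s0", "s1", "s2", "s3", "s4"}
A = {"a", "b"}
q0 = "s0"
F = {"s1"}

# delta maps (state, symbol) -> next state, one entry per table cell.
delta = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}


def is_total(delta, Q, A):
    """Check that delta is a function Q x A -> Q (hypothetical helper).

    Every (state, symbol) pair must have exactly one entry, and every
    target must itself be a state in Q.
    """
    pairs = {(q, a) for q in Q for a in A}
    return set(delta) == pairs and set(delta.values()) <= Q


print(is_total(delta, Q, A))  # True
```

Storing δ as a dictionary keyed by (state, symbol) mirrors the table one cell at a time, which makes it easy to check against the diagram.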


Extending δ to "operate" on strings.

Given dfa M with transition function δ, define δ' as:

δ' : Q x A* -> Q
δ'(q,λ) = q - on null input stays in original state
δ'(q,a) = δ(q,a) if a in A
δ'(q,w) = δ'(δ(q,a),x) if w = ax for a in A, x in A*

This is called the extended transition function.
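
The recursive definition translates almost line for line into code. The sketch below assumes the example machine's table stored as a dictionary; the name delta_hat stands in for δ' and is only an illustrative choice.

```python
# Transition table of the example DFA, keyed by (state, symbol).
delta = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}


def delta_hat(delta, q, w):
    """Extended transition function: the state reached from q after reading w."""
    if w == "":
        # Null input: stay in the original state.
        return q
    # Split w = ax, take one step on a, then recurse on the rest x.
    a, x = w[0], w[1:]
    return delta_hat(delta, delta[(q, a)], x)


print(delta_hat(delta, "s0", ""))     # s0
print(delta_hat(delta, "s0", "ab"))   # s2
print(delta_hat(delta, "s0", "aba"))  # s1
```

The single-symbol case needs no separate branch here: with x empty, the recursion bottoms out at the null-input case and returns δ(q,a).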


The language accepted by M, also called the language of M, is denoted L(M). It is defined to be:

The set of all strings s in A* for which δ'(q0,s), the state reached from the start state after reading s, is an element of the set of final states.

That is, given dfa M = (Q,A,δ,q0,F) ,

L(M) = { s | s in A* and δ'(q0,s) in F }
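
A membership test for L(M) could be sketched as follows, again assuming the example machine. The function name accepts is hypothetical, and the loop is an iterative equivalent of applying δ' to (q0,s). The sample strings are deliberately different from the exercise strings.

```python
# Example DFA: transition table, start state, and final states.
delta = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}
q0 = "s0"
F = {"s1"}


def accepts(s):
    """Return True when s is in L(M), i.e. the state reached from q0 is final."""
    q = q0
    for symbol in s:
        # One application of delta per input symbol.
        q = delta[(q, symbol)]
    return q in F


print(accepts("a"))     # True:  s0 -> s1
print(accepts("ab"))    # False: s0 -> s1 -> s2
print(accepts("bbaa"))  # True:  s0 -> s4 -> s4 -> s0 -> s1
```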

Test the following strings on the above example to see whether each is in L(M).

    baab
    baabb
    baba
    babaaba


Note that a finite automaton has "no memory" of how it got into a certain state. Once a symbol has been read, and the machine advances to a new state, it can't know what put it in that state.

This brings up the concept of "instantaneous machine description" (or machine configuration):


    [q,w]    which denotes the state and remaining input

Process each string below on the previous example, showing each instantaneous description (i.d.) along the way.

    baab
    baabb
    baba
    babaaba


Aside: there is typically defined a relation, '|-' (read "yields"), on machine configurations. It takes one configuration to the next:


   Q x A+ → Q x A*
by

[qi,aw] |- [δ(qi,a),w]   for a in A, w in A*.
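
To see the step-by-step configurations concretely, here is a small sketch that lists every i.d. a computation passes through. The helper names trace and show are illustrative, and the input "aba" is chosen so as not to give away the exercise strings above.

```python
# Transition table of the example DFA, keyed by (state, symbol).
delta = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}


def trace(delta, q, w):
    """Return the list of configurations (state, remaining input) from [q,w]."""
    configs = [(q, w)]
    while w:
        # One |- step: consume the first symbol and move to the next state.
        q, w = delta[(q, w[0])], w[1:]
        configs.append((q, w))
    return configs


def show(configs):
    """Format configurations as [q,w] |- [q',w'] ..., writing λ for empty input."""
    return " |- ".join(f"[{q},{w or 'λ'}]" for q, w in configs)


print(show(trace(delta, "s0", "aba")))
# [s0,aba] |- [s1,ba] |- [s2,a] |- [s1,λ]
```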

The significance of this notational convenience is that it resembles the derivation steps used in grammars and the "proves" (turnstile) relation used in logic.


More Definitions:

State diagram:

Labeled digraph, G, such that:


It can be shown that for any regular expression one can construct a dfa to recognize the same language. It is also true that the language of any dfa can be represented by a regular expression.

Thus regular expressions and dfa are equivalent ways of specifying languages: both describe exactly the regular languages.
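
As an illustration of this equivalence, the sketch below compares the first example dfa (the five-state machine with F = {s1}) against a regular expression on every string over {a,b} up to length 8. The expression (bb*a)*a(ba)* is a derivation made for this example, not something stated in the notes, and a bounded check like this is evidence rather than a proof. Python's re syntax writes bb* as b+.

```python
import re
from itertools import product

# Transition table, start state, and final states of the five-state example.
delta = {
    ("s0", "a"): "s1", ("s0", "b"): "s4",
    ("s1", "a"): "s3", ("s1", "b"): "s2",
    ("s2", "a"): "s1", ("s2", "b"): "s3",
    ("s3", "a"): "s3", ("s3", "b"): "s3",
    ("s4", "a"): "s0", ("s4", "b"): "s4",
}
q0, F = "s0", {"s1"}

# Candidate expression (bb*a)*a(ba)*: loop s0 -> s4 -> s0 any number of
# times, take a into s1, then loop s1 -> s2 -> s1 any number of times.
pattern = re.compile(r"(b+a)*a(ba)*")


def accepts(s):
    """Run the DFA on s and report whether it ends in a final state."""
    q = q0
    for symbol in s:
        q = delta[(q, symbol)]
    return q in F


# Collect every string of length 0..8 on which the two disagree.
mismatches = [
    w
    for n in range(9)
    for w in map("".join, product("ab", repeat=n))
    if accepts(w) != bool(pattern.fullmatch(w))
]
print(mismatches)  # []
```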

Match each automaton below with the regular expression (A-D) for the language it recognizes. Here + denotes union.

       A. (a+b)a*
       B. (a+b)(a*(bb*a)*)*
       C. b(a*)
       D. (b+a(b*)a)(a+b)*


                                   ____ 
                  ----   a,b     | ---- |    b      ---- _______
 _____   1)      |    |--------->||    ||--------->|    |       | b
                 | s0 |          || s1 ||          | s2 |<------
                  ----           | ---- |<--------- ----
                                   ----       a
                                   |   ^
                                   |_a_|

                    -----------------a---------------
                   |                                 |
                   |               ____              V
                  ----   b       | ---- |    a      ---- _______
 _____   2)      |    |--------->||    ||<---------|    |       |b
                 | s0 |          || s1 ||          | s2 |<------
                  ----           | ---- |           ----
                                   ----
                                   |   ^
                                   |___|
                                    a,b

                                   ____ 
                  ----    a,b    | ---- |    b      ---- _______
 _____   3)      |    |--------->||    ||--------->|    |       | a,b
                 | s0 |          || s1 ||          | s2 |<------
                  ----           | ---- |           ----
                                   ----
                                   |   ^
                                   |_a_|

                    -----------------a---------------
                   |                                 |
                   |               ____              V
                  ----    b      | ---- |    b      ---- _______
 _____   4)      |    |--------->||    ||--------->|    |       | a,b
                 | s0 |          || s1 ||          | s2 |<------
                  ----           | ---- |           ----
                                   ---- 
                                   |   ^
                                   |_a_|