Parsing

There are two broad categories of parsing, top-down and bottom-up, plus various schemes combining elements of each.

Top Down Parsing

The top-down approach attempts to construct the parse tree (i.e., do the parse or find a derivation) using the start symbol of the grammar as a beginning point.

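When a nonterminal has more than one production, the parser must guess which one to expand and undo that guess if it fails to match the input. This implies backtracking.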

Problems

Bottom Up Parsing

The parse tree is constructed from the leaves (the input tokens) ``up'' to the root (the start symbol).

The archetypal model is shift-reduce parsing.

Algorithm:

repeat
  shift a token onto the "stack"
  while top of stack contains a recognizable "structure" (the RHS of a rule)
     reduce these symbols to the rule's LHS
  endwhile
until the input is empty
accept if the stack contains only the start symbol

Example Grammar 1


   S -> NP VP
   NP -> D N
   VP -> V
   D -> a
   D -> the
   D -> this
   N -> dog
   N -> cat
   V -> barks
   V -> meows

Sample parse of ``a dog barks'' (stack on the left, remaining input on the right; _ marks the empty stack):

   _  - a dog barks
   a  - dog barks
   D  - dog barks
   D dog  - barks
   D N  - barks
   NP  - barks
   NP barks  -
   NP V  -
   NP VP  -
   S  -
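The trace above can be reproduced with a short sketch. It assumes a greedy policy (reduce whenever some rule's right-hand side sits on top of the stack, otherwise shift), which is enough for Example Grammar 1 but is not a general parsing method. The names RULES, reduce_once and parse are hypothetical.

```python
# Example Grammar 1 as (LHS, RHS) pairs.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["D", "N"]),
    ("VP", ["V"]),
    ("D", ["a"]), ("D", ["the"]), ("D", ["this"]),
    ("N", ["dog"]), ("N", ["cat"]),
    ("V", ["barks"]), ("V", ["meows"]),
]


def reduce_once(stack):
    """Apply the first rule whose RHS matches the top of the stack, else None."""
    for lhs, rhs in RULES:
        if stack[-len(rhs):] == rhs:
            return stack[:-len(rhs)] + [lhs]
    return None


def parse(sentence):
    """Greedy shift-reduce: return (accepted, list of (stack, input) pairs)."""
    stack, rest = [], sentence.split()
    trace = [(stack, rest)]
    while True:
        reduced = reduce_once(stack)
        if reduced is not None:
            stack = reduced                              # reduce
        elif rest:
            stack, rest = stack + [rest[0]], rest[1:]    # shift
        else:
            break                                        # nothing left to do
        trace.append((stack, rest))
    return stack == ["S"], trace


accepted, trace = parse("a dog barks")
for stack, rest in trace:
    print(" ".join(stack) or "_", " - ", " ".join(rest))
```

For ``a dog barks'' this visits the same ten configurations as the sample parse, ending with S on the stack and no input left.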

Problems arise when the decision whether to shift or reduce cannot be made, leading to either a shift/reduce or a reduce/reduce conflict. Consider the grammar extended with a prepositional phrase:

   S -> NP VP
   NP -> D N
   VP -> V
   VP -> V PP
   D -> a
   D -> the
   D -> this
   N -> dog
   N -> cat
   N -> alley
   V -> barks
   V -> meows
   PP -> P NP
   P -> in

Sample parse of ``a dog barks in the alley'':

   _  - a dog barks in the alley
   a  - dog barks in the alley
   D  - dog barks in the alley
   D dog  - barks in the alley
   D N  - barks in the alley
   NP  - barks in the alley
   NP barks  - in the alley
   NP V  - in the alley
                         <-- shift/reduce (s/r) conflict here

reducing:

   NP VP  - in the alley

   S  -  in the alley     <-- some input left

shifting:

   NP V in - the alley
   NP V P - the alley
   NP V P the - alley
   NP V P D - alley
   NP V P D alley -
   NP V P D N -
   NP V P NP -
   NP V PP -
   NP VP -
   S -
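The two outcomes above can be checked mechanically. The sketch below assumes two hypothetical strategies: greedy, which always reduces when it can (the ``reducing'' branch), and search, which tries every reduction and the shift at each step and backs up when a choice leads nowhere. Neither is how production parsers resolve conflicts; it only illustrates the choice point.

```python
# The grammar with VP -> V PP, as (LHS, RHS) pairs.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["D", "N"]),
    ("VP", ["V"]),
    ("VP", ["V", "PP"]),
    ("PP", ["P", "NP"]),
    ("D", ["a"]), ("D", ["the"]), ("D", ["this"]),
    ("N", ["dog"]), ("N", ["cat"]), ("N", ["alley"]),
    ("V", ["barks"]), ("V", ["meows"]),
    ("P", ["in"]),
]


def reductions(stack):
    """All stacks reachable from `stack` by a single reduction."""
    return [stack[:-len(rhs)] + [lhs]
            for lhs, rhs in RULES if stack[-len(rhs):] == rhs]


def greedy(tokens):
    """Always prefer reducing over shifting."""
    stack, rest = [], list(tokens)
    while True:
        options = reductions(stack)
        if options:
            stack = options[0]
        elif rest:
            stack, rest = stack + [rest[0]], rest[1:]
        else:
            return stack == ["S"]


def search(stack, rest):
    """Try every reduction and the shift; back up on dead ends."""
    if stack == ["S"] and not rest:
        return True
    if any(search(s, rest) for s in reductions(stack)):
        return True
    return bool(rest) and search(stack + [rest[0]], rest[1:])


tokens = "a dog barks in the alley".split()
print(greedy(tokens), search([], tokens))
```

With ``a dog barks in the alley'', greedy reduces V to VP too early and fails, while search recovers by shifting ``in'' instead.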

Note that in natural language, recognition (acceptance) is often done, correctly or incorrectly, while some input remains to be processed.

Generating reduce/reduce conflicts:

Let's add the rule:

   VP -> P D N

(perhaps trying, simplistically, to capture the sentence ``the dog in the alley'', which answers the question ``who barks?'' and is an elliptical form of ``the dog in the alley barks''). The grammar becomes:

   S -> NP VP
   NP -> D N
   VP -> V
   VP -> V PP
   VP -> P D N
   D -> a
   D -> the
   D -> this
   N -> dog
   N -> cat
   N -> alley
   V -> barks
   V -> meows
   PP -> P NP
   P -> in

Parsing:

   _  - the dog in the alley
   the  - dog in the alley
   D  - dog in the alley
   D dog  - in the alley
   D N  - in the alley
   NP  - in the alley
   NP in  - the alley
   NP P  - the alley
   NP P the  - alley
   NP P D  - alley
   NP P D alley -
   NP P D N -

At this point two different reductions apply, which is a reduce/reduce conflict. Reducing D N via NP -> D N gives:

   NP P NP -
   NP PP -

but the parse is now stuck: no rule has NP PP as its right-hand side, and no input remains.

Or P D N could be reduced via VP -> P D N, giving:

   NP VP -
   S -

which is accepted.
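A small sketch makes the reduce/reduce conflict explicit by listing every rule whose right-hand side matches the top of the stack NP P D N. The helper names applicable_reductions and reduce_with are hypothetical.

```python
# The grammar with the added rule VP -> P D N, as (LHS, RHS) pairs.
RULES = [
    ("S", ["NP", "VP"]),
    ("NP", ["D", "N"]),
    ("VP", ["V"]),
    ("VP", ["V", "PP"]),
    ("VP", ["P", "D", "N"]),
    ("PP", ["P", "NP"]),
    ("D", ["a"]), ("D", ["the"]), ("D", ["this"]),
    ("N", ["dog"]), ("N", ["cat"]), ("N", ["alley"]),
    ("V", ["barks"]), ("V", ["meows"]),
    ("P", ["in"]),
]


def applicable_reductions(stack):
    """Every rule whose RHS matches the top of the stack."""
    return [(lhs, rhs) for lhs, rhs in RULES if stack[-len(rhs):] == rhs]


def reduce_with(stack, rule):
    """Replace the matched RHS on top of the stack with the rule's LHS."""
    lhs, rhs = rule
    return stack[:-len(rhs)] + [lhs]


stack = ["NP", "P", "D", "N"]
choices = applicable_reductions(stack)
for rule in choices:
    lhs, rhs = rule
    print(lhs, "->", " ".join(rhs), "gives", " ".join(reduce_with(stack, rule)))
```

Two rules match, NP -> D N and VP -> P D N. More than one match at the same stack configuration is exactly a reduce/reduce conflict, and only the second choice leads on to S.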