Dialogues

C: I want you to tell me the names of the fellows on the St. Louis team.
A: I'm telling you. Who's on first, What's on second, I Don't Know is on third.
C: You know the fellows' names?
A: Yes.
C: Well, then, who's playing first?
A: Yes.
C: I mean the fellow's name on first.
A: Who.
C: The guy on first base.
A: Who is on first.
C: Well what are you askin' me for?
A: I'm not asking you — I'm telling you. Who is on first.
Who's on First — Bud Abbott and Lou Costello's version of an old burlesque standard. [JM]

Dialogue is characterized by turn-taking: speaker A says something, then speaker B, then speaker A, and so on.

Having a turn (or "taking the floor") is a resource to be allocated; what are the processes involved in this allocation? How do speakers know when it is the proper time to contribute their turn?

It appears that conversation and language itself are structured in such a way as to deal efficiently with this resource allocation problem. One source of evidence for this is the timing of the utterances in normal human conversations. [JM]

Turn-taking behavior is generally studied in the field of Conversation Analysis (CA). Sacks et al. (1974) argued that turn-taking behavior is governed by a set of "turn-taking rules." These rules apply at a transition-relevance place (TRP), a place where the structure of the language allows a speaker shift to occur.

A version of the turn-taking rules, simplified from Sacks et al. (1974) by Jurafsky [JM]:
At each TRP of each turn:

  1. If during this turn the current speaker has selected A as the next speaker then A must speak next.
  2. If the current speaker does not select the next speaker, any other speaker may take the next turn.
  3. If no one else takes the next turn, the current speaker may take the next turn.

Rule 1 implies that there are some utterances by which the speaker specifically selects who the next speaker will be.

The rules imply that transitions between speakers don't occur just anywhere; the transition-relevance places where they tend to occur are generally at utterance boundaries.
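
The three rules can be read as a small decision procedure applied at each TRP. The following is a minimal sketch, not part of Sacks et al. or [JM]: the function name `next_speaker` and its parameters are hypothetical, and the "may" in rules 2 and 3 is simplified to a deterministic choice (the first volunteer wins; otherwise the current speaker continues).

```python
from typing import Optional, Sequence


def next_speaker(current: str,
                 selected: Optional[str],
                 volunteers: Sequence[str]) -> str:
    """Apply the simplified turn-taking rules at one TRP.

    current:    the speaker who holds the floor
    selected:   the speaker chosen by `current` during this turn, if any
    volunteers: other speakers who try to take the turn, in order of attempt
    """
    # Rule 1: if the current speaker selected someone, that person speaks next.
    if selected is not None:
        return selected
    # Rule 2: otherwise any other speaker may take the turn
    # (assumption: the first one to try gets it).
    for speaker in volunteers:
        if speaker != current:
            return speaker
    # Rule 3: if no one else takes the turn, the current speaker may continue.
    return current
```

For example, `next_speaker("A", None, [])` falls through to rule 3 and keeps the floor with A, while `next_speaker("A", "B", ["C"])` hands the turn to B regardless of who else volunteers.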

Austin (1962) [JA] distinguished three kinds of act performed in an utterance. The term speech act is generally used for the illocutionary act rather than either of the other two:

  1. locutionary act: the utterance of a sentence with a particular meaning
  2. illocutionary act: the act performed in making the utterance;
    e.g., asking a question.
  3. perlocutionary act: the production of effects on the hearer's actions or beliefs (persuading, scaring, intimidating)

Searle (1975b) suggested a modified taxonomy with all speech acts classified into one of five major classes:

  1. Assertives: committing the speaker to something as being the case (suggesting, putting forward, swearing, boasting, concluding).
  2. Directives: attempts by the speaker to get the addressee to do something (asking, ordering, requesting, inviting, advising, begging).
  3. Commissives: committing the speaker to some future course of action (promising, planning, vowing, betting, opposing).
  4. Expressives: expressing the psychological state of the speaker about a state of affairs (thanking, apologizing, welcoming, deploring).
  5. Declarations: bringing about a different state of the world via the utterance (including many examples of Austin's performative type: I resign, You're fired.)
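
One way to make the taxonomy concrete is as an enumeration plus a lookup table. This is only an illustrative sketch: the names `SpeechActClass`, `EXAMPLES`, and `classify_act` are hypothetical, and the table holds just the example acts listed above, so it is nowhere near a real speech-act classifier.

```python
from enum import Enum
from typing import Optional


class SpeechActClass(Enum):
    ASSERTIVE = "assertive"
    DIRECTIVE = "directive"
    COMMISSIVE = "commissive"
    EXPRESSIVE = "expressive"
    DECLARATION = "declaration"


# Example acts copied from the list above; not an exhaustive lexicon.
EXAMPLES = {
    SpeechActClass.ASSERTIVE: {"suggesting", "putting forward", "swearing",
                               "boasting", "concluding"},
    SpeechActClass.DIRECTIVE: {"asking", "ordering", "requesting",
                               "inviting", "advising", "begging"},
    SpeechActClass.COMMISSIVE: {"promising", "planning", "vowing",
                                "betting", "opposing"},
    SpeechActClass.EXPRESSIVE: {"thanking", "apologizing", "welcoming",
                                "deploring"},
    SpeechActClass.DECLARATION: {"i resign", "you're fired"},
}


def classify_act(act: str) -> Optional[SpeechActClass]:
    """Return the class of a listed example act, or None if it is not listed."""
    key = act.strip().lower()
    for act_class, acts in EXAMPLES.items():
        if key in acts:
            return act_class
    return None
```

Here `classify_act("promising")` yields `SpeechActClass.COMMISSIVE`, and anything outside the table yields `None`.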

Component architecture of a conversational agent [JM]

    +--------------------+     +--------------------------+
--> | Speech recognition | --> | Nat. Lang. understanding | ---.
    +--------------------+     +--------------------------+    |
                                                               V
                                                      +------------------+       +--------------+
                                                      | Dialogue manager | <---> | Task manager |
                                                      +------------------+       +--------------+
    +----------------+     +------------+                      |
<-- | Text-to-speech | <-- | Nat. Lang. | <--------------------'
    |   synthesis    |     | generation |
    +----------------+     +------------+
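
The data flow in this architecture can be sketched as a chain of pluggable stages. Everything here is an assumption made for illustration: the class `ConversationalAgent`, its field names, and the toy stand-ins (audio is treated as UTF-8 text, and the task manager is a hard-coded lookup) are hypothetical and do not come from [JM].

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class ConversationalAgent:
    recognize: Callable[[bytes], str]    # speech recognition: audio -> text
    understand: Callable[[str], dict]    # NL understanding: text -> meaning frame
    manage: Callable[[dict], dict]       # dialogue manager (may consult the task manager)
    generate: Callable[[dict], str]      # NL generation: reply frame -> text
    synthesize: Callable[[str], bytes]   # text-to-speech synthesis: text -> audio

    def respond(self, audio_in: bytes) -> bytes:
        """Run one user turn through the stages in the order of the diagram."""
        text = self.recognize(audio_in)
        meaning = self.understand(text)
        reply = self.manage(meaning)
        return self.synthesize(self.generate(reply))


def task_manager(request: dict) -> dict:
    # Hypothetical back end: a hard-coded lookup standing in for a real task.
    if request.get("ask") == "first base":
        return {"answer": "Who is on first."}
    return {"answer": "I don't know."}


def dialogue_manager(meaning: dict) -> dict:
    # Toy policy: forward every request to the task manager unchanged.
    return task_manager(meaning)


agent = ConversationalAgent(
    recognize=lambda audio: audio.decode("utf-8"),  # stand-in: "audio" is UTF-8 text
    understand=lambda text: {"ask": "first base"} if "first" in text.lower() else {},
    manage=dialogue_manager,
    generate=lambda reply: reply["answer"],
    synthesize=lambda text: text.encode("utf-8"),
)
```

Keeping each stage as a plain callable mirrors the boxes in the diagram: any one of them can be swapped out without touching the others, and only the dialogue manager talks to the task manager.
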
Architecture of a generator portion of a dialogue system [Walker and Rambow 2002].


what to say                   how to say it
------------      -------------------------------------
  Content         Sentence       Surface       Prosody        Speech
  Planner   -->   Planner   -->  Realizer -->  Assigner  -->  Synthesizer