Computational Linguistics — Overview
What is the course (viz., computational linguistics) about?
Short answer: Computer recognition of (natural) languages.
Extended explanation:
• "Computer recognition" could be replaced by "algorithmic recognition", since computers are not required in order to analyze recognition methodologies.
• Recognition normally implies translation.
• Linguists are typically interested in natural languages, but subsets must be used to make the problem tractable.[1]
• Other forms of communication are also of interest to the computational linguist (speech recognition[2], pattern recognition[3], sign language[4], input/output devices[5] for managing disabilities, etc.).
Footnotes 1 to 5 mark areas of intense interest to linguistics researchers:
[1] Wikipedia entry related to tractability.
[2] Speech recognition has become doable in an "ad hoc" way, and with increasing parallelism it may become "solvable".
[3] The general problem of pattern recognition is even more difficult than that of speech recognition.
Goals
Topic overview
Wikipedia entry on Computational Linguistics (also an assigned reading).
Why would Inf. Sci./Comp. Sci. students want to study this subject?
Think of some applications you've encountered in your studies.[7]
Machine translation: work began in the late 1950s.
This led to the realization that much "world knowledge" is required to perform a translation. Lack of visible progress led to funding cutbacks.
This term has a different meaning to Computer/Information scientists than to linguists.
Specifically, information drawn from natural language texts. Because the domains are quite complex, there has been little success, but the work has enhanced our understanding of knowledge representation.
Essentially, natural language input to retrieval systems.
Give examples of man-machine interfaces.[8]
Because of the realization that translation is not just a manipulation of symbols, the need to develop systems with more "complete understanding" has extended the field:
To demonstrate how elusive a knowledge representation is, consider some sets of favorite sentences of linguists:
More on machine translation.
Original methodology:
SOURCE TEXT ---> TARGET LANGUAGE TRANSLATION
But, consider:
Navajo has one verb for "picking up round thin objects"
and another for "picking up long flexible objects";
Hopi has no noun for time and no verb tenses for past,
present, and future.
Or the famous Jimmy Carter faux pas:
Meant to say: "... the American people have great desires
(hopes) for the Polish people ..."
Instead, it was translated as: "... the American people lust
after the Polish people ..."
New methodology:
SOURCE TEXT ---> MEANING OF SOURCE TEXT ---> TARGET LANGUAGE TRANSLATION
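The two-stage methodology can be illustrated with a deliberately tiny sketch in C (the language of the loop examples later in these notes). It assumes a hypothetical meaning representation: an action label plus one feature of the object, its shape. That feature is exactly the kind of "world knowledge" a direct SOURCE ---> TARGET mapping of English "pick up" cannot supply when the target language, like Navajo in the example above, uses a different verb per kind of object. The noun lexicon, the function names, and the verb labels are invented placeholders, not real Navajo words and not a real translation system.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical meaning representation: a language-neutral action label
 * plus one feature of the object (its shape). */
typedef enum { SHAPE_ROUND_THIN, SHAPE_LONG_FLEXIBLE, SHAPE_UNKNOWN } Shape;

typedef struct {
    const char *action;  /* e.g. "PICK_UP" */
    Shape object_shape;  /* world knowledge the target language needs */
} Meaning;

/* Analysis helper: a toy, hypothetical lexicon of world knowledge about nouns. */
static Shape shape_of(const char *noun) {
    if (strcmp(noun, "coin") == 0 || strcmp(noun, "plate") == 0)
        return SHAPE_ROUND_THIN;
    if (strcmp(noun, "rope") == 0 || strcmp(noun, "snake") == 0)
        return SHAPE_LONG_FLEXIBLE;
    return SHAPE_UNKNOWN;
}

/* SOURCE TEXT ---> MEANING: analyze the English phrase "pick up <noun>". */
Meaning analyze_pick_up(const char *noun) {
    Meaning m = { "PICK_UP", shape_of(noun) };
    return m;
}

/* MEANING ---> TARGET: choose a target verb from the meaning.
 * The returned labels are placeholders, not actual Navajo verbs. */
const char *generate_verb(Meaning m) {
    if (strcmp(m.action, "PICK_UP") != 0)
        return NULL;
    switch (m.object_shape) {
    case SHAPE_ROUND_THIN:
        return "VERB_PICK_UP_ROUND_THIN";
    case SHAPE_LONG_FLEXIBLE:
        return "VERB_PICK_UP_LONG_FLEXIBLE";
    default:
        return NULL; /* meaning too vague to pick a verb */
    }
}
```

For example, generate_verb(analyze_pick_up("coin")) yields the round-thin label and "rope" yields the long-flexible label, while an unknown noun yields NULL: without enough meaning, generation has nothing to choose from.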
Terms/Concepts
This presupposes:
Some linguists claim there is an underlying "form", a grammar, which defines language.
Others say there is an underlying "substance", which we might try to organize and describe with one or many grammars.
One would like to design computer programs that master the system (human language) they are trying to emulate. Mostly we settle for performance, viz., systems that appear to interact in a human-like fashion but are really just using tricks.
This somewhat contradicts the notion that language has a built-in form.
The debate over whether language is fundamentally discrete or continuous concerns form. It seems hard to ascribe purely digital behavior to semantics.
E.g., "This turkey takes 5 hours to cook in a 450-degree oven. So put it in an oven at 900 degrees for 2 1/2 hours." This "almost" makes sense to people. In general, jokes and analogies appear to carry subtle shadings of meaning. One could argue that this might also apply to syntax. When a speaker says "Errr...", listeners might interpret this as signaling "thought", "retrieval of information", "lack of knowledge", "embarrassment", etc.
[But note that strings of words can often be rearranged while retaining their meaning; strings of sounds usually cannot.]
Language and the Brain
Some ask questions like, "Why is there a science of linguistics and not of checkerology?"
Synopses of possible answers:
Some answer this by stating that the brain is structured to develop language but not to play checkers. A large group holds a differing view: that the brain has various functional components, some of which can be adapted to language.
Phonology - sound
Computational Linguistics
|
+-- Language Analysis
|   |
|   +-- Sentence Analysis
|   |   |
|   |   +-- Syntax Analysis
|   |   +-- Semantic Analysis
|   |
|   +-- Discourse Structure and Analysis
|
+-- Language Generation
[Note that the "Language Generation" subtree can be a mirror image
of the "Language Analysis" subtree.]
There are less general tools for describing the syntax of simple (non-natural) languages. These include automata and regular expressions.
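As a small illustration (the example language is chosen here, not taken from the course), C-style identifiers form a regular language: the regular expression [A-Za-z_][A-Za-z0-9_]* describes them, and the deterministic finite automaton sketched below recognizes exactly the same strings. The state names and functions are hypothetical, and the sketch assumes the default "C" locale, where isalpha and isalnum match only ASCII letters and digits.

```c
#include <ctype.h>
#include <stdbool.h>

/* States of a DFA for the regular expression [A-Za-z_][A-Za-z0-9_]* */
typedef enum { START, IN_IDENT, REJECT } State;

/* Transition function: one input character moves the automaton to a new state. */
static State step(State s, char c) {
    unsigned char u = (unsigned char)c; /* ctype functions need a non-negative value */
    switch (s) {
    case START:
        return (isalpha(u) || c == '_') ? IN_IDENT : REJECT;
    case IN_IDENT:
        return (isalnum(u) || c == '_') ? IN_IDENT : REJECT;
    default:
        return REJECT; /* REJECT is a trap state: no way out */
    }
}

/* Run the DFA over the whole string; accept iff it ends in IN_IDENT. */
bool is_identifier(const char *s) {
    State state = START;
    for (; *s != '\0'; s++) {
        state = step(state, *s);
    }
    return state == IN_IDENT; /* the only accepting state */
}
```

The automaton needs no memory beyond its current state, which is why such tools suffice for simple languages but not for the nested structure of natural language.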
Semantics - meaning of a sentence
Meaning is imparted to a sentence both by its constituent elements and by their positions within the sentence:
She is not the only one with an axe to grind.
Only she is not the one with an axe to grind.
The meanings of the individual words may not add up to the total meaning of a sentence:
The green-eyed monster drove her insane.
And individual meanings may be altered by other aspects of the language (idiom, hyperbole):
I could care less.
I could eat a horse.
Some words cannot meaningfully appear together, even in a
syntactically correct sentence:
Green rocks sleep furiously.
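The gap between syntax and meaning can be sketched in C with a tiny hypothetical lexicon. A sentence counts as grammatical here if its words follow the part-of-speech pattern adjective, noun, verb, adverb; it counts as meaningful only if it also satisfies one selectional restriction, namely that "sleep" requires an animate subject. The lexicon, the fixed four-word pattern, and the single restriction are all simplifying assumptions; real semantic checking is far harder.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

typedef enum { ADJ, NOUN, VERB, ADV } Pos;

typedef struct {
    const char *word;
    Pos pos;
    bool animate;       /* nouns: does it denote a living thing? */
    bool needs_animate; /* verbs: must the subject be animate? */
} Entry;

/* Hypothetical toy lexicon, just large enough for the examples. */
static const Entry LEXICON[] = {
    {"green", ADJ, false, false},     {"sleepy", ADJ, false, false},
    {"rocks", NOUN, false, false},    {"cats", NOUN, true, false},
    {"sleep", VERB, false, true},     {"furiously", ADV, false, false},
    {"quietly", ADV, false, false},
};

static const Entry *lookup(const char *w) {
    for (size_t i = 0; i < sizeof LEXICON / sizeof LEXICON[0]; i++)
        if (strcmp(LEXICON[i].word, w) == 0)
            return &LEXICON[i];
    return NULL;
}

/* Syntax only: do the four words match ADJ NOUN VERB ADV? */
bool is_grammatical(const char *w[4]) {
    const Pos pattern[4] = {ADJ, NOUN, VERB, ADV};
    for (int i = 0; i < 4; i++) {
        const Entry *e = lookup(w[i]);
        if (e == NULL || e->pos != pattern[i])
            return false;
    }
    return true;
}

/* Syntax plus one semantic check: an animate-only verb needs an animate subject. */
bool is_meaningful(const char *w[4]) {
    if (!is_grammatical(w))
        return false;
    const Entry *subject = lookup(w[1]);
    const Entry *verb = lookup(w[2]);
    return !verb->needs_animate || subject->animate;
}
```

With this lexicon, "green rocks sleep furiously" passes is_grammatical but fails is_meaningful, while "sleepy cats sleep quietly" passes both.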
[Student Opportunity - display more examples.]
Included in semantics are the ideas of "sense" (meaning) and "reference" (the actual entity referred to):
The hostess was a blathering fool by night's end.
(We can tell that "hostess" is used with a correct sense without knowing to whom it refers, or even whether the statement is true.)
[Student Opportunity - display more examples.]
Related concepts
E.g., "Can you close the door?"
E.g., "The King of France is bald."
Presupposition is related to "algorithmic encoding content", a measure of how much information is contained in a message. What appears to be an efficient encoding may rely on copious amounts of implicit knowledge.
E.g., in 'C' (and Java), a for loop looks like:
for(int i = 0; i < 10; i++) { ... }
What are the semantics?
What happens if a loop is written as:
for(int i = 0; ; i++) { ... }
What about this one?
for(int i=0; i <= 10; i = i++) { ... }
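One way to make the semantics of these loops explicit is to rewrite a for loop as the while loop it abbreviates: init; while (cond) { body; step; } (ignoring continue, which in a for loop still runs the step). The C sketch below counts iterations for the first loop, for an empty condition, and for an ordinary i <= limit loop; the function names and the cap guard are hypothetical additions so that the loop without a condition terminates. The third loop itself is deliberately left out: i = i++ modifies i twice without an intervening sequence point, which is undefined behavior in C. In Java it is well defined but leaves i at 0 (the old value returned by i++ is assigned back), so i <= 10 never becomes false and the loop runs forever.

```c
/* for (int i = 0; i < limit; i++) { ... } written as its while-loop equivalent. */
int iterations_lt(int limit) {
    int count = 0;
    int i = 0;          /* init */
    while (i < limit) { /* cond */
        count++;        /* body */
        i++;            /* step */
    }
    return count;
}

/* With <= the body runs one extra time (off-by-one), assuming an ordinary i++ step. */
int iterations_le(int limit) {
    int count = 0;
    for (int i = 0; i <= limit; i++) {
        count++;
    }
    return count;
}

/* An omitted condition is treated as always true; only the hypothetical
 * cap guard (not part of the original loop) makes this one stop. */
int iterations_no_cond(int cap) {
    int count = 0;
    for (int i = 0; ; i++) {
        if (i == cap)
            break; /* without this, the loop never ends on its own */
        count++;
    }
    return count;
}
```

Here iterations_lt(10) is 10 and iterations_le(10) is 11, which is the compact syntax's implicit knowledge made visible.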
E.g.,
"John Travolta is a genius."
vs.
"In the movie Phenomenon, John Travolta is a genius."
Or
"That guy is good-looking."
vs.
"That guy is good-looking, for an ugly guy." (A polite way to say this is, "That guy has off-beat good looks.")
Be able to define and illustrate these concepts with examples not used in class.