COG 376 Assignment One - Due March 18th
(IF YOU NEED THE EXTRA TIME, YOU CAN HAND IT IN ON MARCH 28th; FINAL DEADLINE MIDNIGHT MARCH 30th)
1) In either the book, "Natural Language Processing with Python", Chapter One, or this website:
http://www.nltk.org/book/ch01.html
do the following problems found at the end of the chapter: 4, 5, 8, 10, 15, 25, 27
ALSO
Review the code in section 3.1, then run the first example of code under the heading "Frequency Distributions" with a text other than text1. SKIP THE fdist1.most_common(50) command!
Hand in the result.
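For reference, here is a minimal sketch of the kind of frequency-distribution calls used in section 3.1, run on a short made-up token list rather than on any of the book's texts. The name tokens and its contents are assumptions for illustration only; for the assignment, pass one of the book's texts (for example text2) to FreqDist instead. The fallback import is there only so the sketch runs without NLTK installed; it relies on FreqDist being a subclass of collections.Counter.

```python
try:
    from nltk import FreqDist          # the class used in section 3.1
except ImportError:
    # Assumption: NLTK may not be installed. FreqDist subclasses Counter,
    # so Counter supports the calls used below.
    from collections import Counter as FreqDist

# Hypothetical stand-in for a book text such as text2.
tokens = "the cat sat on the mat and the dog sat too".split()

fdist = FreqDist(tokens)

distinct_words = len(fdist)            # number of different tokens
total_words = sum(fdist.values())      # total number of tokens counted
the_count = fdist["the"]               # how often one particular word occurs

print(distinct_words, total_words, the_count)
```

Looking up a word with square brackets, as in fdist["the"], is the same kind of lookup the book's example performs with fdist1['whale'].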
2) In either the book, "Natural Language Processing with Python", Chapter Two, or this website:
http://www.nltk.org/book/ch02.html
do the following problems found at the end of the chapter: 1, 2, 4, 9
Please note that the methods for solving problem one can be found in Chapter One - not exclusively in Chapter Two as the authors imply.
NOTE: BIG HINT FOR NUMBER FOUR ABOVE:
Use a function like the following for counting:
state_union.words('2000-Clinton.txt').count('men')
AND please import the state_union corpus first:
from nltk.corpus import state_union
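One way the hint might be extended to several words at once is sketched below. The helper name count_words and the sample word list are hypothetical; with NLTK and the state_union corpus available, the words returned by state_union.words('2000-Clinton.txt') could be passed in place of the sample.

```python
def count_words(words, targets):
    """Return {target: number of times it appears in words}, case-sensitively."""
    words = list(words)  # accept any iterable of tokens
    return {target: words.count(target) for target in targets}

# Hypothetical sample standing in for state_union.words('2000-Clinton.txt').
sample = ["men", "and", "women", "and", "people", ",", "the", "people"]

counts = count_words(sample, ["men", "women", "people"])
print(counts)
```

Note that the count is case-sensitive, so 'Men' and 'men' are counted separately unless the words are lowercased first.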
3) In either the book, "Natural Language Processing with Python", Chapter Three, or this website:
http://www.nltk.org/book/ch03.html
read the section on regular expressions, then:
CHOOSE THREE OF THE FOLLOWING:
☼ Describe the class (or set) of strings matched by the following regular expressions.
[a-zA-Z]+
[A-Z][a-z]*
p[aeiou]{,2}t
\d+(\.\d+)?
([^aeiou][aeiou][^aeiou])*
\w+|[^\w\s]+
(This is question number 6 in the "Exercises" section, 3.12.)
NOTE: IF YOU RUN OUT OF TIME, YOU DON'T NEED TO "TEST" YOUR ANSWERS FOR THIS QUESTION.
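If time does allow for testing, one possible way to check a description is to try a pattern against a handful of strings. The sketch below uses Python's standard re module rather than NLTK, and the helper name full_matches and the candidate strings are made up for illustration. It uses re.fullmatch, so a string counts only if the whole string matches the pattern.

```python
import re

def full_matches(pattern, candidates):
    """Return the candidates that the pattern matches from start to end."""
    return [s for s in candidates if re.fullmatch(pattern, s)]

# Hypothetical test strings for one of the patterns listed above.
candidates = ["3", "3.14", "3.", ".5", "abc"]

matched = full_matches(r"\d+(\.\d+)?", candidates)
print(matched)
```

Including strings that should NOT match is as useful as including ones that should, since it shows where the edges of the pattern are.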