COG 376 Assignment One - Due March 18th (IF YOU NEED THE EXTRA TIME, YOU CAN HAND IT IN MARCH 28th; FINAL DEADLINE MIDNIGHT MARCH 30th)

1) Using either Chapter One of the book "Natural Language Processing with Python" or this website:


http://www.nltk.org/book/ch01.html


do the following problems from the end of the chapter: 4, 5, 8, 10, 15, 25, 27

ALSO

Review the code in section 3.1, then run the first code example under the heading
"Frequency Distributions" using a text other than text1. SKIP THE fdist1.most_common(50) COMMAND!

Hand in the result.
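FreqDist(text1) builds a table of how often each token occurs, and the summary line the book prints reports "samples" (distinct tokens) and "outcomes" (total tokens). The minimal sketch below runs without any NLTK data. It uses Python's collections.Counter as a stand-in, because nltk.FreqDist is a subclass of Counter in NLTK 3. The toy token list is hypothetical and only stands in for a book text such as text2.

```python
from collections import Counter

# Hypothetical toy tokens standing in for a book text such as text2.
tokens = "the cat sat on the mat the end".split()

# Counter plays the role of nltk.FreqDist here (FreqDist subclasses Counter).
fdist = Counter(tokens)

# samples = distinct tokens, outcomes = total tokens, the two numbers in
# the book's "<FreqDist with ... samples and ... outcomes>" summary.
print(len(fdist), "samples and", sum(fdist.values()), "outcomes")  # 6 samples and 8 outcomes

# Look up one word's frequency, as the book does with fdist1['whale'].
print(fdist["the"])  # 3
print(fdist["dog"])  # 0: unseen words count as zero
```

With NLTK and its book data installed, the real version replaces Counter(tokens) with FreqDist applied to your chosen text. FreqDist also provides N() for the total number of outcomes and B() for the number of samples.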


2) Using either Chapter Two of the book "Natural Language Processing with Python" or this website:


http://www.nltk.org/book/ch02.html

do the following problems found at the end of the chapter: 1, 2, 4, 9


Please note that the methods needed for problem 1 are found in Chapter One, not exclusively in Chapter Two as the authors imply.

NOTE: BIG HINT FOR NUMBER FOUR ABOVE:


Use a method call like the following for counting:

state_union.words('2000-Clinton.txt').count('men')

AND please import the state_union corpus first: from nltk.corpus import state_union
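The hint's .count() call generalizes to a table of counts per file. Below is a minimal sketch, assuming the words of each file are supplied as a mapping from file id to a sequence of tokens. The helper name count_targets and the toy file ids are hypothetical and used only for illustration. The optional lower-casing is a design choice: .count() is case-sensitive, so without it a sentence-initial "Men" is not counted as "men".

```python
def count_targets(words_by_file, targets, lowercase=False):
    """Count each target word in each file (hypothetical helper).

    words_by_file: mapping of file id -> sequence of word tokens.
    targets: words to count, e.g. ["men", "women", "people"].
    lowercase: if True, fold tokens to lower case before counting, so a
    sentence-initial "Men" is counted together with "men".
    """
    table = {}
    for fileid, words in words_by_file.items():
        if lowercase:
            words = [w.lower() for w in words]
        # Same .count() call as in the state_union hint.
        table[fileid] = {t: words.count(t) for t in targets}
    return table


# Hypothetical toy data; real state_union file ids look like '2000-Clinton.txt'.
toy = {
    "speech-A.txt": ["Men", "and", "women", "and", "men"],
    "speech-B.txt": ["people", "of", "the", "people"],
}
for fileid, counts in count_targets(toy, ["men", "women", "people"]).items():
    print(fileid, counts)
```

To use the real corpus (after nltk.download('state_union')), pass {f: state_union.words(f) for f in state_union.fileids()} as words_by_file. The file names begin with the year, so the results can be read in chronological order.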


3) Using either Chapter Three of the book "Natural Language Processing with Python" or this website:

http://www.nltk.org/book/ch03.html

read the section on regular expressions, then:


CHOOSE THREE OF THE FOLLOWING:

Describe the class (or set) of strings matched by the following regular expressions.

  1. [a-zA-Z]+

  2. [A-Z][a-z]*

  3. p[aeiou]{,2}t

  4. \d+(\.\d+)?

  5. ([^aeiou][aeiou][^aeiou])*

  6. \w+|[^\w\s]+

(This is question 6 in the Exercises section, 3.12.)


NOTE: IF YOU RUN OUT OF TIME, YOU DON'T NEED TO "TEST" YOUR ANSWERS FOR THIS QUESTION.
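If you do test your answers, Python's standard re module is enough. The sketch below uses two hypothetical helpers introduced here (they are not NLTK functions). probe reports whether each whole candidate string matches a pattern, and find_all lists every match inside a longer text. The demo uses colou?r, which is not one of the exercise patterns. Substitute the patterns you chose, written as raw strings (r"...") so that backslashes such as \d and \w reach the regex engine intact.

```python
import re


def probe(pattern, candidates):
    """Map each candidate to True if the WHOLE string matches pattern."""
    compiled = re.compile(pattern)
    return {s: compiled.fullmatch(s) is not None for s in candidates}


def find_all(pattern, text):
    """Return every complete (group 0) match of pattern in text, left to right.

    Uses re.finditer rather than re.findall, because findall returns only
    the captured groups when the pattern contains parentheses.
    """
    return [m.group(0) for m in re.finditer(pattern, text)]


for s, ok in probe(r"colou?r", ["color", "colour", "colouur", "Color"]).items():
    print(f"{s!r:10} {'match' if ok else 'no match'}")

print(find_all(r"colou?r", "color or colour?"))  # ['color', 'colour']
print(re.findall(r"ab(c)?", "abc ab"))           # ['c', ''] (groups only)
print(find_all(r"ab(c)?", "abc ab"))             # ['abc', 'ab']
```

The findall pitfall matters here because two of the listed patterns contain parenthesized groups. Checking with fullmatch suits the question's wording ("the class of strings matched"), while find_all shows how a pattern behaves when applied to running text.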