Cognitive Studies/Psychology/Computer Science 201:
COGNITIVE SCIENCE IN CONTEXT

Natural Language Processing Module
(Part of Speech Tagging and Sentence Parsing)
Laboratory Manual

Yonghong Mao

10-22-97





INTRODUCTION

Natural Language Processing (NLP) is a cognitive science subdiscipline drawing on linguistics, computer science, and psycholinguistics. Recent strides in this area have already begun to influence how human beings interact with computers. The following scenario is only a few commercial products away: You come home and are warmly greeted by your computer assistant. You ask, "What's the news today?" and your computer assistant gives you a verbal report based on a range of interests you have specified. If you don't want more news, you just say "OK, that's enough - how did the stock market do today?" As you lie on the couch, you tell your computer assistant to play music or put on a video, ask to be woken tomorrow morning at 6:15, and have it remind you of something you might otherwise forget to do. What's more, you might do all this in Chinese or English, or both.

None of this is a wild idea. "We'll see speech command capabilities. We'll see speech dictation capabilities" (Bill Gates, 1994). Existing products which realize this kind of capability include Dragon Systems' NaturallySpeaking, IBM's ViaVoice, Digital Speech Recognition Software V1.0 for Digital UNIX, and Philips' FreeSpeech 98. Speech technology is also becoming more closely integrated into computer operating systems like Windows and Mac OS. As you will learn in this module, however, natural language understanding is an extremely complicated problem - and is only one of the subareas of the NLP field. In this module, you will try out some approaches to solving basic problems that most NLP applications must face. You will see the advantages and disadvantages of particular approaches. You will find out what factors affect the performance of the system and have an opportunity to make your own suggestions about ways to improve it. There are two lab topics in this module, which will be explained in more detail shortly:

(I) part of speech tagging
(II) sentence parsing and PP attachment.

Your tasks will involve:

1) playing with the software accessible from this web page in order to gain an understanding of the underlying algorithms;

2) explaining why the algorithms work in some cases and fail in other cases;

3) comparing these algorithms with the way you think human beings process natural language and providing support for the conclusions you draw.

The exercises below will provide you with specific ways to undertake these three main tasks.

I. PART OF SPEECH TAGGING

Part of speech tagging is the task of assigning an appropriate part of speech or word category (e.g. Noun, Verb, Adjective,...) to each word in a sentence. This is usually taken to be the first step in automatically processing language at the sentence level. Its output is crucial for tasks such as sentence parsing (assigning a structure and/or meaning to natural language input) and word sense disambiguation (distinguishing words with the same spelling and/or pronunciation). The two major approaches to this task are rule-based approaches (Brill 1994) and statistical approaches (Jelinek 1985; DeMarcken 1990; Merialdo 1991; Cutting et al. 1992). Brill's tagger is one of the best rule-based tagging systems. It is a transformation-based error-driven model which learns a set of rules automatically based on a given corpus and then tags words following these rules. The statistical tagger you will use is based on a hidden Markov model, which computes probabilities of co-occurrence of words based on a given tagged corpus and then tags texts using these probabilities.

Purpose

Through observing the performances of these taggers and finishing the following tasks, you should develop a deeper understanding of the two different perspectives on NLP: the symbolic rule-based perspective and the statistical perspective.

Reading Assignment 1

James Allen. 1995. Natural Language Understanding. The Benjamin/Cummings Publishing Company, 2nd ed. Chapter 1, pp. 1-20.

Eric Brill. 1995. Transformation-Based Error-Driven Learning and Natural Language Processing: A Case Study in Part-of-Speech Tagging. Computational Linguistics 21(4), December 1995.

Note: To view the Brill paper on-line, you need to have the (free) Adobe Acrobat Reader installed. This can be downloaded from Adobe.

Lab Assignment 1

Explore Brill's Tagger and the Statistical Tagger by following the Procedure in Steps 1 through 12 below. This will give you a thorough opportunity to compare the rule-based and statistical perspectives using the software accessible from this web page. Then, write a short (under 4 page) paper which addresses the following questions: 1) What are the similarities between the rule-based approach and the statistical approach? 2) What are the differences between them? 3) What aspects of our language proficiency can be captured by rules? 4) What aspects of our language proficiency can be accounted for by statistics? 5) Which approach is more promising? Support your claims with examples of sentences which are tagged correctly or incorrectly by the two taggers. (A rough guess is that you will need to include at least 5 examples in your writeup, but you will have to make a judgment call here, depending on the points you choose to make.) ALSO: when you claim that a particular tagging is correct and another is incorrect, be sure to justify your claim.

Procedure

Part I. Brill's Rule-Based Tagger

1) The fundamental component of a tagger is a POS (part-of-speech) tagset, or list of all the word categories that will be used in the tagging process. The tagsets used to annotate large corpora in the past have usually been fairly extensive. The pioneering Brown Corpus distinguishes 87 simple tags. Subsequent projects have tended to elaborate the Brown Corpus tagset. For instance, the Lancaster-Oslo/Bergen (LOB) Corpus uses about 135 tags, the Lancaster UCREL group about 165 tags, and the London-Lund Corpus of Spoken English 197 tags. The rationale behind developing such large, richly articulated tagsets is to approach "the ideal of providing distinct codings for all classes of words having distinct grammatical behaviour". Choosing an appropriate set of tags for an annotation system directly influences the accuracy and the usefulness of the tagging system. The larger the tagset, the lower the accuracy of the tagging system, since the system has to be capable of making finer distinctions. Conversely, the smaller the tagset, the higher the accuracy. However, a very small tagset tends to make the tagging system less useful since it provides less information. So, there is a trade-off here. Another issue in tagset design is the consistency of the tagging system: words with the same meaning and the same function should receive the same tag.

Exercise

To get some idea of tagset design, you can look at the tags (Penn Treebank Part of Speech Tags) used in the annotation system of the Penn Treebank Project. This is a small tagset (about 40 tags) compared to the tagsets mentioned above.

A similar tagset is used by Brill's tagger, which you will have a chance to investigate shortly. The appropriateness of the Penn Treebank tagset is justified in Building a Large Annotated Corpus of English: The Penn Treebank (Marcus et al., Computational Linguistics 19(2), 1993, pp. 313-330).

2) Brill's rule-based tagger begins processing a sentence by looking up each word in a lexicon (lexicon is the linguistic term for dictionary). The lexicon contains a list of the possible parts of speech that the word can be assigned. For example, the word rat can be assigned the tag NN (noun, singular or mass) and also the tag VB (verb, base form), since it can be used as a singular noun (The rat scurried across the floor) or as an infinitive verb (He wouldn't rat on us, would he?). When Brill's tagger is assigning its initial tags to words in a sentence, it chooses the part of speech that is most common for each word under standard usage. Thus, rat would be assigned the tag NN.

If a word is not in the lexicon at all, Brill's tagger initially assigns it the tag NN (singular common noun), unless it begins with a capital letter, in which case it is tagged NNP (singular proper noun).
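
To make this initial-tagging step concrete, here is a minimal sketch in Python. The toy lexicon and its most-frequent-tag entries are invented for illustration; the real tagger's lexicon and implementation details differ.

    # Minimal sketch of Brill-style initial tagging (toy lexicon, illustrative).
    # Each known word is mapped to its most frequent tag in the training corpus.
    LEXICON = {"the": "DT", "dog": "NN", "rat": "NN", "runs": "VBZ"}

    def initial_tag(words):
        tagged = []
        for w in words:
            if w.lower() in LEXICON:
                tag = LEXICON[w.lower()]          # most frequent tag
            elif w[0].isupper():
                tag = "NNP"                       # unknown and capitalized
            else:
                tag = "NN"                        # unknown
            tagged.append((w, tag))
        return tagged

    print(initial_tag("The dog runs".split()))
    # [('The', 'DT'), ('dog', 'NN'), ('runs', 'VBZ')]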

3) Clearly, the initial tagging performed by Brill's tagger will contain many errors. To correct these errors, the tagger applies tag-changing rules to the results of the initial stage. There are two kinds of rules, Contextual Rules and Lexical Rules. The Contextual Rules revise the tag of a particular word based on the surrounding words or on the tags of the surrounding words. They take the following forms:

Change tag a to tag b when:

1. The preceding (following) word is tagged z.
2. The word two before (after) is tagged z.
3. One of the two preceding (following) words is tagged z.
4. One of the three preceding (following) words is tagged z.
5. The preceding word is tagged z and the following word is tagged w.
6. The preceding (following) word is x.
7. The word two before (after) is x.
8. One of the two preceding (following) words is x.
9. The current word is x and the preceding (following) word is y.
10. The current word is x and the preceding (following) word is tagged z.

where a, b, z and w are variables over the set of parts of speech, and x and y are variables over all words in the training corpus.

A training corpus is a collection of language data which functions as a sample of the language and is used to build language models. A training corpus may contain manually added information to facilitate the training process of the language model. In our case, the training corpus is an annotated (tagged) corpus, though a training corpus can also consist of plain text without any added annotations. Put simply, a training corpus provides the examples from which the computer system learns the relevant knowledge (e.g. rules) it will use later in automatic natural language processing.

Let's look at two examples of Contextual Rules. The first is

NN VB PREVTAG TO

This means: change tag NN to tag VB when the preceding word is tagged TO. For instance,

to/TO run/NN
would be changed to
to/TO run/VB

The second example is

VBP VB PREV1OR2OR3TAG MD

This means: change tag VBP (verb, non-3rd person singular present) to VB (verb, base form) when one of the three preceding words is tagged MD (modal verb).

The following are some of the abbreviations used in the representation of contextual rules:

PREVTAG z        - the preceding word is tagged z
PREV1OR2TAG z    - one of the two preceding words is tagged z
PREV1OR2OR3TAG z - one of the three preceding words is tagged z
NEXTTAG z        - the following word is tagged z
NEXT1OR2TAG z    - one of the two following words is tagged z
NEXT1OR2OR3TAG z - one of the three following words is tagged z
SURROUNDTAG z w  - the preceding word is tagged z and the following word is tagged w
PREVWD x         - the preceding word is x
NEXTWD x         - the following word is x
CURWD x          - the current word is x
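
As an illustration of how such rules are applied, here is a minimal Python sketch covering the two example rules above. The rule encoding is simplified and invented for illustration; Brill's actual implementation differs.

    # Minimal sketch of applying contextual rules in order (illustrative).
    # Each rule is (from_tag, to_tag, condition, argument).
    RULES = [("NN", "VB", "PREVTAG", "TO"),
             ("VBP", "VB", "PREV1OR2OR3TAG", "MD")]

    def apply_contextual(tagged):
        # tagged: list of (word, tag) pairs, already given initial tags
        for frm, to, cond, arg in RULES:
            for i, (word, tag) in enumerate(tagged):
                if tag != frm:
                    continue
                if cond == "PREVTAG":
                    if i > 0 and tagged[i - 1][1] == arg:
                        tagged[i] = (word, to)
                elif cond == "PREV1OR2OR3TAG":
                    if arg in [t for _, t in tagged[max(0, i - 3):i]]:
                        tagged[i] = (word, to)
        return tagged

    print(apply_contextual([("to", "TO"), ("run", "NN")]))
    # [('to', 'TO'), ('run', 'VB')]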

Exercise:

Examine some of the rules and try to understand them by applying them to examples. Do you think that all contextual information is captured by the above transformation templates? If not, explain what information is missing. Propose some ways to help capture all useful contextual information.

4) The other rules used by Brill's tagger are Lexical Rules. Lexical Rules are used to analyze words which the tagger does not have in its lexicon, to see if it can make a reasonable guess as to their classification. The Lexical Rules take advantage of the internal structure or morphology of many words. For example, the word grapes can be analyzed as grape + s. The use of Lexical Rules greatly reduces the size of the lexicon which Brill's tagger needs to operate effectively.

The following are examples of the formats of Lexical Rules:

Change the tag of an unknown word (from X) to Y if:

1. Deleting the prefix (suffix) x, |x| <= 4, results in a word (x is any string of length 1 to 4).
2. The first (last) (1,2,3,4) characters of the word are x.
3. Adding the character string x as a prefix (suffix) results in a word (|x| <= 4).
4. Word w ever appears immediately to the left (right) of the word.
5. Character z appears in the word.

Let's look at some examples.

Example 1.

NN s fhassuf 1 NNS
means
change the tag of an unknown word from NN to NNS if it has suffix -s
For instance, webpages/NN to webpages/NNS

Example 2.

NN . fchar CD
means
change the tag of an unknown word from NN to CD if it has character '.'
For instance, 3.5/NN to 3.5/CD

Example 3.

NN - fchar JJ
means
change the tag of an unknown word from NN to JJ if it has character '-'
For instance, man-made, rule-based, three-year-old, etc.

Example 4.

NN ed fhassuf 2 VBN

means

change the tag of an unknown word from NN to VBN if it has suffix -ed
For instance, wolfed/NN to wolfed/VBN

Example 5.

NN ing fhassuf 3 VBG
means
change the tag of an unknown word from NN to VBG if it has suffix -ing
For instance, tagging/NN to tagging/VBG

Example 6.

ly hassuf 2 RB

means

change the tag of an unknown word from ?? (any) to RB if it has suffix -ly
For instance, morphologically/?? to morphologically/RB

Example 7.

ly addsuf 2 JJ

means

change the tag of an unknown word from ?? to JJ if adding suffix -ly results in a word
For instance, sil/NN to sil/JJ (adding -ly to sil yields the word silly)

Example 8.

NN $ fgoodright CD

means

change the tag of an unknown word from NN to CD if the word $ can appear to the left

Example 9.

un deletepref 2 JJ

means

change the tag of an unknown word from ?? to JJ if deleting the prefix un- results in a word
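
Here is a minimal Python sketch of how a few of the lexical rules above might be checked against an unknown word. The word list and the rule ordering here are invented for illustration; in the real tagger the learned rules are applied in a specific learned order.

    # Minimal sketch of lexical rules for unknown words (illustrative).
    KNOWN_WORDS = {"kind", "grape"}    # stand-in for the tagger's word list

    def guess_tag(word, tag="NN"):
        # NN s fhassuf 1 NNS: NN -> NNS if the word has suffix -s
        if tag == "NN" and word.endswith("s"):
            tag = "NNS"
        # NN ing fhassuf 3 VBG: NN -> VBG if the word has suffix -ing
        if tag == "NN" and word.endswith("ing"):
            tag = "VBG"
        # ly hassuf 2 RB: any tag -> RB if the word has suffix -ly
        if word.endswith("ly"):
            tag = "RB"
        # un deletepref 2 JJ: any tag -> JJ if deleting un- leaves a word
        if word.startswith("un") and word[2:] in KNOWN_WORDS:
            tag = "JJ"
        return tag

    print(guess_tag("webpages"))   # NNS
    print(guess_tag("tagging"))    # VBG
    print(guess_tag("unkind"))     # JJ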

Exercise: Using these examples as a guide, examine several more of the lexical rules used by Brill's tagger and try to understand them. Find some specific examples on which they have an appropriate effect. Do you think that all lexical information is captured by the above transformation templates? If not, can you find some other ways to help capture all useful lexical information?

5) Try to run Brill's tagger on some English sentences and evaluate the result. The procedure is as follows: Click Brill's Tagger. You will see a window. Type a sentence (or several sentences) into the window and press the "let it parse" button. The tagger will attempt to generate an analysis of the parts of speech in the sentence(s) you specified. You need to insert a space before and after each punctuation mark and end each sentence with a carriage return.

6) Although Brill's tagger is state-of-the-art technology in the domain of tagging, its performance is nowhere near perfect. The following exercise highlights one of the many kinds of difficulties it has:

Exercise:

Based on your intuition, try to identify some systematic errors and explain why they are systematic. Then try to figure out some rules or an algorithm to eliminate them. For example, if you give the tagger the sentence,

The dog bit the boy yesterday .

the tagger's output will be:

The/DT dog/NN bit/NN the/DT boy/NN yesterday/NN ./.

Why is bit tagged as a noun instead of a verb? Note that the tagger initially assigns the tag NN to all words. Then all rules are applied and the tag of bit doesn't change. Maybe we can use the following rule to correct this:

change tag NN to tag VBD when the two preceding words are tagged DT NN and the two following words are tagged DT NN

But is this rule sufficiently general, and is it consistent with the other rules? Can you think of a sentence for which this proposal would worsen the result? Often, a rule will correct some errors but introduce new ones. Can you think of an alternative rule that would do the job better here?

7)
Exercise

Create some unusual yet grammatically acceptable sentences to defeat the tagger and explain what causes the problem. Observe the generative character of the human language faculty. For example, feeding the tagger the two untagged sentences below yields output in which the verbs raced and floated are mistagged. Sentences of this kind are called garden-path sentences.

The horse raced past the barn fell .
The raft floated down the river sank .

The errors here are the tags of the verbs raced and floated: they should be VBN (verb, past participle) instead of VBD (verb, past tense), since the sentences mean "The horse that was raced past the barn fell" and "The raft that was floated down the river sank". However, if we type the unreduced sentence (for example, The horse that was raced past the barn fell .), the tagging is correct. So, what's the problem? Again, this is just an example; you should not feel limited to this direction.

Part 2. The Statistical Tagger

Now that you've had a chance to see some of the strengths and weaknesses of Brill's Rule-Based Tagger, you can compare it in your mind to the Statistical Tagger as you proceed through the remaining steps of Assignment 1.

8) Statistical taggers employ large quantities of statistical data, mysteriously combining hosts of numbers to produce an often-reasonable output without seeming to know anything about the rules of language. Let's try to eliminate some of the mystery surrounding them now.

We first consider the problem in its full generality. Let w1, ..., wT be a sequence of words. We want to find the sequence of lexical categories C1, ..., CT that maximizes

PROB(C1,...,CT | w1,...,wT) (1)

which means the probability of a sequence of tags C1,...,CT given a sequence of words w1,...,wT. That is to say, given a sequence of words, we try to find the most likely tag sequence it can have.

Unfortunately, it would take far too much data to generate reasonable estimates for such probabilities, so direct methods cannot be applied. There are, however, reasonable approximation techniques that produce good results. To develop them, we must restate the problem using Bayes' rule, which says that this conditional probability equals

PROB(C1, ..., CT) * PROB(w1, ..., wT | C1, ..., CT) / PROB(w1, ..., wT) (2)

By using some simplifying assumptions and approximations (you are referred to Allen's book for detailed explanations), we can reduce the problem to finding the sequence C1, ..., CT that maximizes the value of

PRODUCT over i = 1, ..., T of ( PROB(Ci | Ci-1) * PROB(wi | Ci) ) (3)

The advantage of this new formula is that the probabilities involved can be readily estimated from a corpus of text labeled with parts of speech. In particular, given a database of text, the bigram probabilities can be estimated simply by counting the number of times each pair of categories occurs and comparing this to the individual category counts.

Bigram means a focus on two words (or two tags) at a time; that is, the relevant locality is two items wide.

The probability that a V follows an N would be estimated as follows:

PROB(Ci=V | Ci-1=N) is approximately Count(N at position i-1 and V at position i) / Count(N at position i-1) (4)

These probabilities are called bigram probabilities. A bigram probability indicates the likelihood of a category given the knowledge that a particular other category preceded it.
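
Here is a minimal Python sketch of estimating bigram probabilities by counting, as in formula (4). The tiny tagged corpus is invented for illustration; a real tagger would count over a large corpus and smooth the estimates.

    # Minimal sketch of estimating bigram tag probabilities (illustrative).
    from collections import Counter

    corpus = [[("the", "ART"), ("dog", "N"), ("ran", "V")],
              [("a", "ART"), ("cat", "N"), ("slept", "V")]]

    tag_count = Counter()
    pair_count = Counter()
    for sentence in corpus:
        tags = [t for _, t in sentence]
        tag_count.update(tags)
        pair_count.update(zip(tags, tags[1:]))    # adjacent tag pairs

    # Formula (4): PROB(Ci=V | Ci-1=N) ~ Count(N, V) / Count(N)
    print(pair_count[("N", "V")] / tag_count["N"])   # 1.0 in this toy corpus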

Exercise

Look at the Bigram Probabilities used by the statistical tagger and choose some examples and explain what these probabilities indicate.

The term PROB(wi | Ci) in formula (3) is called the lexical-generation probability. PROB(wi | Ci) can be estimated simply by counting the number of occurrences of each word by category. Note that the lexical-generation probability is the probability that a given category is realized by a specific word, not the probability that a given word falls in a specific category.
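
Finding the tag sequence that maximizes formula (3) does not require trying every combination of tags; it can be done efficiently with the Viterbi algorithm. The following is a minimal Python sketch with invented toy probability tables (a real tagger would estimate these from a corpus and smooth them):

    # Minimal Viterbi sketch for maximizing formula (3) (toy numbers).
    TAGS = ["N", "V"]
    bigram = {("<s>", "N"): 0.8, ("<s>", "V"): 0.2,   # <s> = sentence start
              ("N", "N"): 0.3, ("N", "V"): 0.7,
              ("V", "N"): 0.6, ("V", "V"): 0.4}
    lexgen = {("time", "N"): 0.05, ("time", "V"): 0.01,
              ("flies", "N"): 0.02, ("flies", "V"): 0.08}

    def viterbi(words):
        # best[t] = (probability of the best path ending in tag t, that path)
        best = {t: (bigram[("<s>", t)] * lexgen.get((words[0], t), 1e-6), [t])
                for t in TAGS}
        for w in words[1:]:
            new = {}
            for t in TAGS:
                p, path = max(((best[s][0] * bigram[(s, t)], best[s][1])
                               for s in TAGS), key=lambda x: x[0])
                new[t] = (p * lexgen.get((w, t), 1e-6), path + [t])
            best = new
        return max(best.values(), key=lambda x: x[0])

    print(viterbi(["time", "flies"]))   # roughly (0.00224, ['N', 'V'])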

Exercise

Look at the Lexical-generation Probabilities used by the statistical tagger and choose some examples and explain what these probabilities indicate.

It is important to note that the accuracy of the Statistical Tagger implemented here is lower than that of Brill's tagger, because the corpus used to train this tagger is small and so permits only a very rough estimate of the relevant statistics. Moreover, the statistical tagger doesn't have a sophisticated mechanism for dealing with unknown words. Therefore, when experimenting with the statistical tagger presented here, it's important to use common words. It is generally agreed that if the two techniques are compared on an equal footing, they have essentially the same accuracy.

9) Run the statistical tagger on some sentences and evaluate the result. The procedure is as follows:

Click Statistical Tagger. You will see two windows and three buttons between them. Click the "Pre-Process" button first to initialize the tagger. In this step the tagger will read in the probabilities it will use. Then type a sentence into the window and press the "Parse Sentence" button. The tagger will attempt to generate an analysis of the parts of speech in the sentence you specified. The "Clear Input" button is used to refresh the input window. Please insert a space before and after each punctuation mark.

10)
Exercise

Based on your intuition, try to identify some systematic errors and explain why they are systematic. Then try to think how to eliminate them.

For example, the tagger will give you

The/DT dog/NN bit/NN the/DT boy/NN yesterday/NN ./.

if you give the tagger the sentence
The dog bit the boy yesterday .

Why is bit still tagged as a noun instead of a verb? This is just one example and should not be taken as a fixed direction for your thinking.

11) Create some unusual yet grammatically acceptable sentences to defeat the tagger and explain what causes the problem. Observe the generative character of the human language faculty. For example, feeding the tagger (either Brill's Tagger or the Statistical Tagger) the same two garden-path sentences as before,

The horse raced past the barn fell .
The raft floated down the river sank .

we again get the wrong tags for the verbs raced and floated: they should be VBN (verb, past participle) instead of VBD (verb, past tense).

However, if we type a sentence in which the participle is unambiguous (for example, one containing seen, which can only be a past participle, never a past tense), the tag of seen is correct. So, what's the problem? Also, down here is tagged as IN (preposition) while Brill's tagger tags it as RP (particle). Which one do you think is correct? And why do both taggers behave the same on the above sentences?

12) A word can belong to more than one part of speech category - the polycategoricity problem - and have more than one meaning - the polysemy problem.

Exercise

How do humans determine the meaning of a word in actual context? Make an explicit hypothesis about how humans disambiguate words in real language use and compare your hypothesis to the way the two kinds of taggers operate. Do you think people do something more like the statistical tagger or more like the rule-based tagger, or do they do something altogether different? Explain your answer.

Background Knowledge

The following factors have influence on the performance of these systems.
1) The size of the tag set used to tag words, that is, the number of word categories the tagger uses. The larger the tag set, the lower the accuracy of the results, since the tagger has many more subtle distinctions to make and is therefore more likely to mistag.
2) The corpus used to learn rules and to compute statistics. Performance depends on the corpus used to train the tagger. If the corpus is a good sample of the language, tagger performance is enhanced; otherwise, it is degraded. What is usually done is to train the tagger on a domain-specific corpus and then use the tagger within that domain.
3) The lexicon used. The quality and the size of the lexicon also affect the performance of the tagger.
4) The amount of context information and world knowledge used by the tagger to tag words also influences the performance of the tagger.

Supplementary Reading

Eric Brill. 1997. Unsupervised Learning of Disambiguation Rules for Part of Speech Tagging. To appear in Natural Language Processing Using Very Large Corpora. Kluwer Academic Press.

Note: To view the Brill paper on-line, you need to have the (free) Adobe Acrobat Reader installed. This can be downloaded from Adobe.

II. SENTENCE PARSING AND PREPOSITIONAL PHRASE ATTACHMENT

Another task of NLP -- sentence parsing -- is to figure out the syntactic structure of a sentence. One of the thorny subparts of this task is prepositional phrase attachment: how to identify the modifiee of a prepositional phrase when there is more than one grammatically acceptable candidate. Consider the sentence, "I saw the man on the hill with the telescope." Who was on the hill, the man or the speaker? Who had the telescope, the man or the speaker? The words with the telescope form a prepositional phrase or PP in this sentence. The indeterminacy illustrated here about the role of the PP is called a PP attachment ambiguity. The most naive algorithm for handling PP attachment ambiguities is to choose the nearest noun phrase (NP) as the modifiee. More advanced algorithms (Brill 1994) use discourse information, human preferences, world knowledge and other helpful information to improve the performance of such systems.
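
For instance, two of the possible structures for this sentence can be shown with simplified bracketings (the internal details of the NPs are omitted here for readability):

    [S [NP I] [VP saw [NP the man on the hill] [PP with the telescope]]]
        (the PP modifies the verb phrase: the speaker used the telescope)

    [S [NP I] [VP saw [NP the man on the hill [PP with the telescope]]]]
        (the PP modifies the noun phrase: the man has the telescope)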

Purpose

Through observing the performance of a simple parsing system, you will gain some understanding about the roles discourse information, language structure, human preferences and world knowledge play in our use of language.

Reading Assignment 2

James Allen. 1995. Natural Language Understanding. The Benjamin/Cummings Publishing Company, 2nd ed. Chapter 3, pp. 41 - 69

James Allen. 1995. Natural Language Understanding. The Benjamin/Cummings Publishing Company, 2nd ed. Chapter 6, pp. 159 - 163

Lab Assignment 2
After you have completed all the exercises and answered all the questions below, you will be ready to prepare a final written report on what you have done and what you have thought about (see Step 8 below for detailed instructions on the final report). Enough examples should be included in your report to support your arguments or discussions.

Procedure

1) To examine how the syntactic structure of a sentence can be computed, you must consider two things: the grammar, which is a formal specification of the structures allowable in the language, and the parsing technique, which is the method of analyzing a sentence to determine its structure according to the grammar. The grammar rules used by the following syntactic parser are a set of simple rewrite rules. These rules say that a certain symbol may be expanded into a sequence of symbols. Note that all the grammar rules we use have a single symbol on their left-hand sides. Grammar rules of this form are called context-free grammar rules. Context-free grammars (CFGs) are a very important class of grammars because the formalism is powerful enough to describe most of the structure in natural languages, and yet is restricted enough so that efficient parsers can be built to analyze sentences.

Symbols that cannot be further decomposed in a context-free grammar are called terminal symbols. The other symbols are called nonterminal symbols. A context-free grammar has a special symbol called the start symbol.

In our grammar, the start symbol is S (for sentence). A context-free grammar produces a sentence in the following way: the start symbol, S, is replaced with a string of symbols according to some rule in the grammar; then each of the nonterminals in the resulting string is replaced by a new string of symbols according to another (possibly different) rule in the grammar; this process is repeated until only terminals remain. Two important parts of automated natural language processing are based on sentence derivations of the sort just described. The first is sentence generation, which uses derivations to construct legal sentences. The second is sentence parsing - the process of generating a structural analysis for a sentence.

Let's look at some of the grammar rules you will be using to run the parser. Rule 1 (S --> NP VP) says that an S may consist of an NP (noun phrase) followed by a VP (verb phrase). Rule 2 (VP --> V NP) says that a VP may consist of a V followed by an NP.
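
To see how such rewrite rules drive a derivation, here is a minimal Python sketch that randomly generates a sentence from a toy grammar. The grammar and words are invented for illustration and are much smaller than the parser's default rule set.

    # Minimal sketch of derivation from a context-free grammar (illustrative).
    import random

    GRAMMAR = {"S":  [["NP", "VP"]],
               "NP": [["DT", "N"]],
               "VP": [["V", "NP"], ["V"]],
               "DT": [["the"]],
               "N":  [["dog"], ["cat"]],
               "V":  [["ate"], ["runs"]]}

    def derive(symbol="S"):
        # A symbol with no rewrite rule is a terminal (an actual word).
        if symbol not in GRAMMAR:
            return [symbol]
        expansion = random.choice(GRAMMAR[symbol])   # pick one rule to apply
        return [word for part in expansion for word in derive(part)]

    print(" ".join(derive()))   # e.g. "the dog ate the cat"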

Exercise:

Look at some of the grammar rules and think about their meanings.

2) The most common way of describing the structure of sentences, or representing how a sentence is broken into its major subparts and how those subparts are broken up in turn, is as a parse tree. The following are two examples.

[Figure: two parse trees, each with its bracketing representation, for the same sentence; the trees differ in where the PP on the hill is attached.]

Look at the two parse trees above and see how sentence structures are represented. Pay attention to the difference between the two structures of the same sentence. The first one refers to the interpretation that the boy is on the hill whereas the second one refers to the interpretation that the dog is on the hill. There is also a bracketing representation of each structure. Obviously, the tree representations are clearer.

Exercise:

Choose some sentences and try to draw the corresponding trees for them, following grammar rules that are given or that you write yourself. Get familiar with these representations; you will see more of them in what follows.

3) Click on the Syntactic Parser. You will see two windows and a bunch of buttons in between. You can operate the syntactic parser as follows:

NOTE (if you are using Netscape to browse this manual): Netscape may initially fail to create operable scroll bars for the two windows of the Syntactic Parser. If this problem occurs, make sure your Netscape window is big enough to display both Syntactic Parser windows in their entirety and then press Netscape's "Reload" button. This should clear up the problem. If it does not, try pressing, in sequence, the "Read Rules" button and the "Show Rules" button that are in between the two Syntactic Parser windows.

First, you have to input a set of grammar rules. Some simple default grammar rules are given in the upper window (the input window) by the parser. In order to deal with more sentences, you will have to add more rules, so you should save your grammar rules to a file and reload them the next time you use the parser. You can edit grammar rules in the input window directly, then use the cut (or copy) and paste functions of Netscape to save them to a file. When you have finished editing the grammar rules, press the "Read Rules" button so that the parser internalizes these rules for the parsing that follows.

Next, you can type or paste a tagged sentence into the input window. Note that the sentence should be correctly tagged; otherwise you get nothing. Another reason for failure to get a correct result is an insufficient grammar: you may have to modify your grammar or add more rules. Then press the "Parse Sentence" button and you will see the result in the lower output window. Note that only bracketing representations are produced in the output window. If you want to see the tree representation of the result, press the "Show Trees" button; a window will pop up showing the tree representations. Drag the lower right corner of the window to enlarge it, and you will see three buttons at the bottom of the window. The "Next" and "Previous" buttons are used when there are too many trees to fit on one screen. The "Close" button closes the window.

In addition, there are two more buttons. The "Show Rules" button is used to view the grammar rules in use; you can press it to see the rules at any time. Remember that changes to grammar rules take effect only after you press the "Read Rules" button. The "Clear Input" button gives you a fresh input window.

Let's work through a concrete example.

(1) Click Brill's Tagger, type the sentence The dog runs into the text window, and then press the "let it parse" button. You will see the result The/DT dog/NN runs/VBZ. Copy The/DT dog/NN runs/VBZ to the clipboard.

(2) Click Syntactic Parser and press the "Read Rules" button. Upon seeing the message "finished reading rules" in the lower window, paste The/DT dog/NN runs/VBZ into the upper window and press the "Parse Sentence" button; you will see the result in bracket format. To see the tree representation, press the "Show Trees" button and a window will pop up showing the corresponding trees. Note that on some machines you may not be able to see the three buttons at the bottom of this window. In order to see and use them, enlarge the window with the mouse. They are helpful when you have more trees to view.

(3) Now repeat step (1) with the sentence The dog ate the cat and copy the result The/DT dog/NN ate/VBD the/DT cat/NN to the clipboard.

(4) Repeat step (2) with the result of step (3) and you will see the following message in the lower output window: "No result produced due to the insufficiency of the grammar or some tagging errors in the input sentence".

(5) Press the "Show Rules" button and examine the rules used by the parser. We find that the problem is that we don't have a rule to deal with verb phrases of the pattern VBD np. So we type a new grammar rule vp -> VBD np into the upper window, press the "Read Rules" button, and paste The/DT dog/NN ate/VBD the/DT cat/NN into the upper window again. Now press the "Parse Sentence" button and we will see the result.

Exercise:

Run the parser on some sentences and see how it works.

4) Run the Syntactic Parser on some sentences with PP attachment ambiguities and analyze the results. Try to figure out how to resolve the ambiguity. For instance, how many interpretations does the sentence "John saw Mary on the hill with a telescope" have? How many acceptable syntactic analyses does it have? In what context is each interpretation and each syntactic analysis preferred? How about sentences like "John saw Mary with a red scarf"? Try to think of more cases similar to and beyond the above examples.

5) Find some sentences that would pose serious problems for designing grammar rules and explain why. For example, how do you deal with relative clauses, wh-questions, and yes/no questions? Also, how do you design the grammar to deal with sentences like "I like the man that told us an interesting story yesterday", "The raft floated down the river sank.", and "John's mother's brother is John's uncle."? Try to think of more cases similar to and beyond the above examples.

6) Exercise:

Take into account all the problems with sentence parsing that you have encountered. What do you think could be done to improve the performance of the system?

7) Does the parsing system process the sentence in the same way as we do?

Exercise: State a hypothesis about how human beings process sentences. Point out the differences between your hypothesis and the hypothesis embodied in the parser described here. Propose an experiment that would help us discover if one hypothesis can be rejected in favor of the other.

8) Final Exercises and Questions:

Choose a construction type from English that you find interesting. Several suggestions with some relevant examples are given below. If you want to investigate something different, it would be a good idea to discuss your idea with one of the instructors in order to decide on a reasonable data set.

1. Based on your intuitions about the possible meanings of the sentences, draw the tree structures that you think are correct.

2. Try using the parser to parse the sentences. It will fail if you only use the small set of grammar rules that were initially provided. Therefore you will need to add some rules. In many cases, the new rules will use additional tags so you will need to refer to the Penn Treebank Tagset to find out which tags to use or try one of the taggers on your sentences to find out what tags are used. Alternatively, you can define your own tags and type in the tagged sentences yourself.

3. When your augmented parser is working correctly (or as close to correctly as you can get it), try out some new constructions that are similar to the ones you've been working with to test it. We have focused so far on the problem of making sure that a parser can parse each grammatical sentence in some set of related constructions. Another way of testing a parser is to see if it fails to find a parse when given a sentence that is ungrammatical. Allen (1995) notes, for example, that ELIZA fails to recognize the ungrammaticality of a sentence like "Green the adzabak are the a ran four". Try testing your parser on some ungrammatical sentences closely related to the grammatical sentences you are focusing on. Does it fail to find a parse on examples that you believe are unparseable?

Final assignment:

Write up a report on the outcome of your investigations. Describe and justify your assumptions about what the correct parses are. Describe the behavior of your implemented parser. Be sure to include a number of example sentences and their parses in order to justify your claims. (On the order of 8 examples may be necessary for this assignment, but again, you will need to use your judgment here.) How closely does the parser's behavior match the behavior you think is correct? Given your experience with automatic parsing in this module, address the following questions:

How similar is the NLP system you have seen here to what human beings do in natural language use? How similar should it be? Note that since we do not know what algorithms people use, we cannot directly compare their computations to those of the models here. But a good way of comparing is to ask if the model fails in the same way human beings do. Can you think of alternative models for the computation of meanings from speech input?

WHAT TO TURN IN

After you have completed all the exercises and answered all the questions above, you will be ready to prepare a final written report. Please hand in a paper version of this report. Enough examples should be included to support your discussions and arguments.

Possible topics:

---Coordination

E.g. Joanna likes apples and oranges.
E.g. Joanna likes fresh apples and oranges.
E.g. Joanna likes apples and buys oranges.
E.g. Joanna likes apples and oranges with chocolate.
E.g. The apples were on and under the table.
E.g. Quietly and furtively, Joanna consumed the cupcakes.
E.g. (tricky) Joanna is a republican and proud of it.

---Auxiliary verbs

E.g. John likes the book.
E.g. John may like the book.
E.g. John may have liked the book.
E.g. John has liked the book.
E.g. John may have been reading the book.
E.g. John may have been surprised by the book.

---Possessives (this one is a little tricky)

E.g. The cook 's cat ran away.
E.g. The queen of England 's cat ran away.
E.g. The son of Laura 's daughter is the daughter of Laura 's son.
E.g. Annabel's brother 's friend 's sister is coming to visit.
(Suggestion: treat the 's as a separate word.)

---Clausal embedding

E.g. The boy that eats hamburger knows John.
E.g. We met the boy who knows John.
E.g. The boy that we know likes John.
E.g. The boy we know likes John.
E.g. The dog that the cat that the rat bit chased barked.

III. OTHER BACKGROUND KNOWLEDGE

Enlightening Articles for Additional Reading

Marvin Minsky. Logical vs. Analogical or Symbolic vs. Connectionist or Neat vs. Scruffy

Marvin Minsky. Why People Think Computers Can't

Marvin Minsky. The Society of Mind. Location: Olin Library Oversize; Call number: +BF431.M69

Current Research into NLP and General "Machine Understanding"

The Machine Understanding Group at M.I.T.

Other Tasks of NLP
1) Term and name identification
2) Word sense disambiguation
3) Morphological analysis
4) Unknown word processing
5) Spelling and grammar checking
6) Conceptual analysis of sentences
7) Semantic and pragmatic analysis
8) Natural language generation
9) Speech recognition and speech synthesis
10) Machine translation
11) Information retrieval


NLP Module of COGST 201/PSYCH 201/COM S 201

mao@cs.cornell.edu
This page maintained by Doug Elrod. Last updated 8/26/98.