Parsing WordNet's glosses
In the first release of eXtended WordNet, XWN 1.7, the glosses of WordNet 1.7 are parsed and transformed into logic forms, and the senses of their words are disambiguated.
The glosses were first preprocessed: example sentences and any text between parentheses were ignored.
For example, the second sense of the noun "blind" has the following entry in WordNet 1.7:
a hiding place sometimes used by hunters (especially duck hunters); "he waited impatiently in the blind"
In XWN 1.7 only the definition was processed:
a hiding place sometimes used by hunters
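The preprocessing step can be illustrated with a minimal sketch. It is not the code used to build XWN; the function name strip_gloss is hypothetical, and the sketch assumes that examples are introduced by a semicolon followed by a double quote and that parentheses are not nested.

```python
import re

def strip_gloss(gloss: str) -> str:
    """Keep only the definition part of a WordNet gloss (illustrative only)."""
    # Assumption: quoted examples start at the first '; "' in the gloss.
    definition = re.split(r';\s*"', gloss, maxsplit=1)[0]
    # Assumption: parenthesized material is not nested.
    definition = re.sub(r'\s*\([^()]*\)', '', definition)
    # Collapse any leftover runs of whitespace.
    return re.sub(r'\s+', ' ', definition).strip()

gloss = ('a hiding place sometimes used by hunters (especially duck hunters); '
         '"he waited impatiently in the blind"')
print(strip_gloss(gloss))  # a hiding place sometimes used by hunters
```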
The glosses have been tagged with an improved version of Brill's tagger, trained on WordNet.
To improve the accuracy of the parsing, the glosses have been extended in the following way:
- the adverb glosses were extended with the adverb and "is" in front of the gloss and a period at the end of it, as in:
entirely is without any others being included or involved .
- the adjective glosses were extended with the adjective and "is something" in front of the gloss and a period at the end of it, as in:
infinite is something total and all-embracing .
- the verb glosses were extended with "to" + the verb + "is to" in front of the gloss and a period at the end of it, as in:
to hiccup is to breathe spasmodically , and make a sound .
- the noun glosses were extended with the noun and "is" in front of the gloss and a period at the end of it, as in:
space is the unlimited 3-dimensional expanse in which everything is located .
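The four extension rules above can be sketched as simple templates. This is only an illustration under stated assumptions: the function name extend_gloss and the part-of-speech labels are hypothetical, and the gloss is assumed to be already tokenized, with the final period added as a separate token as in the examples.

```python
# One template per part of speech, following the four rules listed above.
TEMPLATES = {
    "adverb": "{word} is {gloss} .",
    "adjective": "{word} is something {gloss} .",
    "verb": "to {word} is to {gloss} .",
    "noun": "{word} is {gloss} .",
}

def extend_gloss(word: str, pos: str, gloss: str) -> str:
    """Turn a bare gloss into a full sentence that a parser can handle."""
    return TEMPLATES[pos].format(word=word, gloss=gloss)

print(extend_gloss("hiccup", "verb", "breathe spasmodically , and make a sound"))
# to hiccup is to breathe spasmodically , and make a sound .
```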
A voting scheme between two parsers was used to parse the glosses with high accuracy. The two parsers are Charniak's parser and an in-house parser (a Collins-style parser).
The parsed glosses provided in our first release of XWN fall into three categories: GOLD, SILVER and NORMAL.
GOLD quality is attributed to those parsed glosses that have been manually checked.
SILVER quality is attributed to those parsed glosses for which there has been agreement between the two parsers, but without human verification.
NORMAL quality is attributed to the rest of the glosses: the two parsers did not agree and no human checked the result. In this situation the output of the in-house parser was given precedence.
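The way the three categories follow from the voting scheme can be sketched as below. This is a hedged illustration, not the actual XWN implementation: the function name vote is hypothetical, parses are assumed to be comparable as bracketed strings, and a manually checked parse, when one exists, is assumed to replace both parser outputs. Which parse is kept when the parsers agree is immaterial, since the two are identical.

```python
from typing import Optional, Tuple

def vote(charniak: str, in_house: str,
         checked: Optional[str] = None) -> Tuple[str, str]:
    """Return (chosen parse, quality label) for one gloss."""
    if checked is not None:
        # A human verified (and possibly corrected) the parse.
        return checked, "GOLD"
    if charniak == in_house:
        # Both parsers agree, but nobody checked the result.
        return in_house, "SILVER"
    # No agreement and no human check: the in-house parser takes precedence.
    return in_house, "NORMAL"

# Toy bracketed parses, made up for illustration.
print(vote("(S (NP space) (VP is))", "(S (NP space) (VP is))"))
print(vote("(S (NP space) (VP is))", "(S (NP space) (VP (VBZ is)))"))
```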
|| # TOTAL GLOSSES || # GOLD GLOSSES || # SILVER GLOSSES || # NORMAL GLOSSES