Part of Speech Tags

A frequently asked question is “What do the Part of Speech tags (VB, JJ, etc.) mean?” The short answer is that these tags mean whatever they meant in your original training data. You are free to invent your own tags for your training data, as long as you use them consistently.
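To make this concrete, here is a minimal sketch of a most-frequent-tag baseline tagger using only the Python standard library. The tag names (`GLUE`, `THING`, `ACTION`), the toy training sentences, and the helpers `train` and `tag` are all invented for illustration. The only point is that a tagger can emit nothing but the labels that appeared in its training data.

```python
from collections import Counter, defaultdict

# Toy training data using an invented tag scheme (hypothetical tags).
TRAINING = [
    [("the", "GLUE"), ("dog", "THING"), ("runs", "ACTION")],
    [("a", "GLUE"), ("cat", "THING"), ("sleeps", "ACTION")],
    [("the", "GLUE"), ("cat", "THING"), ("runs", "ACTION")],
]


def train(sentences):
    """Return a dict mapping each word to its most frequent training tag."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag_name in sentence:
            counts[word.lower()][tag_name] += 1
    return {word: c.most_common(1)[0][0] for word, c in counts.items()}


def tag(model, words, default="UNKNOWN"):
    """Tag each word with its learned tag, or `default` if it was never seen."""
    return [(w, model.get(w.lower(), default)) for w in words]


model = train(TRAINING)
tagged = tag(model, ["The", "cat", "runs", "home"])
print(tagged)
# [('The', 'GLUE'), ('cat', 'THING'), ('runs', 'ACTION'), ('home', 'UNKNOWN')]
```

Swapping the invented tags for Penn Treebank or Brown Corpus tags in the training data would change the output labels and nothing else.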

Creating training data takes a lot of work, so most people use a pre-existing tagged corpus, which usually follows the Penn Treebank or Brown Corpus tag scheme.

The most common part of speech (POS) tag schemes are those developed for the Penn Treebank and the Brown Corpus. The Penn Treebank scheme is probably the more common of the two, and both corpora are available with NLTK.

Penn Treebank POS Tags

Here are the POS tags used in the Penn Treebank:

POS Tag	Description	Example
CC	coordinating conjunction	and
CD	cardinal number	1, three
DT	determiner	the
EX	existential there	there is
FW	foreign word	d’oeuvre
IN	preposition/subordinating conjunction	in, of, like
JJ	adjective	big
JJR	adjective, comparative	bigger
JJS	adjective, superlative	biggest
LS	list marker	1)
MD	modal	could, will
NN	noun, singular or mass	door
NNS	noun plural	doors
NNP	proper noun, singular	John
NNPS	proper noun, plural	Vikings
PDT	predeterminer	both the boys
POS	possessive ending	friend’s
PRP	personal pronoun	I, he, it
PRP$	possessive pronoun	my, his
RB	adverb	however, usually, naturally, here, good
RBR	adverb, comparative	better
RBS	adverb, superlative	best
RP	particle	give up
TO	to	to go, to him
UH	interjection	uhhuhhuhh
VB	verb, base form	take
VBD	verb, past tense	took
VBG	verb, gerund/present participle	taking
VBN	verb, past participle	taken
VBP	verb, non-3rd person sing. present	take
VBZ	verb, 3rd person sing. present	takes
WDT	wh-determiner	which
WP	wh-pronoun	who, what
WP$	possessive wh-pronoun	whose
WRB	wh-adverb	where, when

The official annotation guidelines, including full descriptions of each tag, can be found here (GZip-compressed PostScript file). The guidelines also cover commonly confused parts of speech, capitalization, and other conventions.
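When reading tagger output it can help to keep the table above available as a lookup. The following is a small sketch along those lines: the dictionary simply transcribes the table, while the `describe` helper and the hand-tagged example are hypothetical and are not the output of any real tagger.

```python
# Penn Treebank tags transcribed from the table above.
PENN_TAGS = {
    "CC": "coordinating conjunction",
    "CD": "cardinal number",
    "DT": "determiner",
    "EX": "existential there",
    "FW": "foreign word",
    "IN": "preposition/subordinating conjunction",
    "JJ": "adjective",
    "JJR": "adjective, comparative",
    "JJS": "adjective, superlative",
    "LS": "list marker",
    "MD": "modal",
    "NN": "noun, singular or mass",
    "NNS": "noun plural",
    "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural",
    "PDT": "predeterminer",
    "POS": "possessive ending",
    "PRP": "personal pronoun",
    "PRP$": "possessive pronoun",
    "RB": "adverb",
    "RBR": "adverb, comparative",
    "RBS": "adverb, superlative",
    "RP": "particle",
    "TO": "to",
    "UH": "interjection",
    "VB": "verb, base form",
    "VBD": "verb, past tense",
    "VBG": "verb, gerund/present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person sing. present",
    "VBZ": "verb, 3rd person sing. present",
    "WDT": "wh-determiner",
    "WP": "wh-pronoun",
    "WP$": "possessive wh-pronoun",
    "WRB": "wh-adverb",
}


def describe(tagged_tokens):
    """Attach a human-readable description to each (word, tag) pair."""
    return [(word, t, PENN_TAGS.get(t, "unknown tag")) for word, t in tagged_tokens]


# Hand-tagged for illustration only, using example words from the table.
example = [("John", "NNP"), ("took", "VBD"), ("the", "DT"),
           ("biggest", "JJS"), ("doors", "NNS")]

for word, t, meaning in describe(example):
    print(f"{word:8} {t:4} {meaning}")
```

Tags that are not in the dictionary come back as "unknown tag", which is a quick way to spot output from a tagger trained on a different scheme.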

Brown Corpus POS Tags

The Brown Corpus POS tags are very similar to the Penn Treebank tags, which can cause confusion, but there are differences. For example, the Penn Treebank has three adjective tags (JJ, JJR, JJS), whereas the Brown Corpus splits superlatives into JJS (semantically superlative, such as “chief”) and JJT (morphologically superlative, such as “biggest”).

The Brown Corpus also has rules for combining tags. For example, the colloquial “wanna” means “want to” and is tagged “VB+TO” (“want/VB to/TO”). Similarly, a suffix asterisk indicates a negative, so that “aren’t” becomes “BER*”.
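The two combining rules just described can be unpacked mechanically. Here is a minimal sketch, assuming only those two rules: `+` joins the tags of a contracted form and a trailing `*` marks a negative. The function name `parse_brown_tag` is hypothetical, and the sketch ignores any other modifiers that the Brown Corpus manual defines.

```python
def parse_brown_tag(tag):
    """Split a Brown-style tag into a list of (base_tag, negated) pairs."""
    parts = []
    for piece in tag.split("+"):
        # A trailing asterisk marks a negated form; a bare "*" is left as-is.
        negated = len(piece) > 1 and piece.endswith("*")
        base = piece[:-1] if negated else piece
        parts.append((base, negated))
    return parts


print(parse_brown_tag("VB+TO"))  # [('VB', False), ('TO', False)]
print(parse_brown_tag("BER*"))   # [('BER', True)]
```

Returning pairs rather than a bare list of strings keeps the negation flag next to the tag it modifies, which makes it easier to map the pieces onto another tag scheme later.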

The Brown Corpus manual is available here, and useful summaries can be found at the University of Leeds and at Wikipedia.

3 thoughts on “Part of Speech Tags”

Subhan Ullah Nov 2, 2013 4:52 am

How can I use NLP for POS tagging in PHP?
- Richard Marsden Nov 2, 2013 7:58 am
  
  You would need to find an NLP library for PHP. PHP isn’t really designed for things like machine learning and NLP-type processing. I would use a library written in another language and call it from PHP, e.g. NLTK in Python or Stanford NLP in Java.
grokorg Feb 22, 2016 6:22 pm

PHP is a POS when it comes to NLP.

I would advise you to use Python for the processing and possibly ZeroMQ to connect the two if you must. But that’s just how I like to roll.

Winwaed Blog
