A frequently asked question is “What do the part-of-speech tags (VB, JJ, etc.) mean?” The bottom line is that these tags mean whatever they meant in your original training data. You are free to invent your own tags in your training data, as long as you use them consistently.
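To make the consistency point concrete, here is a minimal sketch that reads training sentences in a simple `word/TAG` format and lists the tags that actually occur. The format, the function names, and the made-up tags (DET, THING, ACTION) are assumptions for illustration, not part of any particular corpus or library.

```python
from collections import Counter


def parse_tagged(sentence, sep="/"):
    """Split a 'word/TAG word/TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        # rpartition splits on the LAST separator, so "1/2/CD" keeps "1/2" as the word.
        word, found, tag = token.rpartition(sep)
        if not found or not word or not tag:
            raise ValueError(f"token {token!r} is not in word{sep}TAG form")
        pairs.append((word, tag))
    return pairs


def tag_inventory(sentences):
    """Count how often each tag occurs across the training sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(tag for _, tag in parse_tagged(sentence))
    return counts


# Hypothetical training data using invented tags.
training = [
    "the/DET door/THING opened/ACTION",
    "a/DET dog/THING barked/ACTION",
    "the/DET dog/Thing slept/ACTION",  # "Thing" is an inconsistent spelling
]

inventory = tag_inventory(training)
# Tags that differ from another tag only by case often signal inconsistent annotation.
suspicious = sorted(t for t in inventory if t.upper() != t and t.upper() in inventory)
print(inventory)
print(suspicious)
```

A check like this will not tell you whether a tag was applied to the right word, but it does catch the cheap mistakes (stray spellings, one-off tags) before they reach a tagger.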
Training data generally takes a lot of work to create, so a pre-existing annotated corpus is typically used instead.
The most common part-of-speech (POS) tag schemes are those developed for the Penn Treebank and the Brown Corpus. The Penn Treebank scheme is probably the more common of the two, but both corpora are available with NLTK (the Penn Treebank only as a sample).
Penn Treebank POS Tags
Here are the POS tags used in the Penn Treebank:
POS Tag | Description | Example |
---|---|---|
CC | coordinating conjunction | and |
CD | cardinal number | 1, three |
DT | determiner | the |
EX | existential there | there is |
FW | foreign word | hors d’oeuvre |
IN | preposition/subordinating conjunction | in, of, like |
JJ | adjective | big |
JJR | adjective, comparative | bigger |
JJS | adjective, superlative | biggest |
LS | list marker | 1) |
MD | modal | could, will |
NN | noun, singular or mass | door |
NNS | noun plural | doors |
NNP | proper noun, singular | John |
NNPS | proper noun, plural | Vikings |
PDT | predeterminer | both the boys |
POS | possessive ending | friend’s |
PRP | personal pronoun | I, he, it |
PRP$ | possessive pronoun | my, his |
RB | adverb | however, usually, naturally, here, good |
RBR | adverb, comparative | better |
RBS | adverb, superlative | best |
RP | particle | give up |
TO | to | to go, to him |
UH | interjection | uh-huh |
VB | verb, base form | take |
VBD | verb, past tense | took |
VBG | verb, gerund/present participle | taking |
VBN | verb, past participle | taken |
VBP | verb, non-3rd person singular present | take |
VBZ | verb, 3rd person sing. present | takes |
WDT | wh-determiner | which |
WP | wh-pronoun | who, what |
WP$ | possessive wh-pronoun | whose |
WRB | wh-adverb | where, when |
The official annotation guidelines, including full descriptions, can be found here (GZip-compressed PostScript file). They also cover confusing parts of speech, capitalization, and other conventions.
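If you just want to decode tagger output, the table above fits comfortably in a lookup dictionary. The sketch below mirrors the table only. `PENN_TAGS` and `explain` are names made up for this example, and anything not in the table (punctuation and symbol tags, for instance) falls back to a placeholder.

```python
# Descriptions copied from the table above; not a complete Penn Treebank tagset.
PENN_TAGS = {
    "CC": "coordinating conjunction", "CD": "cardinal number",
    "DT": "determiner", "EX": "existential there",
    "FW": "foreign word", "IN": "preposition/subordinating conjunction",
    "JJ": "adjective", "JJR": "adjective, comparative",
    "JJS": "adjective, superlative", "LS": "list marker",
    "MD": "modal", "NN": "noun, singular or mass",
    "NNS": "noun plural", "NNP": "proper noun, singular",
    "NNPS": "proper noun, plural", "PDT": "predeterminer",
    "POS": "possessive ending", "PRP": "personal pronoun",
    "PRP$": "possessive pronoun", "RB": "adverb",
    "RBR": "adverb, comparative", "RBS": "adverb, superlative",
    "RP": "particle", "TO": "to",
    "UH": "interjection", "VB": "verb, base form",
    "VBD": "verb, past tense", "VBG": "verb, gerund/present participle",
    "VBN": "verb, past participle",
    "VBP": "verb, non-3rd person singular present",
    "VBZ": "verb, 3rd person sing. present", "WDT": "wh-determiner",
    "WP": "wh-pronoun", "WP$": "possessive wh-pronoun",
    "WRB": "wh-adverb",
}


def explain(tagged_tokens):
    """Return (word, tag, description) triples; unknown tags get a placeholder."""
    return [(word, tag, PENN_TAGS.get(tag, "unknown tag"))
            for word, tag in tagged_tokens]


# A hand-tagged example sentence, in the (word, tag) form most taggers return.
sentence = [("John", "NNP"), ("took", "VBD"), ("the", "DT"),
            ("biggest", "JJS"), ("door", "NN")]
for word, tag, description in explain(sentence):
    print(f"{word:10}{tag:6}{description}")
```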
Brown Corpus POS Tags
The Brown Corpus POS tags look very similar, which creates potential for confusion, but there are differences. For example, the Penn Treebank has three adjective tags (JJ, JJR, JJS), while the Brown Corpus has four: it splits superlatives into JJS (semantically superlative, e.g. “chief”) and JJT (morphologically superlative, e.g. “biggest”).
The Brown Corpus also has rules for combining tags. For example, the colloquial “wanna” means “want to” and is tagged “VB+TO” (“want/VB to/TO”). Similarly, a suffix asterisk indicates a negative, so that “aren’t” becomes “BER*”.
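The two combining rules just described are easy to undo in code. This is a rough sketch covering only those two rules ("+" joins tags, a trailing "*" marks a negative). The function name `split_brown_tag` is made up, and other Brown conventions are deliberately not handled.

```python
def split_brown_tag(tag):
    """Break a Brown Corpus tag into (base_tags, negated)."""
    # A bare "*" is the Brown tag for "not"/"n't" itself, so keep it intact.
    if tag == "*":
        return ["*"], True

    parts = []
    negated = False
    for part in tag.split("+"):
        # A trailing asterisk marks a negative form, e.g. BER* for "aren't".
        if len(part) > 1 and part.endswith("*"):
            negated = True
            part = part[:-1]
        parts.append(part)
    return parts, negated


print(split_brown_tag("VB+TO"))  # "wanna"
print(split_brown_tag("BER*"))   # "aren't"
```

Splitting tags this way is handy when you want to compare Brown-tagged text against a scheme that has no combined tags, since you can map each base tag separately.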
The Brown Corpus manual is available here, and useful summaries can be found at the University of Leeds and at Wikipedia.
How can I use NLP for POS tagging in PHP?
You would need to find an NLP library for PHP. PHP isn’t really designed for things like machine learning and NLP-type processing, so I would use a library in a different language and call that from PHP, e.g. NLTK in Python or Stanford NLP for Java.
PHP is a POS when it comes to NLP.
I would advise you to use Python to do the processing, and possibly ZeroMQ to connect the two if you must. But that’s just how I like to roll things.
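For what it's worth, here is a rough sketch of the Python side of such a setup. To keep it dependency-free it swaps ZeroMQ for plain line-based streams (one sentence in, one JSON line out), which PHP could drive with something like `proc_open`. The `toy_tagger` is a hypothetical stand-in for a real tagger such as NLTK's, and `serve` is a made-up name.

```python
import io
import json


def toy_tagger(words):
    """Hypothetical stand-in: a real setup would call NLTK or similar here."""
    lookup = {"the": "DT", "door": "NN", "opened": "VBD"}
    return [(word, lookup.get(word.lower(), "NN")) for word in words]


def serve(lines_in, lines_out, tagger=toy_tagger):
    """Read one sentence per line; write one JSON array of [word, tag] pairs per line."""
    for line in lines_in:
        words = line.split()
        lines_out.write(json.dumps(tagger(words)) + "\n")


# Simulate the pipe with in-memory streams; a real script would pass
# sys.stdin and sys.stdout instead.
fake_in = io.StringIO("The door opened\n")
fake_out = io.StringIO()
serve(fake_in, fake_out)
reply = json.loads(fake_out.getvalue())
print(reply)
```

JSON keeps the PHP side simple, since `json_decode` turns each reply line straight into an array of word/tag pairs.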