Extracting Noun Phrases from Parsed Trees – Winwaed Blog

Extracting Noun Phrases from Parsed Trees

Following on from my previous post about NLTK Trees, here is a short Python function to extract phrases from an NLTK Tree structure.

Recently I needed to extract noun phrases from a section of text, in an attempt to choose “interesting concept phrases”. N-gram collocations are a common way of doing this, but they also produced partial phrases that defined a concept poorly. Most of the phrases I was interested in were noun phrases, so I chose to tag and chunk the text instead. The noun phrases (tagged ‘NP’) were then extracted from the chunked tree structures.
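To make the tag-and-chunk step concrete, here is a minimal sketch using NLTK's RegexpParser. The sample sentence, its hand-assigned Penn Treebank tags, and the NP rule are all assumptions chosen for illustration; the post does not say which tagger or chunk grammar was actually used.

```python
from nltk import RegexpParser

# Hand-tagged tokens (Penn Treebank tags), so the sketch needs no tagger model.
tagged = [("The", "DT"), ("hungry", "JJ"), ("editor", "NN"),
          ("enjoyed", "VBD"),
          ("the", "DT"), ("fresh", "JJ"), ("cookies", "NNS")]

# Illustrative NP rule: optional determiner, any adjectives, one or more nouns.
chunker = RegexpParser("NP: {<DT>?<JJ>*<NN.*>+}")
chunked = chunker.parse(tagged)

# Collect the words of every NP chunk in the resulting tree.
noun_phrases = [" ".join(word for word, tag in subtree.leaves())
                for subtree in chunked.subtrees(lambda t: t.label() == "NP")]
print(noun_phrases)
```

The chunker returns an NLTK Tree whose NP sub-trees hold (word, tag) pairs, which is the kind of structure the extraction function works on.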

Here is the code:

from nltk.tree import Tree

# Tree manipulation

# Extract phrases from a parsed (chunked) tree
# phrase = label (tag) of the sub-trees to extract, e.g. 'NP'
# Returns: list of deep copies of the matching sub-trees; recursive
# Note: uses the NLTK 3 API (myTree.label()); NLTK 2 used myTree.node
def ExtractPhrases(myTree, phrase):
    myPhrases = []
    if myTree.label() == phrase:
        myPhrases.append(myTree.copy(deep=True))
    for child in myTree:
        # Leaves are plain strings (or tuples); only recurse into sub-trees
        if isinstance(child, Tree):
            myPhrases.extend(ExtractPhrases(child, phrase))
    return myPhrases

This function walks the tree recursively, finds all sub-trees whose tag matches (‘NP’ in my application), and returns a list of deep copies of these sub-trees.

Here is an example of the function’s usage:

# Tree.fromstring is the NLTK 3 name for what NLTK 2 called Tree.parse
test = Tree.fromstring('(S (NP I) (VP (V enjoyed) (NP my cookies)))')
print("Input tree: %s" % test)

print("\nNoun phrases:")
list_of_noun_phrases = ExtractPhrases(test, 'NP')
for phrase in list_of_noun_phrases:
    print("  %s" % phrase)

This function is a simple demonstration of how the Tree structure can be easily processed using short functions.

It is written in a very procedural manner and is neither very functional nor Pythonic. Perhaps you could write a more elegant and Pythonic version?
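One possible answer, as a minimal sketch assuming NLTK 3: the Tree class already provides a subtrees() method that walks the tree and accepts a filter function, so the recursion can be replaced with a single list comprehension. The name extract_phrases is hypothetical and not part of NLTK.

```python
from nltk.tree import Tree

def extract_phrases(tree, label):
    """Return deep copies of every sub-tree of `tree` labelled `label`."""
    return [subtree.copy(deep=True)
            for subtree in tree.subtrees(lambda t: t.label() == label)]

tree = Tree.fromstring('(S (NP I) (VP (V enjoyed) (NP my cookies)))')
phrases = extract_phrases(tree, 'NP')
for phrase in phrases:
    # leaves() gives the words under the sub-tree, in order
    print(" ".join(phrase.leaves()))
```

Like the recursive version, this visits the tree itself first and then its children, so the phrases come back in the same order. If the copies are not needed, iterating over tree.subtrees(...) directly avoids building the list at all.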
