My question is a bit tricky: I'm trying to identify the ROLE of a word in a given sentence. I managed to get something using NLTK, but it tells me what the word is (its part of speech), whereas what I'm looking for is its job in the sentence. For example, for "God Loves Apples" it does not return God as the subject; it returns God as NNP (proper noun), which is not what I want. So I'd like the dict key to be the role the word plays in its string (God as subject, not God as NNP).
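import sys
import subprocess

# Install NLTK quietly from inside the script
subprocess.check_call([sys.executable, '-m', 'pip', 'install', 'nltk', '--quiet'],
                      stderr=subprocess.DEVNULL)

import nltk

# Tokenizer and POS-tagger models needed by word_tokenize and pos_tag;
# older and newer NLTK releases use different resource names, so request both
for pkg in ('punkt', 'punkt_tab', 'averaged_perceptron_tagger', 'averaged_perceptron_tagger_eng'):
    nltk.download(pkg, quiet=True)

n = input("Enter something\n")         # User input
tokens = nltk.word_tokenize(n)         # Split the sentence into word tokens
tagged = nltk.pos_tag(tokens)          # List of (word, POS tag) tuples, e.g. ('God', 'NNP')
dct = dict((y, x) for x, y in tagged)  # POS tag -> word; a repeated tag keeps only the last word

# Append to DATA.txt; print(file=...) avoids leaving sys.stdout pointing at a closed file
with open('DATA.txt', 'a') as file:
    print(dct.get('NNP'), ' :', file=file)  # The NNP word, or None if there is none
    print(dct, file=file)                   # Whole tag -> word mapping
    print("\n", file=file)                  # Blank lines between entries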
You could use dependency parsing. NLTK is not ideal for this task, but there are alternatives like CoreNLP or spaCy, and both can be tried out online. The dependency tree will tell you that in "God loves apples.", the token "God" is connected to the main verb with the nsubj relation, i.e., nominal subject.
I usually go for spaCy:
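import spacy

# Small English pipeline; install it first with: python -m spacy download en_core_web_sm
nlp = spacy.load('en_core_web_sm')

# Process the document
doc = nlp('God loves apples.')
for tok in doc:
    print(tok, tok.dep_, sep='\t')  # token text and its dependency label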
which results in
God nsubj
loves ROOT
apples dobj
. punct
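Since the question asks for a dict keyed by the word's role rather than its POS tag, here is a minimal sketch that builds one from the dependency labels. The function names `words_by_role` and `subject_verb_pairs` are hypothetical helpers, not part of spaCy; they only assume each token exposes `.text`, `.dep_` and `.head`, which spaCy tokens do. Values are lists because a sentence can contain several words with the same role (the same reason the NLTK tag dict above loses words). The subject check covers only the `nsubj` and `nsubjpass` labels, an assumption that fits spaCy's English pipelines.

```python
from collections import defaultdict
from typing import Iterable


def words_by_role(tokens: Iterable) -> dict:
    """Map each dependency label (e.g. 'nsubj', 'dobj') to the words carrying it.

    `tokens` is assumed to be a spaCy Doc, or any iterable of objects that
    expose `.text` and `.dep_` the way spaCy tokens do.
    """
    roles = defaultdict(list)
    for tok in tokens:
        roles[tok.dep_].append(tok.text)
    return dict(roles)


def subject_verb_pairs(tokens: Iterable) -> list:
    """Return (subject, head word) pairs by following each subject's `.head`.

    Only 'nsubj' and 'nsubjpass' are treated as subjects here (an assumption
    matching spaCy's English label set; other models may use other labels).
    """
    return [(tok.text, tok.head.text)
            for tok in tokens
            if tok.dep_ in ("nsubj", "nsubjpass")]


# Usage with spaCy (requires the en_core_web_sm model), given the parse shown above:
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   doc = nlp("God loves apples.")
#   words_by_role(doc)       # {'nsubj': ['God'], 'ROOT': ['loves'], 'dobj': ['apples'], 'punct': ['.']}
#   subject_verb_pairs(doc)  # [('God', 'loves')]
```

With this, `words_by_role(doc).get('nsubj')` plays the part that `dct.get('NNP')` played in the NLTK version, but it answers "which word is the subject" instead of "which word is a proper noun".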