I am trying to find the correct part of speech for each word in a paragraph. I am using the Stanford POS Tagger. However, I am stuck at one point.
I want to identify the prepositions in the paragraph.
Penn Treebank Tagset says that:
IN Preposition or subordinating conjunction
How can I be sure whether the current word is a preposition or a subordinating conjunction? How can I extract only the prepositions from the paragraph in this case?
I have made some progress in determining whether a word is actually a preposition or a subordinating conjunction.
I parsed the following sentence:
She left early because Mike arrived with his new girlfriend.
(here, because is a subordinating conjunction)
After POS tagging:
She_PRP left_VBD early_RB because_IN Mike_NNP arrived_VBD with_IN his_PRP$ new_JJ girlfriend_NN ._.
To check whether because is a preposition or not, I parsed the sentence.
In the parse tree, the direct parent of the IN node for because is SBAR (subordinate clause).
with is also tagged IN, but the direct parent of its IN node is PP, so it is a preposition.
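The parent-label check described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the bracketed tree below is written by hand in Penn Treebank style and is not actual Stanford Parser output, and the helper names parse_tree and classify_in are made up for this example.

```python
import re

def parse_tree(text):
    """Parse a bracketed Penn Treebank string into nested (label, children) tuples."""
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def read():
        nonlocal pos
        pos += 1                      # skip "("
        label = tokens[pos]
        pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(read())
            else:
                children.append(tokens[pos])  # a leaf word
                pos += 1
        pos += 1                      # skip ")"
        return (label, children)

    return read()

def classify_in(node, parent_label=None, out=None):
    """Collect (word, kind) for every IN node, judged by its direct parent's label."""
    if out is None:
        out = []
    label, children = node
    if label == "IN" and children and isinstance(children[0], str):
        if parent_label == "PP":
            kind = "preposition"
        elif parent_label == "SBAR":
            kind = "subordinating conjunction"
        else:
            kind = "unknown"          # the rule says nothing about other parents
        out.append((children[0], kind))
    for child in children:
        if isinstance(child, tuple):
            classify_in(child, label, out)
    return out

# Hand-written illustrative parse (an assumption, not real parser output).
TREE = """(ROOT (S (NP (PRP She)) (VP (VBD left) (ADVP (RB early))
  (SBAR (IN because) (S (NP (NNP Mike)) (VP (VBD arrived)
  (PP (IN with) (NP (PRP$ his) (JJ new) (NN girlfriend)))))))) (. .)))"""

result = classify_in(parse_tree(TREE))
print(result)
```

With this tree, because is reported as a subordinating conjunction and with as a preposition. Any IN node whose parent is neither PP nor SBAR is left as "unknown" rather than guessed.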
Example 2:
Keep your hand on the wound until the nurse asks you to take it off. (here, until is a subordinating conjunction)
The POS tagging is:
Keep_VB your_PRP$ hand_NN on_IN the_DT wound_NN until_IN the_DT nurse_NN asks_VBZ you_PRP to_TO take_VB it_PRP off_RP ._.
So, until and on are both marked as IN.
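To see which words the tagger alone leaves ambiguous, the tagged output can be split on the underscore separator. This is a small sketch that assumes the word_TAG format shown above; words_tagged_in is a hypothetical helper name, not part of the Stanford tools.

```python
def words_tagged_in(tagged):
    """Return the words tagged IN in 'word_TAG word_TAG ...' output."""
    result = []
    for token in tagged.split():
        # Split on the last underscore so words containing "_" stay intact.
        word, _, tag = token.rpartition("_")
        if tag == "IN":
            result.append(word)
    return result

tagged = ("Keep_VB your_PRP$ hand_NN on_IN the_DT wound_NN until_IN the_DT "
          "nurse_NN asks_VBZ you_PRP to_TO take_VB it_PRP off_RP ._.")
candidates = words_tagged_in(tagged)
print(candidates)  # ['on', 'until']
```

Both words come back as candidates, which is why the tag by itself cannot separate prepositions from subordinating conjunctions.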
However, the picture gets clearer when we actually parse the sentence.
So, finally, I conclude that because is a subordinating conjunction and with is a preposition.
I tried this on many variations of sentences, and it worked for almost all of them, except for some cases involving before and after.