I have a dataset that consists of tokenized, POS-tagged phrases as one column of a dataframe:
I want to create a new column in the dataframe, consisting only of the proper nouns in the previous column:
Right now, I'm trying something like this for a single row:
if 'NNP' in df['Description_POS'][96][0:-1]:
    df['Proper Noun'] = df['Description_POS'][96]
But I don't know how to loop this over every row, or how to get the tuple that contains the proper noun. I'm very new to this and at a loss for what to use, so any help would be really appreciated!
Edit: I tried the solution recommended, and it seems to work, but there is an issue.
This was my dataframe: [image: Original dataframe]
After implementing the recommended code:
df['Proper Nouns'] = df['POS_Description'].apply(
    lambda row: [i[0] for i in row if i[1] == 'NNP'])
it looks like this: [image: Dataframe after creating a proper nouns column]
You can use the apply method, which, as the name suggests, applies the given function to every row of a dataframe or to every element of a series. Called on a single column, it returns a series, which you can add as a new column to your dataframe:
df['Proper Nouns'] = df['POS_Description'].apply(
    lambda row: [i[0] for i in row if i[1] == 'NNP'])
I am assuming each entry in the POS_Description column is a list of (token, tag) tuples.
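To make that assumption concrete, here is a minimal, self-contained sketch of the same approach. The sample sentences and the helper name proper_nouns are made up for illustration; only the column names and the 'NNP' tag come from the question.

```python
import pandas as pd

# Hypothetical sample data: each cell holds a list of (token, tag) tuples,
# which is the layout assumed above.
df = pd.DataFrame({
    "POS_Description": [
        [("John", "NNP"), ("visited", "VBD"), ("Paris", "NNP")],
        [("the", "DT"), ("red", "JJ"), ("car", "NN")],
    ]
})


def proper_nouns(tagged):
    """Return the tokens tagged 'NNP' from a list of (token, tag) tuples."""
    return [token for token, tag in tagged if tag == "NNP"]


# apply() calls proper_nouns once per cell and returns a new series.
df["Proper Nouns"] = df["POS_Description"].apply(proper_nouns)

print(df["Proper Nouns"].tolist())
# [['John', 'Paris'], []]
```

A named function does the same work as the lambda, but it is easier to test on a single row first, for example proper_nouns(df["POS_Description"][0]). Rows with no proper nouns get an empty list rather than a missing value.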