I'm trying to filter a list of tuples, keeping only those whose second element starts with 'V', in order to clean my dataframe.
I have a pandas dataframe called 'df_my_string'.
A sample is:
verbs_tokens
[('[', 'NNS'), ("'Europe", "''"), ('was', 'VBD'), ('always', 'RB'), ('the', 'DT'), ('future', 'NN'), ('.', '.'), ("'", "''"), (']', 'NN')]
[('[', 'IN'), ("'Europe", 'CD'), ('marks', 'NNS'), ('its', 'PRP$'), ('anniversary', 'NN'), (',', ','), ('it', 'PRP'), ('is', 'VBZ')]
What I need is to keep, for each row, only the tuples whose second value starts with "V".
I have tried many ways but I cannot figure out how:
#df_my_string['clean_verbs_tokens']=filter((lambda x: x[1].startswith('V')),df_my_string[['verbs_tokens']])
#df_my_string['clean_verbs_tokens'] = df_my_string.verbs_tokens.apply(lambda x: str(x[0][1]).startswith('V'))
#df_my_string['clean_verbs_tokens'] = [tup for tup in df_my_string['verbs_tokens'] if str(tup[0][1])=='V']
#df_my_string['clean_verbs_tokens'] = [item for item in df_my_string['verbs_tokens'] if pd.Series(re.search('^V.*',item[0][1])).reset_index(drop=True).values]
The expected output:
verbs_tokens
[('was', 'VBD')]
[('is', 'VBZ')]
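Try applying a list comprehension to each row, so that every tuple in the list is checked rather than only the first one: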
df_my_string['clean_verbs_tokens'] = df_my_string["verbs_tokens"].apply(lambda x: [t for t in x if t[1].lower().startswith("v")])
>>> df_my_string['clean_verbs_tokens']
0 [(was, VBD)]
1 [(is, VBZ)]
Name: clean_verbs_tokens, dtype: object
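For anyone who wants to reproduce this, here is a minimal self-contained sketch built from the two sample rows in the question. It assumes each cell of 'verbs_tokens' holds an actual Python list of (word, tag) tuples; the helper name keep_verbs is made up for illustration. The attempts in the question index x[0][1], which only looks at the tag of the first tuple in each row, whereas the filter has to run over every tuple in the list.

```python
import pandas as pd

# Sample data copied from the question; each cell is a list of (word, tag) tuples.
df_my_string = pd.DataFrame({
    "verbs_tokens": [
        [("[", "NNS"), ("'Europe", "''"), ("was", "VBD"), ("always", "RB"),
         ("the", "DT"), ("future", "NN"), (".", "."), ("'", "''"), ("]", "NN")],
        [("[", "IN"), ("'Europe", "CD"), ("marks", "NNS"), ("its", "PRP$"),
         ("anniversary", "NN"), (",", ","), ("it", "PRP"), ("is", "VBZ")],
    ]
})


def keep_verbs(tagged):
    """Return only the (word, tag) pairs whose tag starts with 'V'.

    Hypothetical helper name, not part of pandas.
    """
    return [(word, tag) for word, tag in tagged if tag.startswith("V")]


# apply() calls keep_verbs once per row, passing that row's whole list.
df_my_string["clean_verbs_tokens"] = df_my_string["verbs_tokens"].apply(keep_verbs)

print(df_my_string["clean_verbs_tokens"].tolist())
# [[('was', 'VBD')], [('is', 'VBZ')]]
```

If the column instead holds the string representation of each list (for example after a round trip through CSV), the rows would probably need to be parsed first, e.g. with ast.literal_eval; that is an assumption about the data, not something the question states.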