I have a dataframe like this:
col1             col2
[abc, bcd, dog]  [[.4], [.5], [.9]]
[cat, bcd, def]  [[.9], [.5], [.4]]
The numbers in the col2 lists describe the element (based on list index location) in col1. So ".4" in col2 describes "abc" in col1.
I want to create 2 new columns: one that pulls only the elements in col1 whose score in col2 is >= .9, and another that holds that number from col2; so ".9" for both rows.
Result:
col3   col4
[dog]  .9
[cat]  .9
I think a route that removes the nested lists from col2 would be fine, but that's harder than it sounds. I've been trying for an hour to remove those brackets.
Attempts:
spec_chars3 = ["[", "]"]
for char in spec_chars3:  # didn't work, turned everything to NaN
    df1['avg_jaro_company_word_scores'] = df1['avg_jaro_company_word_scores'].str.replace(char, '')
df.col2.str.strip('[]')  # didn't work b/c the nested list is still a list, not a string
I haven't even figured out how to pull out the list index number and filter col1 on that.
You can use list comprehensions to populate new columns with your criteria.
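# Keep each col1 value whose paired score in col2 is at least 0.9
df['col3'] = [
    [value for value, score in zip(c1, c2) if score[0] >= 0.9]
    for c1, c2 in zip(df['col1'], df['col2'])
]
# Keep the matching scores themselves, unwrapped from their inner lists
df['col4'] = [
    [score[0] for score in c2 if score[0] >= 0.9]
    for c2 in df['col2']
]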
Output
col1 col2 col3 col4
0 [abc, bcd, dog] [[0.4], [0.5], [0.9]] [dog] [0.9]
1 [cat, bcd, def] [[0.9], [0.5], [0.4]] [cat] [0.9]
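Note that col4 above is still a one-element list ([0.9]), while the question asked for a plain number (.9). If that matters, here is a minimal, self-contained sketch that flattens col2 first and then stores a scalar. It rebuilds the sample frame from the question; the names flat_scores and threshold are my own, and taking the highest qualifying score per row (or None when nothing qualifies) is an assumption about what you want when several scores pass the cutoff.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "col1": [["abc", "bcd", "dog"], ["cat", "bcd", "def"]],
    "col2": [[[0.4], [0.5], [0.9]], [[0.9], [0.5], [0.4]]],
})

threshold = 0.9  # cutoff taken from the question

# Flatten each nested list: [[0.4], [0.5], [0.9]] -> [0.4, 0.5, 0.9]
flat_scores = [[inner[0] for inner in scores] for scores in df["col2"]]

# Keep the col1 values whose score reaches the threshold
df["col3"] = [
    [value for value, score in zip(values, scores) if score >= threshold]
    for values, scores in zip(df["col1"], flat_scores)
]

# Assumption: store the highest qualifying score as a scalar,
# or None when no score in the row reaches the threshold
df["col4"] = [
    max((score for score in scores if score >= threshold), default=None)
    for scores in flat_scores
]

print(df[["col3", "col4"]])
```

Flattening once up front also gets rid of the brackets you were fighting with: the .str methods only operate on strings, and these cells hold Python lists, so plain list indexing is the tool to reach for here.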