supposed my data set
name what
A apple[red]
B cucumber[green]
C dog
C orange
D banana
D monkey
E cat
F carrot
.
.
I want to create and specify a list, and if the column contains a value contained in that list, I want to make the specified value a new column.
list value
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
result what I want
name what class
A apple fruit
B cucumber vegetable
C dog animal
C orange fruit
D banana fruit
D monkey animal
E cat animal
F carrot vegetable
List values and column values do not 'match' and must be contained.
thank you for reading.
Use Series.map
with dictionary created from lists and swapped keys with values with flattening:
fruit = ['apple', 'banana', 'orange']
animal = ['dog', 'monkey', 'cat']
vegetable = ['cucumber', 'carrot']
d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
#http://stackoverflow.com/a/31674731/2901002
d1 = {k: oldk for oldk, oldv in d.items() for k in oldv}
Loop alternative of dictionary comprehension:
d1 = {}
for oldk, oldv in d.items():
for k in oldv:
d1[k] = oldk
And then:
df['class'] = df['what'].map(d1)
#if need values before first [
#df['class'] = df['what'].str.split('[').str[0].map(d1)
print (df)
name what class
0 A apple fruit
1 B cucumber vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
EDIT: For match by subtrings you can loop by dictionary d
, check matching by Series.str.contains
for mask and set new values:
d = {'fruit':fruit, 'animal':animal,'vegetable':vegetable}
for k, v in d.items():
mask = df['what'].str.contains('|'.join(v))
df.loc[mask, 'class'] = k
print (df)
name what class
0 A apple[red] fruit
1 B cucumber[green] vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable
If possible multiple words is possible use words boundaries:
for k, v in d.items():
pat = '|'.join(r"\b{}\b".format(x) for x in v)
df.loc[ df['what'].str.contains(pat), 'class'] = k
print (df)
name what class
0 A apple[red] fruit
1 B cucumber[green] vegetable
2 C dog animal
3 C orange fruit
4 D banana fruit
5 D monkey animal
6 E cat animal
7 F carrot vegetable