I believe to have a simple problem. I have a pandas dataframe df
looking quite similar to this:
data = [{"Text" : "Dog", "Dog" : 1},
{"Text" : "Cat", "Dog" : 0},
{"Text" : "Mouse", "Dog" : 0},
{"Text" : "Dog", "Dog" : 1}]
df = pd.DataFrame(data)
I am trying to search the column Text
for a number of keywords and count how many times they appear in each cell. The result is supposed to be stored in a new column that shows how many times the specific keyword was found. The result is supposed to be just like the Dog
column.
I tried using pandas
str.count
. It works just fine. But in the moment I try to store the result in a new column, I run in to trouble:
mykewords = ('Cat', 'Mouse')
df['Cat'] = df.Text.str.count("Cat")
I get the following error message:
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
See the caveats in the documentation: http://pandas.pydata.org/pandas-docs/stable/indexing.html#indexing-view-versus-copy
if __name__ == '__main__':
I have two questions:
mykeywords
and get a column each?Thank you very much for any help in advance!
If possible multiple values in text and need count values:
mykewords = ('Cat', 'Mouse')
for x in mykewords:
df[x] = df.Text.str.count(x)
Better solution is use words boundaries with Series.str.findall
and Series.str.len
:
for x in mykewords:
df[x] = df.Text.str.findall(r"\b{}\b".format(x)).str.len()
Difference in solutions:
data = [{"Text" : "Dog Cat Catman", "Dog" : 1},
{"Text" : "Cat Cat", "Dog" : 0},
{"Text" : "Mouse Cat", "Dog" : 0},
{"Text" : "Dog", "Dog" : 1}]
df = pd.DataFrame(data)
df1 = df.copy()
print (df)
Dog Text
0 1 Dog Cat Catman
1 0 Cat Cat
2 0 Mouse Cat
3 1 Dog
mykewords = ('Cat', 'Mouse')
for x in mykewords:
df[x] = df.Text.str.findall(r"\b{}\b".format(x)).str.len()
print (df)
Dog Text Cat Mouse
0 1 Dog Cat Catman 1 0 <-not match Catman
1 0 Cat Cat 2 0
2 0 Mouse Cat 1 1
3 1 Dog 0 0
for x in mykewords:
df1[x] = df1.Text.str.count(x)
print (df1)
Dog Text Cat Mouse
0 1 Dog Cat Catman 2 0 <-match Catman
1 0 Cat Cat 2 0
2 0 Mouse Cat 1 1
3 1 Dog 0 0