python, pandas, list-comprehension

List Comprehension Using any() Creating Multiple Entries in Pandas


I have created a list of keywords, and I'm iterating over the rows of a dataframe to set one column's value based on whether another column contains any of the words in that list. Here is an example:

import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']

df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog', 'This is a golden retriever type dog', 'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
    'species': ''})  # scalar placeholder; an empty list here raises a ValueError because the column lengths differ

# Label each row 'Canine' if its description contains any keyword, else 'Feline'
for i, r in df.iterrows():
    if any(x in r['description'] for x in kwrds):
        df.at[i, 'species'] = 'Canine'
    else:
        df.at[i, 'species'] = 'Feline'
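The same logic can be expressed without writing into the frame row by row: compute every label first with a list comprehension, then assign the whole column in one step. This is a minimal sketch that assumes the same `kwrds` and sample `description` values as above; because nothing is written through `df.at`, each row gets exactly one value regardless of what the index labels look like.

```python
import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']
df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog',
                    'This is a golden retriever type dog',
                    'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
})

# One label per description, in row order; any() stops at the first keyword found.
labels = ['Canine' if any(k in desc for k in kwrds) else 'Feline'
          for desc in df['description']]

# Assign the whole column at once (positional, so the index labels don't matter).
df['species'] = labels
print(df)
```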

The loop itself seems to work; however, sometimes the species column ends up with multiple entries, like

CanineCanineCanineCanine

while other times it works fine.

From what I understand, `any()` over the list comprehension should only return a single True or False value. It almost looks as if the same row is being visited several times with the same index, so the value is written over and over.

The problem with that theory is that it doesn't happen for every row in the dataframe, only for some, and generally towards the end of the dataframe.

I'm not even sure where to start on trying to diagnose this issue.
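One possibility worth ruling out (an assumption, since the real dataframe isn't shown) is a non-unique index. `iterrows()` yields index labels and `df.at` looks rows up by label, so when a label occurs more than once, `df.at[i, 'species']` no longer identifies a single row. The sketch below builds such a frame with `pd.concat` and no `ignore_index=True` (a hypothetical way the labels could end up repeated), lists the repeated labels with `Index.duplicated()`, and renumbers the rows with `reset_index(drop=True)`. The helper name `duplicated_labels` is illustrative only.

```python
import pandas as pd

def duplicated_labels(frame: pd.DataFrame) -> list:
    """Return the index labels that occur more than once (empty list if unique)."""
    return frame.index[frame.index.duplicated()].unique().tolist()

# Hypothetical example: stacking two frames without ignore_index=True keeps
# each piece's own 0..n-1 labels, so the combined index is [0, 1, 0, 1].
part1 = pd.DataFrame({'description': ['This is a puppy', 'This is a cat']})
part2 = pd.DataFrame({'description': ['This is a dog', 'this is a kitten']})
df = pd.concat([part1, part2])

dupes = duplicated_labels(df)
print(df.index.is_unique, dupes)  # False [0, 1]

# Give every row its own label again before looping with df.at.
df = df.reset_index(drop=True)
print(df.index.is_unique, duplicated_labels(df))  # True []
```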


Solution

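  • import numpy as np
    import pandas as pd
    
    # str.contains treats the joined keywords as a regex alternation (any keyword matches);
    # np.where then picks 'Canine' where it matches and 'Feline' everywhere else.
    df['Species'] = np.where(df.description.str.contains("|".join(kwrds)), 'Canine', 'Feline')
    
    df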
                               description     name Species
    0                      This is a puppy    Rufus  Canine
    1                        This is a dog    Dingo  Canine
    2  This is a golden retriever type dog   Rascal  Canine
    3                        This is a cat   MewMew  Feline
    4                     this is a kitten  Jingles  Feline
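Because `str.contains` interprets `"|".join(kwrds)` as a regular expression, a keyword containing characters such as `.` or `(` would be read as regex syntax. A slightly more defensive sketch escapes each keyword with `re.escape`, matches case-insensitively with `case=False`, and treats missing descriptions as non-matches with `na=False`. These tweaks are assumptions about what the real data needs, not part of the answer above.

```python
import re

import numpy as np
import pandas as pd

kwrds = ['dog', 'puppy', 'golden retriever']
df = pd.DataFrame({
    'description': ['This is a puppy', 'This is a dog',
                    'This is a golden retriever type dog',
                    'This is a cat', 'this is a kitten'],
    'name': ['Rufus', 'Dingo', 'Rascal', 'MewMew', 'Jingles'],
})

# Escape each keyword so regex metacharacters match literally,
# then join them into a single alternation pattern.
pattern = "|".join(re.escape(k) for k in kwrds)

# Vectorized test over the whole column: True where any keyword appears.
is_canine = df['description'].str.contains(pattern, case=False, na=False, regex=True)
df['species'] = np.where(is_canine, 'Canine', 'Feline')
print(df)
```

Note that plain substring matching (in this version and in the loop) would also match a keyword inside a longer word, e.g. "dog" inside "hotdog"; wrapping the pattern in `\b` word boundaries is one option if that matters.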