Search code examples
python-3.xpandascounting

Count occurrences of a string from one column in another


I have a df that looks like this:

  Policy Letter             Password Lower Upper
0    4-5      l               rllllj     4     5
1   4-10      s  ssskssphrlpscsxrfsr     4    10
2  14-18      p  ppppppppppppppppppp    14    18
3    1-6      z       zzlzvmqbzzclrz     1     6
4    4-5      j          jhjjhxhjkxj     4     5

I want to count how many times the letter in the 'Letter' column appears in the password from the 'Password' column on for each row.

In other words, how many l's are in the password for the first row (4). How many s's in the password for the second row (8).

And so on.

If I do this:

df['Count'] = df['Password'].str.count('s')

It runs correctly but it only counts s's in every password in the column.

When I try this:

df['Count'] = df['Password'].str.count(df['Letter'])

it throws an error:

TypeError: 'Series' objects are mutable, thus they cannot be hashed

I do not know how (if possible) to get str.count() to check a different value for each row.


Solution

  • You can apply a custom function on each row (like a loop):

    df['Count'] = df.apply(lambda x: x['Password'].count(x['Letter']), axis=1)
    print(df)
    
    # Output
      Policy Letter             Password  Lower  Upper  Count
    0    4-5      l               rllllj      4      5      4
    1   4-10      s  ssskssphrlpscsxrfsr      4     10      8
    2  14-18      p  ppppppppppppppppppp     14     18     19
    3    1-6      z       zzlzvmqbzzclrz      1      6      6
    4    4-5      j          jhjjhxhjkxj      4      5      5
    

    With a comprehension:

    df['Count'] = [x.Password.count(x.Letter) for x in df.itertuples()]
    # df['Count'] = [x[3].count(x[2]) for x in df.itertuples()]