Search code examples
pythonpandasgroup-byisin

Add new column with number of occurences (.isin) in other column


How can I count how often each value in the column substring occurs in the column string and append the result as a new column given this example dataframe:

df = pd.DataFrame({
'substring':['a', 'b', 'c', 'a', 'b', 'c', 'd', 'd', 'a']
'string':['a a b', '', 'a a', 'b', 'a b', 'a c a', '', 'b c a', 'd d']})

  substring string
0         a  a a b
1         b      
2         c    a a
3         a      b
4         b    a b
5         c  a c a
6         d       
7         d  b c a
8         a    d d

and here what I'd like the output to look like:

  substring string count
0         a  a a b    5
1         b           4
2         c    a a    2
3         a      b    5
4         b    a b    4
5         c  a c a    2
6         d           1
7         d  b c a    1
8         a    d d    5

Solution

  • Your question is not really explicit, but I guess you want to count the number of times the characters (or words?) appear in the string overall, without counting duplicates per string.

    You could use a conversion to set and collections.Counter:

    from itertools import chain
    from collections import Counter
    
    # count unique elements (here words)
    c = Counter(chain.from_iterable(set(x.split()) for x in df['string']))
    
    ## alternative for characters
    # c = Counter(chain.from_iterable(set(x) for x in df['string']))
    
    # map counts
    df['count'] = df['substring'].map(c)
    

    output:

      substring string  count
    0         a  a a b      5
    1         b             4
    2         c    a a      2
    3         a      b      5
    4         b    a b      4
    5         c  a c a      2
    6         d             1
    7         d  b c a      1
    8         a    d d      5
    
    pure pandas variant for the counter (quite slower)
    c = df['string'].str.split().apply(set).explode().value_counts()