Search code examples
pandascountfrequencycalculated-columns

How to create and populate pandas columns based on cell values


I have created a dataframe called df as follows:

import pandas as pd
d = {'feature1': [1, 22,45,78,78], 'feature2': [33, 2,2,65,65], 'feature3': [100, 2,359,87,2],}
df = pd.DataFrame(data=d)
print(df)

The dataframe looks like this:

enter image description here

I want to create two new columns called Freq_1 and Freq_2 that count, for each record, how many times the number 1 and number 2 appear respectively. So, I'd like the resulting dataframe to look like this:

enter image description here

So, let's take a look at the column called Freq_1:

  • for the first record, it's equal to 1 because the number 1 appears only once across the whole first record;
  • for the other records, it's equal to 0 because the number 1 never appears.

Let's take a look now at the column called Freq_2:

  • for the first record, Freq_2 is equal to 0 because number 2 doesn't appear;
  • for second record, Freq_2 is equal to 2 because the number 2 appears twice;
  • and so on ...

How do I create the columns Freq_1 and Freq_2 in pandas?


Solution

  • Try this:

    freq = {
        i: df.eq(i).sum(axis=1) for i in range(10)
    }
    
    pd.concat([df, pd.DataFrame(freq).add_prefix("Freq_")], axis=1)
    

    Result:

     feature1  feature2  feature3  Freq_0  Freq_1  Freq_2  Freq_3  Freq_4  Freq_5  Freq_6  Freq_7  Freq_8  Freq_9
            1        33       100       0       1       0       0       0       0       0       0       0       0
           22         2         2       0       0       2       0       0       0       0       0       0       0
           45         2       359       0       0       1       0       0       0       0       0       0       0
           78        65        87       0       0       0       0       0       0       0       0       0       0
           78        65         2       0       0       1       0       0       0       0       0       0       0