Search code examples
pythonpandasdata-analysisexploratory-data-analysis

I want to create a new frequency column for each column in a pandas dataframe


Let's say I have a dataframe like this:

colors animals
yellow cat
yellow cat
red cat
red cat
blue cat

I want to create a column for each column showing the frequency in which each value happens:

colors colors_frequency animals animals_frequency
yellow 40% cat 100%
yellow 40% cat 100%
red 40% cat 100%
red 40% cat 100%
blue 20% cat 100%

I tried

frequency = list()
for column in df.columns:
     series = (df[column].value_counts(normalize=True, dropna=True)*100)
     overview.append(series)

#overview list
o_colors = overview[0] 
o_animals = overview[1]

df['animals_frequency'] = o_animals

If I try

df.info()

it returns

Column Non-Null Count Dtype
animals_frequency 0 non-null float64

Solution

  • A simple approach is to calculate the relative frequency for each column values and then join these frequencies back to the original DataFrame.

    for col in df.columns:
        # Get relative frequency
        frequency = df[col].value_counts(normalize=True)
    
        # Format frequency as percent values, rounded to two decimals
        frequency_percent = (frequency*100).round(2).astype(str) + '%'
    
        # Join frequency values to original DF
        frequency_percent.name = f"{col}_frequency"
        df = df.merge(frequency_percent.to_frame(), left_on=col, right_index=True)