How to simulate windows fuction with partition by in pandas?

I got this data with Nulls in original_eur column.

	event_id	category	rounds_bot_date	original_eur
0	1	Category 1	2024-03-25 00:00:00	200
1	1	Category 1	2024-03-25 00:00:00	nan
2	2	Category 2	2024-03-25 00:00:00	nan
3	2	Category 2	2024-03-25 00:00:00	150
4	2	Category 2	2024-03-25 00:00:00	150
5	2	Category 1	2024-03-25 00:00:00	nan
6	3	Category 3	2024-03-25 00:00:00	nan
7	3	Category 2	2024-03-25 00:00:00	150
8	3	Category 3	2024-03-25 00:00:00	60
9	3	Category 2	2024-03-25 00:00:00	150

I need replace each null in columns with median value for appropriate event_id, category, rounds_bot_date.

With SQL I can use case + median window function:

case
    when original_eur = NaN
    then median(original_eur) over(partition by event_id, category, rounds_bot_date)
    else original_eur
end as original_eur

For Pandas I make table with medians:

median_table = (
    dataset
    .groupby(['event_id', 'category', 'rounds_bot_date'])
    .agg(original_eur_median = ('original_eur', 'median'))
    .reset_index()
)

And apply this fuction to dataset:

def fill_na(value, event_id, category, rounds_bot_date, median_table: pd.DataFrame):
    if math.isnan(value):
        value = (
            median_table[
                (median_table['event_id'] == event_id) & 
                (median_table['category'] == category) & 
                (median_table['rounds_bot_date'] == rounds_bot_date)]['original_eur_median'].values[0]
        )
        return value
    else:
        return value


dataset['original_eur'] = (
    dataset
    .apply(
        lambda x: fill_na(x['original_eur'], x['event_id'], x['category'], x['rounds_bot_date'], median_table),
        axis = 1)
)

Is there any way to optimize this code and simulate median window function in Pandas?

P. S. I an make iterrows with same logic, but it's not fast as SQL-function.

Solution:

# add new column with median values
dataset['original_eur_median'] = (
    dataset
    .groupby(['event_id', 'category', 'rounds_bot_date'])['original_eur']
    .transform('median')
)
# fill NaN with median values.
dataset['original_eur'] = dataset['original_eur'].fillna(dataset['original_eur_median'])

Solution

Since ur code is kinda complicated and there is no ideal output shown, I guess ur task can be done using pandas.DataFrame.groupby().transform(). Try this:

df['fixed_eur'] = df.groupby(['event_id', 'category', 'rounds_bot_date'])['original_eur'].transform(lambda x: x.fillna(x.median()))

Note: When using Pandas, Numpy, Polars and alike Python libs, keep in mind that:

The 99.99% of our problems have been faced and solved by other peoples before, so check the API reference first, where u usually could find the ready answer;
Self-made functions always perform much worse than built-in ones, so plz avoid them. If u have to, at least avoid loops in them.
If u r familiar with SQL, I recommend Polars other than Pandas for u. I believe u will quickly get used to it.