Pandas: how to calculate percentage of a population from elsewhere

I found this data file on covid vaccinations, and I'd like to see the vaccination coverage in (parts of) the population. It'll probably become more clear with the actual example, so bear with me.

If I read the csv using df = pd.read_csv('https://epistat.sciensano.be/Data/COVID19BE_VACC.csv', parse_dates=['DATE']) I get this result:

        DATE    REGION AGEGROUP SEX            BRAND DOSE  COUNT
0 2020-12-28  Brussels    18-34   F  Pfizer-BioNTech    A      1
1 2020-12-28  Brussels    45-54   F  Pfizer-BioNTech    A      2
2 2020-12-28  Brussels    55-64   F  Pfizer-BioNTech    A      3
3 2020-12-28  Brussels    55-64   M  Pfizer-BioNTech    A      1
4 2020-12-28  Brussels    65-74   F  Pfizer-BioNTech    A      2

I'm particularly interested in the numbers by region & date.
So I regrouped using df.groupby(['REGION','DATE']).sum()

                     COUNT
REGION   DATE             
Brussels 2020-12-28     56
         2020-12-30      5
         2021-01-05    725
         2021-01-06    989
         2021-01-07    994
...                    ...
Wallonia 2021-06-18  49567
         2021-06-19  43577
         2021-06-20   2730
         2021-06-21  37193
         2021-06-22  16938

In order to compare vaccination 'speeds' in different regions I have to transform the data from absolute to relative numbers, using the population from each region.

I have found some posts explaining how to calculate percentages in a multi-index dataframe like this, but the problem is that I want to divide each COUNT by a population number that is not in the original dataframe.

The population numbers are here below

REGION     POP
Flanders   6629143
Wallonia   3645243
Brussels   1218255

I think the solution must be in looping through the original df and checking both REGIONs or index levels, but I have absolutely no idea how. It's a technique I'd like to master, because it might come in handy when I want some other subsets with different populations (AGEGROUP or SEX maybe).

Thank you so much for reading this far!

Disclaimer: I've only just started out using Python, and this is my very first question on Stack Overflow, so please be gentle with me... The reason why I'm posting this is because I can't find an answer anywhere else. This is probably because I haven't got the terminology down and I don't exactly know what to look for ^_^

Solution

One option would be to reformat the population_df with set_index + rename:

population_df = pd.DataFrame({
    'REGION': {0: 'Flanders', 1: 'Wallonia', 2: 'Brussels'},
    'POP': {0: 6629143, 1: 3645243, 2: 1218255}
})

denom = population_df.set_index('REGION').rename(columns={'POP': 'COUNT'})

denom:

            COUNT
REGION           
Flanders  6629143
Wallonia  3645243
Brussels  1218255

Then div the results of groupby sum relative to level=0:

new_df = df.groupby(['REGION', 'DATE']).agg({'COUNT': 'sum'}).div(denom, level=0)

new_df:

                        COUNT
REGION   DATE                
Brussels 2020-12-28  0.000046
         2020-12-30  0.000004
         2021-01-05  0.000595
         2021-01-06  0.000812
         2021-01-07  0.000816
...                       ...
Wallonia 2021-06-18  0.013598
         2021-06-19  0.011954
         2021-06-20  0.000749
         2021-06-21  0.010203
         2021-06-22  0.004647

Or as a new column:

new_df = df.groupby(['REGION', 'DATE']).agg({'COUNT': 'sum'})
new_df['NEW'] = new_df.div(denom, level=0)

new_df:

                     COUNT       NEW
REGION   DATE                       
Brussels 2020-12-28     56  0.000046
         2020-12-30      5  0.000004
         2021-01-05    725  0.000595
         2021-01-06    989  0.000812
         2021-01-07    994  0.000816
...                    ...       ...
Wallonia 2021-06-18  49567  0.013598
         2021-06-19  43577  0.011954
         2021-06-20   2730  0.000749
         2021-06-21  37193  0.010203
         2021-06-22  16938  0.004647