python pandas dataframe lambda increment

Fill NAs values with combination of incremental number and column value

I have the following dataframe:

        UNIQUE_IDENTIFIER   COUNTRY_CODE
        
0          1                    CZ
1         nan                   CZ
2          2                    SK
3          4                    AE
4        nan                    DK
5        nan                    CZ
6        nan                    DK
7        nan                    ES

For all blank values in the "UNIQUE_IDENTIFIER" column, I would like to create a value that takes the "COUNTRY_CODE" and add incremental numbers (with a space in between the number and the Country Code) starting from 1 for each different country code. So the final dataframe would be this:

        UNIQUE_IDENTIFIER   COUNTRY_CODE
        
0          1                    CZ
1         CZ 1                  CZ
2          2                    SK
3          4                    AE
4        DK 1                   DK
5        CZ 2                   CZ
6        DK 2                   DK
7        ES 1                   ES

What would be the best way to do it?

Solution

Use GroupBy.cumcount only for missing rows by UNIQUE_IDENTIFIER and add COUNTRY_CODE values with space separator:

m = df.UNIQUE_IDENTIFIER.isna()
s = df[m].groupby('COUNTRY_CODE').cumcount().add(1).astype(str)
df.loc[m, 'UNIQUE_IDENTIFIER'] = df.loc[m, 'COUNTRY_CODE'] + ' ' + s
print (df)
  UNIQUE_IDENTIFIER COUNTRY_CODE
0               1.0           CZ
1              CZ 1           CZ
2               2.0           SK
3               4.0           AE
4              DK 1           DK
5              CZ 2           CZ
6              DK 2           DK
7              ES 1           ES

Or use Series.fillna for replace missing values:

s = df[df.UNIQUE_IDENTIFIER.isna()].groupby('COUNTRY_CODE').cumcount().add(1).astype(str)
df['UNIQUE_IDENTIFIER'] = df['UNIQUE_IDENTIFIER'].fillna(df['COUNTRY_CODE'] + ' ' + s)
print (df)
  UNIQUE_IDENTIFIER COUNTRY_CODE
0               1.0           CZ
1              CZ 1           CZ
2               2.0           SK
3               4.0           AE
4              DK 1           DK
5              CZ 2           CZ
6              DK 2           DK
7              ES 1           ES

Details:

print (df[m].groupby('COUNTRY_CODE').cumcount().add(1).astype(str))
1    1
4    1
5    2
6    2
7    1
dtype: object