Search code examples
pythonpandasdataframelambdaincrement

Fill NAs values with combination of incremental number and column value


I have the following dataframe:

        UNIQUE_IDENTIFIER   COUNTRY_CODE
        
0          1                    CZ
1         nan                   CZ
2          2                    SK
3          4                    AE
4        nan                    DK
5        nan                    CZ
6        nan                    DK
7        nan                    ES

For all blank values in the "UNIQUE_IDENTIFIER" column, I would like to create a value that takes the "COUNTRY_CODE" and add incremental numbers (with a space in between the number and the Country Code) starting from 1 for each different country code. So the final dataframe would be this:

        UNIQUE_IDENTIFIER   COUNTRY_CODE
        
0          1                    CZ
1         CZ 1                  CZ
2          2                    SK
3          4                    AE
4        DK 1                   DK
5        CZ 2                   CZ
6        DK 2                   DK
7        ES 1                   ES

What would be the best way to do it?


Solution

  • Use GroupBy.cumcount only for missing rows by UNIQUE_IDENTIFIER and add COUNTRY_CODE values with space separator:

    m = df.UNIQUE_IDENTIFIER.isna()
    s = df[m].groupby('COUNTRY_CODE').cumcount().add(1).astype(str)
    df.loc[m, 'UNIQUE_IDENTIFIER'] = df.loc[m, 'COUNTRY_CODE'] + ' ' + s
    print (df)
      UNIQUE_IDENTIFIER COUNTRY_CODE
    0               1.0           CZ
    1              CZ 1           CZ
    2               2.0           SK
    3               4.0           AE
    4              DK 1           DK
    5              CZ 2           CZ
    6              DK 2           DK
    7              ES 1           ES
    

    Or use Series.fillna for replace missing values:

    s = df[df.UNIQUE_IDENTIFIER.isna()].groupby('COUNTRY_CODE').cumcount().add(1).astype(str)
    df['UNIQUE_IDENTIFIER'] = df['UNIQUE_IDENTIFIER'].fillna(df['COUNTRY_CODE'] + ' ' + s)
    print (df)
      UNIQUE_IDENTIFIER COUNTRY_CODE
    0               1.0           CZ
    1              CZ 1           CZ
    2               2.0           SK
    3               4.0           AE
    4              DK 1           DK
    5              CZ 2           CZ
    6              DK 2           DK
    7              ES 1           ES
    

    Details:

    print (df[m].groupby('COUNTRY_CODE').cumcount().add(1).astype(str))
    1    1
    4    1
    5    2
    6    2
    7    1
    dtype: object