Search code examples
pandasgroup-bynanautofill

Fill nan Pandas DF with N number of Column Numbers from Different DF


I have a df "df1" that looks like this below and I need to fill the nan values using a groupby "plant_name" of values in the "n" number of columns in a different df as shown below.

Index   Adj_Prod    Adj_Prod    Adj_Prod    Adj_Prod    Adj_Prod    month   plant_name  year
3455    63285.13821 63285.13821 63285.13821 63285.13821 63285.13821 12  PENASCAL II     2021
3464    52758.13661 52758.13661 52758.13661 52758.13661 52758.13661 1   PENASCAL II     2022
3473    55998.67419 55998.67419 55998.67419 55998.67419 55998.67419 2   PENASCAL II     2022
3482    68582.45954 68582.45954 68582.45954 68582.45954 68582.45954 3   PENASCAL II     2022
3491    97313.92303 97313.92303 97313.92303 97313.92303 97313.92303 4   PENASCAL II     2022
3500    106054.0829 106054.0829 106054.0829 106054.0829 106054.0829 5   PENASCAL II     2022
3509    70424.47176 70424.47176 70424.47176 70424.47176 70424.47176 6   PENASCAL II     2022
3518    nan         nan         nan         nan         nan         7   PENASCAL II     2022
3527    nan         nan         nan         nan         nan         8   PENASCAL II     2022
3536    nan         nan         nan         nan         nan         9   PENASCAL II     2022
3545    nan         nan         nan         nan         nan        10   PENASCAL II     2022
3554    nan         nan         nan         nan         nan        11   PENASCAL II     2022
3563    nan         nan         nan         nan         nan        12   PENASCAL II     2022

Here is the other dataframe "df2" that I need to use to fill the nan values that needs to be grouped by "plant_name" but I'm not sure how to do that by column numbers that could change - in this example, there are 5 columns as shown here:

Index   month   plant_name  0       1           2           3           4
46      11  PENASCAL I  57024.37    85799.06    56423.82    44967.31    62426.29
47      12  PENASCAL I  72072.84    61719.23    74177.79    53048.06    61513.94
48       7  PENASCAL II 56188.81    64556.23    74918.13    72951.01    57474.33
49       8  PENASCAL II 31309.33    38571.34    61658.58    38578.86    52948.55
50       9  PENASCAL II 29783.46    39220.07    38641.02    35055.39    33024.38
51      10  PENASCAL II 65961.29    38898.14    55066.84    30100.4     65961.29
52      11  PENASCAL II 55134.4     49616.31    50353.2     48451.29    51903.16
53      12  PENASCAL II 62738.47    61756.62    60691.09    54747.75    48753.57

The final result should look like this below:

        Adj_Prod    Adj_Prod    Adj_Prod    Adj_Prod    Adj_Prod    month   plant_name  year
3455    63285.13821 63285.13821 63285.13821 63285.13821 63285.13821 12      PENASCAL II 2021
3464    52758.13661 52758.13661 52758.13661 52758.13661 52758.13661 1       PENASCAL II 2022
3473    55998.67419 55998.67419 55998.67419 55998.67419 55998.67419 2       PENASCAL II 2022
3482    68582.45954 68582.45954 68582.45954 68582.45954 68582.45954 3       PENASCAL II 2022
3491    97313.92303 97313.92303 97313.92303 97313.92303 97313.92303 4       PENASCAL II 2022
3500    106054.0829 106054.0829 106054.0829 106054.0829 106054.0829 5       PENASCAL II 2022
3509    70424.47176 70424.47176 70424.47176 70424.47176 70424.47176 6       PENASCAL II 2022
3518    56188.81    64556.23    74918.13    72951.01    57474.33    7       PENASCAL II 2022
3527    31309.33    38571.34    61658.58    38578.86    52948.55    8       PENASCAL II 2022
3536    29783.46    39220.07    38641.02    35055.39    33024.38    9       PENASCAL II 2022
3545    65961.29    38898.14    55066.84    30100.4     65961.29    10      PENASCAL II 2022
3554    55134.4     49616.31    50353.2     48451.29    51903.16    11      PENASCAL II 2022
3563    62738.47    61756.62    60691.09    54747.75    48753.57    12      PENASCAL II 2022

I think that a groupby "plant_name" and then the value using the column number would work but I'm not sure how to do that since the column numbers will change based on the value of "n" which is 5 in this example. thank you!

I have tried something like this but do not know how to specify the columns and how to address say 500 columns 0, 1, ...500 for example.

df1.fillna(df2.groupby(['plant_name'])['0','1','2','3','4']. 

This gives me an error:

KeyError: "Columns not found: '2', '3', '1', '0', '4'"

I have also tried this but does not work:

df1.fillna(df2.groupby(['plant_name'])[list(range(5))]))

also, this does not work:

df1.groupby(['plant_name'])['Adj_Prod'].fillna(df2.iloc[:,2:6])

Solution

  • With the dataframes you provided, I suggest a different approach:

    # Rename df1 columns so as to be unique
    df1.columns = [
        col if not col.startswith("Adj") else f"Adj_Prod{i}"
        for i, col in enumerate(df1.columns)
    ]
    
    # Rename df2 columns to match df1 columns names
    df2.columns = [col if not col.isnumeric() else f"Adj_Prod{col}" for col in df2.columns]
    
    
    # Concat non nan rows of df1 with matching of rows
    # of df2 (modified to use df1 nan rows index and year column values)
    df = pd.concat(
        [
            df1.dropna(
                how="all", subset=[col for col in df1.columns if col.startswith("Adj")]
            ),
            df2[
                (df2["month"].isin(df1["month"]))
                & (df2["plant_name"].isin(df1["plant_name"]))
            ]
            .set_index(df1[df1.isna().any(axis=1)].index)
            .assign(year=df1.loc[df1.isna().any(axis=1), "year"]),
        ]
    )
    
    # Rename df columns to be identical
    df.columns = [col if not col.startswith("Adj") else "Adj_Prod" for col in df.columns]
    
    print(df)
    # Output
              Adj_Prod     Adj_Prod     Adj_Prod     Adj_Prod     Adj_Prod  month   plant_name  year
    Index
    3455   63285.13821  63285.13821  63285.13821  63285.13821  63285.13821     12  PENASCAL II  2021
    3464   52758.13661  52758.13661  52758.13661  52758.13661  52758.13661      1  PENASCAL II  2022
    3473   55998.67419  55998.67419  55998.67419  55998.67419  55998.67419      2  PENASCAL II  2022
    3482   68582.45954  68582.45954  68582.45954  68582.45954  68582.45954      3  PENASCAL II  2022
    3491   97313.92303  97313.92303  97313.92303  97313.92303  97313.92303      4  PENASCAL II  2022
    3500   106054.0829  106054.0829  106054.0829  106054.0829  106054.0829      5  PENASCAL II  2022
    3509   70424.47176  70424.47176  70424.47176  70424.47176  70424.47176      6  PENASCAL II  2022
    3518      56188.81     64556.23     74918.13     72951.01     57474.33      7  PENASCAL II  2022
    3527      31309.33     38571.34     61658.58     38578.86     52948.55      8  PENASCAL II  2022
    3536      29783.46     39220.07     38641.02     35055.39     33024.38      9  PENASCAL II  2022
    3545      65961.29     38898.14     55066.84      30100.4     65961.29     10  PENASCAL II  2022
    3554       55134.4     49616.31      50353.2     48451.29     51903.16     11  PENASCAL II  2022
    3563      62738.47     61756.62     60691.09     54747.75     48753.57     12  PENASCAL II  2022