
Duplicating rows in pandas Python


I hope you are doing well. I have the following output:

ClassName   Bugs   HighBugs  LowBugs  NormalBugs  WMC   LOC

 Class1      4        0        1         3        34     77 
 Class2      0        0        0         0        9      45
 Class3      3        0        1         2        10     18
 Class4      0        0        0         0        44     46
 Class5      6        2        2         2        78     94

The result I want is as follows:

ClassName   Bugs   HighBugs  LowBugs  NormalBugs  WMC   LOC

 Class1      1        0        0         1        34     77
 Class1      1        0        0         1        34     77
 Class1      1        0        0         1        34     77
 Class1      1        0        1         0        34     77
 Class2      0        0        0         0        9      45
 Class3      1        0        0         1        10     18
 Class3      1        0        0         1        10     18
 Class3      1        0        1         0        10     18
 Class4      0        0        0         0        44     46
 Class5      1        0        0         1        78     94
 Class5      1        0        0         1        78     94
 Class5      1        0        1         0        78     94
 Class5      1        0        1         0        78     94
 Class5      1        1        0         0        78     94
 Class5      1        1        0         0        78     94

A short explanation: I want to duplicate each class according to the Bugs column, where Bugs = HighBugs + LowBugs + NormalBugs. As the desired result shows, each duplicated row should contain only ones and zeros: one row per bug, with a 1 in Bugs and a 1 in the matching bug-type column. A class with no bugs keeps a single all-zero row.

Thank you in advance, and have a good day, everyone.


Solution

  • Try:

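    import pandas as pd

    # `df` is the DataFrame shown in the question.
    # col_names: bug-type columns, in the order their rows are emitted.
    # other_cols: columns copied unchanged onto every generated row.
    dfs, col_names, other_cols = (
        [],
        ["NormalBugs", "LowBugs", "HighBugs"],
        ["ClassName", "WMC", "LOC"],
    )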
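    for _, row in df.iterrows():
        if row["Bugs"] == 0:
            # No bugs: keep a single all-zero row for this class.
            dfs.append(
                pd.DataFrame(
                    [[0, 0, 0, *[row[c] for c in other_cols]]],
                    columns=col_names + other_cols,
                )
            )

        else:
            # One single-column frame per bug type, holding row[c] rows of 1.
            for c in col_names:
                dfs.append(pd.DataFrame([1] * row[c], columns=[c]))
                for oc in other_cols:
                    dfs[-1][oc] = row[oc]

    # Columns missing from a frame become NaN on concat; fill them with 0.
    df_out = pd.concat(dfs).fillna(0)
    df_out[col_names] = df_out[col_names].astype(int)
    # A row counts as one bug if any bug-type column is set.
    df_out["Bugs"] = df_out[col_names].any(axis=1).astype(int)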
    print(
        df_out[
            ["ClassName", "Bugs", "HighBugs", "LowBugs", "NormalBugs", "WMC", "LOC"]
        ]
    )
    

    Prints:

      ClassName  Bugs  HighBugs  LowBugs  NormalBugs  WMC  LOC
    0    Class1     1         0        0           1   34   77
    1    Class1     1         0        0           1   34   77
    2    Class1     1         0        0           1   34   77
    0    Class1     1         0        1           0   34   77
    0    Class2     0         0        0           0    9   45
    0    Class3     1         0        0           1   10   18
    1    Class3     1         0        0           1   10   18
    0    Class3     1         0        1           0   10   18
    0    Class4     0         0        0           0   44   46
    0    Class5     1         0        0           1   78   94
    1    Class5     1         0        0           1   78   94
    0    Class5     1         0        1           0   78   94
    1    Class5     1         0        1           0   78   94
    0    Class5     1         1        0           0   78   94
    1    Class5     1         1        0           0   78   94
    

    EDIT: Added more columns.
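
For larger frames, looping over rows with iterrows can be slow. The following is a minimal sketch of an alternative, not part of the original answer: it repeats index labels with Index.repeat once per bug type, then restores the class order with a stable sort. The names bug_cols, parts, zero, and out are illustrative, and the sample data is copied from the question.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame(
    {
        "ClassName": ["Class1", "Class2", "Class3", "Class4", "Class5"],
        "Bugs": [4, 0, 3, 0, 6],
        "HighBugs": [0, 0, 0, 0, 2],
        "LowBugs": [1, 0, 1, 0, 2],
        "NormalBugs": [3, 0, 2, 0, 2],
        "WMC": [34, 9, 10, 44, 78],
        "LOC": [77, 45, 18, 46, 94],
    }
)

bug_cols = ["NormalBugs", "LowBugs", "HighBugs"]  # order of rows within a class
other_cols = ["ClassName", "WMC", "LOC"]          # copied unchanged

parts = []
for c in bug_cols:
    # Repeat each class row once per bug of type c (zero counts yield no rows).
    part = df.loc[df.index.repeat(df[c].to_numpy()), other_cols].copy()
    for b in bug_cols:
        part[b] = int(b == c)  # 1 in the matching bug-type column, 0 elsewhere
    parts.append(part)

# Classes without any bugs keep a single all-zero row.
zero = df.loc[df["Bugs"] == 0, other_cols].copy()
for b in bug_cols:
    zero[b] = 0
parts.append(zero)

# A stable sort on the original index groups rows by class while keeping
# the Normal, Low, High order in which the parts were concatenated.
out = pd.concat(parts).sort_index(kind="mergesort").reset_index(drop=True)
out[bug_cols] = out[bug_cols].astype(int)
out["Bugs"] = out[bug_cols].sum(axis=1)  # exactly one flag is set per bug row
out = out[df.columns.tolist()]           # restore the original column order
print(out)
```

Unlike the loop above, this sketch resets the index, so the result is numbered 0 to 14 instead of restarting at 0 for each group. Summing the three bug-type columns per class should reproduce the original counts, which is a quick way to check the result.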