Search code examples
python-3.xpandasdataframepandas-melt

Grouping several dataframe columns based on another columns values


I have this dataframe:

    refid   col2    price1  factor1 price2  factor2 price3  factor3
  0   1       a      200       1      180     3     150       10
  1   2       b      500       1      450     3     400       10
  2   3       c      700       1      620     2     550        5

And I need to get this output:

   refid    col2    price   factor
0   1        a      200       1
1   1        b      500       1
2   1        c      700       1
3   2        a      180       3
4   2        b      450       3
5   2        c      620       2
6   3        a      150       10
7   3        b      400       10
8   3        c      550       5

Right now I'm trying to use df.melt method, but can't get it to work, this is the code and the current result:

df2_melt = df2.melt(id_vars=["refid","col2"],
        value_vars=["price1","price2","price3",
                   "factor1","factor2","factor3"],
        var_name="Price", 
        value_name="factor")



    refid   col2    price   factor
0       1   a      price1   200
1       2   b      price1   500
2       3   c      price1   700
3       1   a      price2   180
4       2   b      price2   450
5       3   c      price2   620
6       1   a      price3   150
7       2   b      price3   400
8       3   c      price3   550
9       1   a      factor1  1
10      2   b      factor1  1
11      3   c      factor1  1
12      1   a      factor2  3
13      2   b      factor2  3
14      3   c      factor2  2
15      1   a      factor3  10
16      2   b      factor3  10
17      3   c      factor3  5

Solution

  • Since you have a wide DataFrame with common prefixes, you can use wide_to_long:

    out = pd.wide_to_long(df, stubnames=['price','factor'], 
                          i=["refid","col2"], j='num').droplevel(-1).reset_index()
    

    Output:

       refid col2  price  factor
    0      1    a    200       1
    1      1    a    180       3
    2      1    a    150      10
    3      2    b    500       1
    4      2    b    450       3
    5      2    b    400      10
    6      3    c    700       1
    7      3    c    620       2
    8      3    c    550       5
    

    Note that your expected output has an error where factors don't align with refids.