Search code examples
pythonpandaspivotreshapedata-munging

Python pandas: pivot certain variables in wide-form DataFrame


Data munging question: how can I selectively pivot certain variables from an overly-wide DataFrame?

For example, I would like to turn:

df1 = pd.DataFrame(
    [[1,'a','b',.1,-1,10],
     [2,'a','b',.2,-3,12],
     [3,'c','d',.3,-5,14]],
    columns=['sample','id1','id2','x','y1','y2'])
print df1
#   sample id1 id2    x  y1  y2
#0       1   a   b  0.1  -1  10
#1       2   a   b  0.2  -3  12
#2       3   c   d  0.3  -5  14

into:

#   sample id  position    x   y
#0       1  a         1  0.1  -1
#1       1  b         2  0.1  10
#2       2  a         1  0.2  -3
#3       2  b         2  0.2  12
#4       3  c         1  0.3  -5
#5       3  d         2  0.3  14

Note that x is copied, and y aligns with position.

A straight pd.melt() creates mixed variables and data types which don't easily pivot selectively back into wide-form.

print pd.melt(df1, id_vars='sample')
#    sample variable value
#0        1      id1     a
#1        2      id1     a
#2        3      id1     c
#3        1      id2     b
#4        2      id2     b
#5        3      id2     d
#6        1        x   0.1
#7        2        x   0.2
#8        3        x   0.3
#9        1       y1    -1
#10       2       y1    -3
#11       3       y1    -5
#12       1       y2    10
#13       2       y2    12
#14       3       y2    14

Any advice? Thanks!


Solution

  • You can try this:

    # set columns that don't change as index
    df1.set_index(['sample', 'x'], inplace=True)
    
    # create multi-index columns based on the names pattern
    df1.columns = pd.MultiIndex.from_arrays(df1.columns.str.extract(r"(\D+)(\d+)", expand=True).T.values)
    
    # transform the multi-index data frames to long format with stack
    df1.stack(level=1).rename_axis(('sample', 'x', 'position')).reset_index()
    

    enter image description here