Search code examples
pythonpandasdataframepandas-melt

Pivot data in pandas using melt and unstack


Given the following data:

data = pd.DataFrame(
    {
        "A": ["a", "a", "b", "b"],
        "B": ["x", "y", "p", "q"],
        "C": ["one", "two", "one", "two"],
    }
)

which looks as:

   A  B    C
0  a  x  one
1  a  y  two
2  b  p  one
3  b  q  two

I would like to create the following:

data_out = pd.DataFrame(
    {
        "A": ["a", "b"],
        "one": ["x", "p"],
        "two": ["y", "q"],
    }
)

which looks as:

   A one two
0  a   x   y
1  b   p   q

I'm aware that I could do something along the lines of:

d_piv = pd.pivot_table(
    data,
    index=["A"],
    columns=["C"],
    values=["B"],
    aggfunc=lambda x: x,
).reset_index()

which gives:

   A   B    
C    one two
0  a   x   y
1  b   p   q

from which the columns could be cleaned up, but I'm wondering how I'd go about solving this using melt and unstack?

I have tried:

print(data.set_index("C", append=True).unstack())

which gives:

     A         B     
C  one  two  one  two
0    a  NaN    x  NaN
1  NaN    a  NaN    y
2    b  NaN    p  NaN
3  NaN    b  NaN    q

The NaN values aren't wanted here, so I could instead try:

data.index = [0, 0, 1, 1]
data.set_index(["A", "C"], append=True).unstack(-1).reset_index(level=-1)

which gives:

   A   B    
C    one two
0  a   x   y
1  b   p   q

So that's closer - but it still feels as though there's still some unnecessary bits there.

Particularly coding the index like that.

Edit

Solution of :

df.set_index('A').pivot(columns='C', values='B').reset_index().rename_axis(None, axis=1)

is good, but I am wondering whether unstack can be used here instead of pivot?


Solution

  • First, set A column as the index then use df.pivot. To get the exact output we have to reset index and rename axis.

    (df.set_index("A").pivot(columns="C", values="B")
                      .reset_index()
                      .rename_axis(None, axis=1))
    
    
       A one two
    0  a   x   y
    1  b   p   q
    

    Using df.unstack

    df.set_index(["A", "C"])["B"].unstack().reset_index().rename_axis(None, axis=1)
    
       A one two
    0  a   x   y
    1  b   p   q