Search code examples
pythonnumpypandasdataformdataframe

unmelting a pandas dataframe in Python?


I melted a pandas dataframe for plotting use with ggplot (which often requires long form of dataframes), as follows:

test = pandas.melt(iris, id_vars=["Name"], value_vars=["SepalLength", "SepalWidth"])

This keeps the Name field of the iris dataset in the index, but transforms the columns SepalLength and SepalWidth into long form:

test.ix[0:10]
Out:
           Name     variable  value
0   Iris-setosa  SepalLength    5.1
1   Iris-setosa  SepalLength    4.9
2   Iris-setosa  SepalLength    4.7
3   Iris-setosa  SepalLength    4.6
4   Iris-setosa  SepalLength    5.0
5   Iris-setosa  SepalLength    5.4
6   Iris-setosa  SepalLength    4.6
7   Iris-setosa  SepalLength    5.0
8   Iris-setosa  SepalLength    4.4
9   Iris-setosa  SepalLength    4.9
10  Iris-setosa  SepalLength    5.4

How can I "unmelt" this dataframe back? I want the Name column to be kept, but the values of variable field to be transformed into separate columns. The Name field is not unique, so I don't think it can be used as an index. My impression was that pivot is the right function to do this but it is not right:

test.pivot(columns="variable", values="value")
KeyError: u'no item named '

How could I do this? Also, could I unmelt dataframes where there are multiple columns that are in long form, i.e. multiple columns in test that are like the variable column above? It would mean that the columns will have to accept a list of columns, not a single value, it seems. thanks.


Solution

  • I think this situation is ambiguous since the test dataframe doesn't have an index that identifies each unique row. If melt simply stacked the rows with value_vars SepalLength and SepalWidth, then you can manually create an index to pivot on; and it looks like the result ends up the same as the original:

    In [15]: test['index'] = range(len(test) / 2) * 2
    In [16]: test[:10]
    Out[16]: 
              Name     variable  value  index
    0  Iris-setosa  SepalLength    5.1      0
    1  Iris-setosa  SepalLength    4.9      1
    2  Iris-setosa  SepalLength    4.7      2
    3  Iris-setosa  SepalLength    4.6      3
    4  Iris-setosa  SepalLength    5.0      4
    5  Iris-setosa  SepalLength    5.4      5
    6  Iris-setosa  SepalLength    4.6      6
    7  Iris-setosa  SepalLength    5.0      7
    8  Iris-setosa  SepalLength    4.4      8
    9  Iris-setosa  SepalLength    4.9      9
    
    In [17]: test[-10:]
    Out[17]: 
                   Name    variable  value  index
    290  Iris-virginica  SepalWidth    3.1    140
    291  Iris-virginica  SepalWidth    3.1    141
    292  Iris-virginica  SepalWidth    2.7    142
    293  Iris-virginica  SepalWidth    3.2    143
    294  Iris-virginica  SepalWidth    3.3    144
    295  Iris-virginica  SepalWidth    3.0    145
    296  Iris-virginica  SepalWidth    2.5    146
    297  Iris-virginica  SepalWidth    3.0    147
    298  Iris-virginica  SepalWidth    3.4    148
    299  Iris-virginica  SepalWidth    3.0    149
    
    In [18]: df = test.pivot(index='index', columns='variable', values='value')
    In [19]: df['Name'] = test['Name']
    In [20]: df[:10]
    Out[20]: 
    variable  SepalLength  SepalWidth         Name
    index                                         
    0                 5.1         3.5  Iris-setosa
    1                 4.9         3.0  Iris-setosa
    2                 4.7         3.2  Iris-setosa
    3                 4.6         3.1  Iris-setosa
    4                 5.0         3.6  Iris-setosa
    5                 5.4         3.9  Iris-setosa
    6                 4.6         3.4  Iris-setosa
    7                 5.0         3.4  Iris-setosa
    8                 4.4         2.9  Iris-setosa
    9                 4.9         3.1  Iris-setosa
    
    In [21]: (iris[["SepalLength", "SepalWidth", "Name"]] == df[["SepalLength", "SepalWidth", "Name"]]).all()
    Out[21]: 
    SepalLength    True
    SepalWidth     True
    Name           True