I melted a pandas dataframe for plotting use with ggplot (which often requires long form of dataframes), as follows:
test = pandas.melt(iris, id_vars=["Name"], value_vars=["SepalLength", "SepalWidth"])
This keeps the Name
field of the iris dataset in the index, but transforms the columns SepalLength
and SepalWidth
into long form:
test.ix[0:10]
Out:
Name variable value
0 Iris-setosa SepalLength 5.1
1 Iris-setosa SepalLength 4.9
2 Iris-setosa SepalLength 4.7
3 Iris-setosa SepalLength 4.6
4 Iris-setosa SepalLength 5.0
5 Iris-setosa SepalLength 5.4
6 Iris-setosa SepalLength 4.6
7 Iris-setosa SepalLength 5.0
8 Iris-setosa SepalLength 4.4
9 Iris-setosa SepalLength 4.9
10 Iris-setosa SepalLength 5.4
How can I "unmelt" this dataframe back? I want the Name
column to be kept, but the values of variable
field to be transformed into separate columns. The Name
field is not unique, so I don't think it can be used as an index. My impression was that pivot
is the right function to do this but it is not right:
test.pivot(columns="variable", values="value")
KeyError: u'no item named '
How could I do this? Also, could I unmelt dataframes where there are multiple columns that are in long form, i.e. multiple columns in test
that are like the variable
column above? It would mean that the columns
will have to accept a list of columns, not a single value, it seems. thanks.
I think this situation is ambiguous since the test
dataframe doesn't have an index that identifies each unique row. If melt
simply stacked the rows with value_vars
SepalLength and SepalWidth, then you can manually create an index to pivot on; and it looks like the result ends up the same as the original:
In [15]: test['index'] = range(len(test) / 2) * 2
In [16]: test[:10]
Out[16]:
Name variable value index
0 Iris-setosa SepalLength 5.1 0
1 Iris-setosa SepalLength 4.9 1
2 Iris-setosa SepalLength 4.7 2
3 Iris-setosa SepalLength 4.6 3
4 Iris-setosa SepalLength 5.0 4
5 Iris-setosa SepalLength 5.4 5
6 Iris-setosa SepalLength 4.6 6
7 Iris-setosa SepalLength 5.0 7
8 Iris-setosa SepalLength 4.4 8
9 Iris-setosa SepalLength 4.9 9
In [17]: test[-10:]
Out[17]:
Name variable value index
290 Iris-virginica SepalWidth 3.1 140
291 Iris-virginica SepalWidth 3.1 141
292 Iris-virginica SepalWidth 2.7 142
293 Iris-virginica SepalWidth 3.2 143
294 Iris-virginica SepalWidth 3.3 144
295 Iris-virginica SepalWidth 3.0 145
296 Iris-virginica SepalWidth 2.5 146
297 Iris-virginica SepalWidth 3.0 147
298 Iris-virginica SepalWidth 3.4 148
299 Iris-virginica SepalWidth 3.0 149
In [18]: df = test.pivot(index='index', columns='variable', values='value')
In [19]: df['Name'] = test['Name']
In [20]: df[:10]
Out[20]:
variable SepalLength SepalWidth Name
index
0 5.1 3.5 Iris-setosa
1 4.9 3.0 Iris-setosa
2 4.7 3.2 Iris-setosa
3 4.6 3.1 Iris-setosa
4 5.0 3.6 Iris-setosa
5 5.4 3.9 Iris-setosa
6 4.6 3.4 Iris-setosa
7 5.0 3.4 Iris-setosa
8 4.4 2.9 Iris-setosa
9 4.9 3.1 Iris-setosa
In [21]: (iris[["SepalLength", "SepalWidth", "Name"]] == df[["SepalLength", "SepalWidth", "Name"]]).all()
Out[21]:
SepalLength True
SepalWidth True
Name True