Search code examples
pythonparsingpandasunpivot

Parse/unpivot data into the column in pandas dataframe


I have a pandas DataFrame with the following structure:

|A|B|C      |
-------------
|1|2|"1,2,4"|
|3|4|"5,6"  |

What is the most pythonic way to get a table like below?

|A|B|C|
-------
|1|2|1|
|1|2|2|
|1|2|4|
|3|4|5|
|3|4|6|

The initial table contains ~10K instances with 4-18 elements in C.

For the moment I have nothing brighter than:

import pandas as pd
df = pd.DataFrame([[1, 2, "1,2,4"], [3, 4, "5,6"]], columns=['A', 'B', 'C'])
tmp = df['C'].str.split(',').apply(pd.Series, 1).stack()
tmp.index = tmp.index.droplevel(-1)
tmp.name = 'C'
del df['C']
df = df.join(tmp)

Solution

  • Not much better. I did it two ways. The first is similar to yours with some cosmetic differences and feels like the better of the two. I still added the second because it's a bit different in that it concats the series returned by split.

    >>> import pandas as pd
    >>> df = pd.DataFrame([[1, 2, "1,2,4"], [3, 4, "5,6"]], columns=['A', 'B', 'C'])
    >>> x = df.drop('C', axis=1).join(df.C.str.split(',', expand=True).stack().to_frame('C').reset_index(1, drop=True))
    >>> print x
       A  B  C
    0  1  2  1
    0  1  2  2
    0  1  2  4
    1  3  4  5
    1  3  4  6
    
    >>> y = df.drop('C', axis=1).join(pd.concat(df.C.str.split(',', expand=True).to_dict('series').values()).dropna().to_frame('C'))
    >>> print y
       A  B  C
    0  1  2  1
    0  1  2  2
    0  1  2  4
    1  3  4  5
    1  3  4  6