Tags: python, pandas, matrix, data-cleaning

pandas DataFrame containing lists


Suppose I have a DataFrame like this:

import pandas as pd

lst1 = [[1,3,4,5],[1,2,3,3],[2,3,4,5],[3,4,5,5]]
lst2 = [[1,2,3,1],[1,4,1,2],[3,3,1,5],[2,4,1,5]]
lst3 = [[1,2,3,3],[3,2,1,2],[1,3,1,4],[2,4,3,5]]
percentile_list = pd.DataFrame({'lst1Tite': lst1,
                                'lst2Tite': lst2,
                                'lst3Tite': lst3})

> percentile_list
       lst1Tite      lst2Tite      lst3Tite
0  [1, 3, 4, 5]  [1, 2, 3, 1]  [1, 2, 3, 3]
1  [1, 2, 3, 3]  [1, 4, 1, 2]  [3, 2, 1, 2]
2  [2, 3, 4, 5]  [3, 3, 1, 5]  [1, 3, 1, 4]
3  [3, 4, 5, 5]  [2, 4, 1, 5]  [2, 4, 3, 5]

Now I want to extract row 0 and turn it into a DataFrame like this:

> percentile_0
   col1  col2  col3  col4
0     1     3     4     5
1     1     2     3     1
2     1     2     3     3

How can I do that?

And what if I want to turn each row of percentile_list into a DataFrame like percentile_0?


Solution

  • You can call apply on the row and pass it the pd.Series constructor, which expands each list into its own row:

    In [17]:
    percentile_list.iloc[0].apply(pd.Series)
    
    Out[17]:
              0  1  2  3
    lst1Tite  1  3  4  5
    lst2Tite  1  2  3  1
    lst3Tite  1  2  3  3
    

    If you want the exact output from the question, with a default integer index and the column names col1 to col4:

    In [20]:
    pd.DataFrame(percentile_list.iloc[0].apply(pd.Series).values, columns = ['col1','col2','col3','col4'])
    
    Out[20]:
       col1  col2  col3  col4
    0     1     3     4     5
    1     1     2     3     1
    2     1     2     3     3
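
As a possible alternative to apply(pd.Series), the row's lists can be handed straight to the DataFrame constructor. The sketch below rebuilds the question's data so it runs on its own; the name cols is introduced here for illustration, and it assumes every list in the row has exactly four elements.

```python
import pandas as pd

# Data from the question
lst1 = [[1, 3, 4, 5], [1, 2, 3, 3], [2, 3, 4, 5], [3, 4, 5, 5]]
lst2 = [[1, 2, 3, 1], [1, 4, 1, 2], [3, 3, 1, 5], [2, 4, 1, 5]]
lst3 = [[1, 2, 3, 3], [3, 2, 1, 2], [1, 3, 1, 4], [2, 4, 3, 5]]
percentile_list = pd.DataFrame({'lst1Tite': lst1,
                                'lst2Tite': lst2,
                                'lst3Tite': lst3})

# Illustrative column names (assumes four elements per list)
cols = ['col1', 'col2', 'col3', 'col4']

# Row 0 is a Series whose three values are lists; tolist() turns it into
# a list of lists, which the DataFrame constructor reads as three rows.
percentile_0 = pd.DataFrame(percentile_list.iloc[0].tolist(), columns=cols)
print(percentile_0)
```

This skips building one intermediate Series per list, which may matter on larger frames, but it relies on the lists all having the same length.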
    

    To do this for every row, you can store each resulting DataFrame in a dict under whatever key you like:

    In [41]:
    d = {}
    for idx in percentile_list.index:
        d['percentile_' + str(idx)] = pd.DataFrame(percentile_list.loc[idx].apply(pd.Series).values, columns=['col1', 'col2', 'col3', 'col4'])
    d
    
    Out[41]:
    {'percentile_0':    col1  col2  col3  col4
     0     1     3     4     5
     1     1     2     3     1
     2     1     2     3     3, 'percentile_1':    col1  col2  col3  col4
     0     1     2     3     3
     1     1     4     1     2
     2     3     2     1     2, 'percentile_2':    col1  col2  col3  col4
     0     2     3     4     5
     1     3     3     1     5
     2     1     3     1     4, 'percentile_3':    col1  col2  col3  col4
     0     3     4     5     5
     1     2     4     1     5
     2     2     4     3     5}
    

    Here is the DataFrame stored under the first key:

    In [42]:
    d['percentile_0']
    
    Out[42]:
       col1  col2  col3  col4
    0     1     3     4     5
    1     1     2     3     1
    2     1     2     3     3
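
The loop above could also be written as a dict comprehension, and the per-row frames can optionally be stacked into one DataFrame with pd.concat. This is a minimal sketch under the same assumption that every list has four elements; the names cols, frames, and combined and the index level names 'source' and 'row' are chosen here for illustration only.

```python
import pandas as pd

# Data from the question
lst1 = [[1, 3, 4, 5], [1, 2, 3, 3], [2, 3, 4, 5], [3, 4, 5, 5]]
lst2 = [[1, 2, 3, 1], [1, 4, 1, 2], [3, 3, 1, 5], [2, 4, 1, 5]]
lst3 = [[1, 2, 3, 3], [3, 2, 1, 2], [1, 3, 1, 4], [2, 4, 3, 5]]
percentile_list = pd.DataFrame({'lst1Tite': lst1,
                                'lst2Tite': lst2,
                                'lst3Tite': lst3})

# Illustrative column names (assumes four elements per list)
cols = ['col1', 'col2', 'col3', 'col4']

# One small DataFrame per original row, keyed as in the loop version
frames = {
    f'percentile_{idx}': pd.DataFrame(row.tolist(), columns=cols)
    for idx, row in percentile_list.iterrows()
}

# Optional: stack them into one frame; the dict keys become the outer
# level of a two-level index.
combined = pd.concat(frames, names=['source', 'row'])

print(frames['percentile_0'])
print(combined.loc['percentile_1'])
```

Keeping the dict is convenient when each row is processed separately; the stacked frame is easier to filter or aggregate across rows.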