python python-3.x pandas jupyter-notebook sklearn-pandas

Splitting strings of tuples of different lengths to columns in Pandas DF


I have a dataframe that looks like this:

id  human_id
1   ('apples', '2022-12-04', 'a5ted')
2   ('bananas', '2012-2-14')
3   ('2012-2-14', 'reda21', 'ss')
..  ..

I would like a "pythonic" way to get the following output:

id  human_id                           col1       col2        col3
1   ('apples', '2022-12-04', 'a5ted')  apples     2022-12-04  a5ted
2   ('bananas', '2012-2-14')           bananas    2012-2-14   np.NaN
3   ('2012-2-14', 'reda21', 'ss')      2012-2-14  reda21      ss

    import pandas as pd

    df['a'], df['b'], df['c'] = df.human_id.str

The code I tried gives me an error:

    ValueError: not enough values to unpack (expected 2, got 1)

How can I split the values in each tuple into separate columns?

Thank you.


Solution

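  • You can do it this way: turn each string into a tuple and build a new DataFrame from the resulting list. Wherever a tuple is too short, the missing position is filled with None. You can then attach df1 to df column-wise, for example with pd.concat([df, df1], axis=1).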

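    import pandas as pd

    d = {'id': [1, 2, 3],
         'human_id': ["('apples', '2022-12-04', 'a5ted')",
                      "('bananas', '2012-2-14')",
                      "('2012-2-14', 'reda21', 'ss')"
                     ]}

    df = pd.DataFrame(data=d)

    # Each value is a string that looks like a tuple, not an actual tuple.
    list_human_id = list(df['human_id'])

    # Evaluate each string to get a real tuple.
    newList = []
    for val in list_human_id:
        newList.append(eval(val))

    # Shorter tuples leave the remaining columns empty.
    df1 = pd.DataFrame(newList, columns=['col1', 'col2', 'col3'])

    print(df1)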
    
    Output
    
    
            col1        col2   col3
    0     apples  2022-12-04  a5ted
    1    bananas   2012-2-14   None
    2  2012-2-14      reda21     ss
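
As a variation on the approach above, here is a minimal sketch that parses the strings with ast.literal_eval rather than eval and then joins the new columns back onto the original frame. literal_eval only accepts Python literals, so it will not execute arbitrary code if a cell contains something unexpected. The names parsed and result are only illustrative, and the sample data is the one from the question; it also assumes no tuple has more than three elements.

```python
import ast

import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "human_id": [
        "('apples', '2022-12-04', 'a5ted')",
        "('bananas', '2012-2-14')",
        "('2012-2-14', 'reda21', 'ss')",
    ],
})

# Parse each string into a real tuple (literals only, no code execution).
parsed = df["human_id"].apply(ast.literal_eval)

# Build the new columns on the same index so the rows line up when joining.
df1 = pd.DataFrame(parsed.tolist(), index=df.index,
                   columns=["col1", "col2", "col3"])

# Attach the new columns to the original frame.
result = df.join(df1)
print(result)
```

If the missing entries need to be NaN specifically, calling result.fillna(np.nan) after importing numpy as np should normalize them, since pandas treats None and NaN alike as missing.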