Search code examples
pythonpandasmultiple-columnstolist

Filling a fixed number (x) of new columns with tolist() function which sometimes contains less items (< x)


I am using tolist() to split up a 8-item-list in 1 column ('modelGreeks') into 8 new columns in that same dataframe:

pd.DataFrame(df['modelGreeks'].tolist(), index=df.index)
df[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = pd.DataFrame(df['modelGreeks'].tolist(), index=df.index)

This the list that I normally get in the column 'modelGreeks':

(0.2953686167703842, -1.9317880628477724e-14, 1.4648640549124297e-15, 0.0, 6.240571011994176e-13, 1.1840837166645831e-15, -1.4648640549124297e-15, 10.444000244140625)

9 out of 10 times this works perfectly. But sometimes the data that I retrieve via an API is not perfect/complete. Instead of the expected list with 8 items in column 'modelGreeks' it provides a 'None'-value in that field and I get the following error message on the code execution of the second code line (logically since it tries to fill 8 columns with just 1 value available:

ValueError: Columns must be same length as key

I am looking for a solution where the 8 new columns will be created and filled anyway, e.g. with 0 or NaN or None.

Hope somebody can help out. Thanks in advance for your efforts.

The following code works:

df1 = pd.DataFrame(columns=['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice','modelGreeks'])
df1['modelGreeks'] = [[None, None, None, None, None, None, None, None], None, None, None, None]
df1[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df1['modelGreeks'].apply(pd.Series)

It returns:

   IV_model  59  Price_model  61  62  63  64  undPrice  modelGreeks
0  NaN       NaN NaN          NaN NaN NaN NaN NaN       [None, None, None, None, None, None, None, None]
1  NaN       NaN NaN          NaN NaN NaN NaN NaN       None
2  NaN       NaN NaN          NaN NaN NaN NaN NaN       None
3  NaN       NaN NaN          NaN NaN NaN NaN NaN       None
4  NaN       NaN NaN          NaN NaN NaN NaN NaN       None

And that is good. The only problem is that on certain moments the dataset which I receive via the API from Interactive Brokers will only provide a scalar None value in all the rows of the column modelGreeks. If I apply that to the test case then i get the error message again ("ValueError: Columns must be same length as key"):

df1 = pd.DataFrame(columns=['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice','modelGreeks'])
df1['modelGreeks'] = [None, None, None, None, None]
df1[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df1['modelGreeks'].apply(pd.Series)

Traceback (most recent call last):
File "/Users/floris/PycharmProjects/ib_insync/test1.py", line 9, in <module>
df1[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df1['modelGreeks'].apply(pd.Series)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 3367, in __setitem__
self._setitem_array(key, value)
File "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/pandas/core/frame.py", line 3389, in _setitem_array
raise ValueError('Columns must be same length as key')
ValueError: Columns must be same length as key

In this case I would like to see also only NaN values in the 8 columns.


Solution

  • Don't create a new DataFrame but convert the list column into a Series:

    df[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df['modelGreeks'].apply(pd.Series)
    

    Test:

    df = pd.DataFrame(columns=['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice','modelGreeks'])
    df['modelGreeks'] = [[1,2,3,4,5,6,7,8], [1,2,None,4,5,6,7,8], [1,2,3,4,5,6,7], [None], None, [None,None,None,None,None]]
    df[['IV_model', 59, 'Price_model', 61, 62, 63, 64, 'undPrice']] = df['modelGreeks'].apply(pd.Series)
    

    Output:

       IV_model   59  Price_model  ...   64  undPrice                     modelGreeks
    0       1.0  2.0          3.0  ...  7.0       8.0        [1, 2, 3, 4, 5, 6, 7, 8]
    1       1.0  2.0          NaN  ...  7.0       8.0     [1, 2, None, 4, 5, 6, 7, 8]
    2       1.0  2.0          3.0  ...  7.0       NaN           [1, 2, 3, 4, 5, 6, 7]
    3       NaN  NaN          NaN  ...  NaN       NaN                          [None]
    4       NaN  NaN          NaN  ...  NaN       NaN                            None
    5       NaN  NaN          NaN  ...  NaN       NaN  [None, None, None, None, None]