Converting a list to a pandas DataFrame where the list contains a dictionary


I want to convert a list to a pandas DataFrame, where the first element of the list is a dictionary.

I have the code below:

import pandas as pd
import numpy as np
pd.DataFrame([{'aa' : 10}, np.nan])

However, this fails with the following message:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.11/site-packages/pandas/core/frame.py", line 782, in __init__
    arrays, columns, index = nested_data_to_arrays(
                             ^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 498, in nested_data_to_arrays
    arrays, columns = to_arrays(data, columns, dtype=dtype)
                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 832, in to_arrays
    arr, columns = _list_of_dict_to_arrays(data, columns)
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 912, in _list_of_dict_to_arrays
    pre_cols = lib.fast_unique_multiple_list_gen(gen, sort=sort)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "pandas/_libs/lib.pyx", line 374, in pandas._libs.lib.fast_unique_multiple_list_gen
  File "/usr/local/lib/python3.11/site-packages/pandas/core/internals/construction.py", line 910, in <genexpr>
    gen = (list(x.keys()) for x in data)
                ^^^^^^
AttributeError: 'float' object has no attribute 'keys'

Could you please help me resolve this issue?


Solution

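  • Wrap your list in np.array, so that pandas receives a one-dimensional array of objects instead of trying to read every element as a dictionary: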

    pd.DataFrame(np.array([{'aa' : 10}, np.nan]))
    

                0
    0  {'aa': 10}
    1         NaN
    

    Although your list is quite small, here is a timing comparison with the pd.Series(...).to_frame() alternative, just in case:

    In [777]: %timeit pd.DataFrame(np.array([{'aa' : 10}, np.nan]))
    26.6 µs ± 220 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
    
    In [778]: %timeit pd.Series([{'aa' : 10}, np.nan]).to_frame()
    49.6 µs ± 911 ns per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
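
As a minimal sketch of the two shapes you might want from this list: the first option is the np.array approach from above (with dtype=object spelled out, which is my addition), and the second is an alternative not covered in the answer, which assumes you would rather expand the dictionary keys into columns and treat non-dictionary entries as empty rows. The names data, as_cells, rows and as_columns are only illustrative.

```python
import numpy as np
import pandas as pd

data = [{'aa': 10}, np.nan]

# Option 1: keep every list element as a single cell.
# An explicit object dtype makes NumPy store the dict and the NaN as-is.
as_cells = pd.DataFrame(np.array(data, dtype=object))

# Option 2 (assumption: missing entries should become all-NaN rows):
# replace anything that is not a dict with an empty dict, so pandas
# can safely call .keys() on every element and build one column per key.
rows = [x if isinstance(x, dict) else {} for x in data]
as_columns = pd.DataFrame(rows)

print(as_cells)
print(as_columns)
```

Option 1 gives one column holding the original objects, while option 2 gives a column named aa with NaN in the second row. Which one fits depends on what you plan to do with the frame afterwards.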