Python Pandas - how to read in data from list (data) and columns (separate list)

I'm running into a situation I don't know what to do:

The data is a list, no index. Sample data:

data = [
 {'fields': ['2024-10-07T21:22:01', 'USER-A', 21,  0,  0, 21]},
 {'fields': ['2024-10-07T21:18:28', 'USER-B', 20, 20,  0,  0, 0, 45]}
]

The column header is in another:

cols = ['Created On', 'Created By', 'Transaction Count (ALL)',
        'X Pending', 'X Cancelled (X)', 'X Completed (Y)']

I have tried using pandas.DataFrame as well as json_normalize, I either get a single column table with each value as a row, or I got all values as a column, and when I try with using "fields", it tells me "list indices must be integers or slices, not str" which I don't understand why I get this... what is the best way to have these info into a dataframe please?

(the number of data elements and number of column headers may not be consistent just for example sake, the real data has things aligned)

Solution

You could combine two DataFrame constructors:

data = [{'fields': ['2024-10-07T21:22:01', 'USER-A', 21, 0, 0, 21]},
        {'fields': ['2024-10-07T21:18:28', 'USER-B', 20, 20, 0, 0, 0, 45]},
       ]

out = pd.DataFrame(pd.DataFrame(data)['fields'].tolist())

Output:

                     0       1   2   3  4   5    6     7
0  2024-10-07T21:22:01  USER-A  21   0  0  21  NaN   NaN
1  2024-10-07T21:18:28  USER-B  20  20  0   0  0.0  45.0

If you also have a list of columns cols, you could truncate the columns:

cols = ['Created On', 'Created By', 'Transaction Count (ALL)',
        'X Pending', 'X Cancelled (X)', 'X Completed (Y)']

out = pd.DataFrame(pd.DataFrame(data)['fields'].str[:len(cols)].tolist(),
                   columns=cols)

Output:

            Created On Created By  Transaction Count (ALL)  X Pending  X Cancelled (X)  X Completed (Y)
0  2024-10-07T21:22:01     USER-A                       21          0                0               21
1  2024-10-07T21:18:28     USER-B                       20         20                0                0

Or rename to keep the extra columns:

out = (pd.DataFrame(pd.DataFrame(data)['fields'].tolist())
         .rename(columns=dict(enumerate(cols)))
       )

Output:

            Created On Created By  Transaction Count (ALL)  X Pending  X Cancelled (X)  X Completed (Y)    6     7
0  2024-10-07T21:22:01     USER-A                       21          0                0               21  NaN   NaN
1  2024-10-07T21:18:28     USER-B                       20         20                0                0  0.0  45.0

But, honestly, better pre-process in pure python, this will be more efficient/explicit:

# truncation
out = pd.DataFrame((dict(zip(cols, d['fields'])) for d in data))

# alternative truncation
out = pd.DataFrame([d['fields'][:len(cols)] for d in data], columns=cols)

# renaming
out = (pd.DataFrame([d['fields'] for d in data])
         .rename(columns=dict(enumerate(cols)))
      )