Given a CSV file in which some columns contain lists, sets, or dictionaries, such as one with the structure below:
| user_id| items | methods | dict_col |
|--------|-------------------------|----------------|---------------|
| ID01 | [potato, apple, potato] | {card, cash} | {F: [AB, CD]} |
| ID02 | [carrots, papaya] | {bitcoin, card}| {F: [AB, CD]} |
Is there a way to ingest it in Python in a tabular way such that the type of the values in those columns is maintained?
If not, what is the best approach to convert them back to lists, sets, and dictionaries?
The question stems from the fact that once a DataFrame with this structure is saved to a CSV file and loaded back with pandas.read_csv(), the values in those columns are no longer lists, sets, or dictionaries; they are plain strings.
Below is the code to recreate the scenario explained above.
import pandas as pd
# Create dummy example
df = pd.DataFrame({'user_id': ['ID01', 'ID02'], 'items': [['potato', 'apple', 'potato'],['carrots', 'papaya']],
'methods': [{'card', 'cash'}, {'bitcoin', 'card'}],
'dict_col': [{'F': ['AB', 'CD']}, {'F': ['AB', 'CD']}]})
df[['user_id', 'items', 'methods', 'dict_col']]
type(df.iloc[0]['dict_col']) # Returns dict
df.to_csv('dummy_table.csv', index = False)
# Reload the table
df_loaded = pd.read_csv('dummy_table.csv')
"""
The line below returns a str and not a dict as in the original DataFrame. How do we go back to the original data types
(e.g. list, dict, set) in a Pythonic way?
"""
type(df_loaded.iloc[0]['dict_col'])
Attempt after Kyle J.'s comment suggesting csv.DictReader
I tried with DictReader, but the objective was not met. However, I am not sure this is what Kyle had in mind.
import csv
import pandas as pd
df = pd.DataFrame()
with open('dummy_table.csv', newline='') as csvfile:
    reader = csv.DictReader(csvfile)
    for row in reader:
        df = pd.concat([df, pd.DataFrame(row, index=[0])], axis=0)
type(df.iloc[0]['dict_col']) # Still a str
If the standard csv module behaves the same way (it does: every field comes back as a string), then to solve your specific problem you could parse the strings with ast.literal_eval:
import ast
###your original code####
dict_value = ast.literal_eval(df_loaded.iloc[0]['dict_col'])
type(dict_value)
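To extend this from a single cell to whole columns, here is a minimal sketch using only the standard library. It assumes the file looks roughly like what to_csv writes for the dummy table (the in-memory CSV_TEXT stands in for dummy_table.csv), and the names parse_rows and LITERAL_COLUMNS are made up for illustration.

```python
import ast
import csv
import io

# In-memory stand-in for dummy_table.csv; assumed to resemble what
# DataFrame.to_csv writes (cells containing commas are quoted).
CSV_TEXT = '''user_id,items,methods,dict_col
ID01,"['potato', 'apple', 'potato']","{'card', 'cash'}","{'F': ['AB', 'CD']}"
ID02,"['carrots', 'papaya']","{'bitcoin', 'card'}","{'F': ['AB', 'CD']}"
'''

# Hypothetical choice: the columns known to hold Python literals.
LITERAL_COLUMNS = ("items", "methods", "dict_col")


def parse_rows(text, literal_columns=LITERAL_COLUMNS):
    """Read CSV text and turn the named columns back into Python objects."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for col in literal_columns:
            # literal_eval only accepts Python literals (lists, sets, dicts,
            # strings, numbers, ...), so it does not run arbitrary code
            # the way eval() would.
            row[col] = ast.literal_eval(row[col])
        rows.append(row)
    return rows


rows = parse_rows(CSV_TEXT)
print(type(rows[0]["items"]), type(rows[0]["methods"]), type(rows[0]["dict_col"]))
```

With pandas, the same idea can be applied at load time by passing the parser through the converters argument, e.g. pd.read_csv('dummy_table.csv', converters={'items': ast.literal_eval, 'methods': ast.literal_eval, 'dict_col': ast.literal_eval}). Note that this only works when the cells are valid Python literals; a format such as JSON or pickle may be a better fit if you control how the file is written.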