I have a CSV file in the format below and want to convert it into a nested dict.
name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
Expected Output
{
  "name": "ABC_ESTIMATE_REFINED",
  "columns": [
    {
      "name": "cntquota",
      "tests": [
        "dbt_expectations.expect_column_to_exist",
        "not_null"
      ]
    },
    {
      "name": "is_purged",
      "tests": [
        "dbt_expectations.expect_column_to_exist",
        "not_null"
      ]
    }
  ]
}
My attempt is below, but it doesn't come close to the expected output:
import pandas as pd

df = pd.read_csv('data.csv')
print(df)
nested_dict = df.groupby(['name', 'columns']).apply(lambda x: x[['tests']].to_dict(orient='records')).to_dict()
print(nested_dict)
IIUC, you can use nested groupby calls:
out = [{'name': k1,
        'columns': [{'name': k2, 'tests': g2['tests'].tolist()}
                    for k2, g2 in g1.groupby('columns')]}
       for k1, g1 in df.groupby('name')]
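For reference, here is a self-contained sketch of the nested groupby approach that can be run as-is. It assumes the sample rows from the question, read through `io.StringIO` in place of `data.csv`. The `sort=False` argument and the final `out[0]` step are additions of mine: the first keeps groups in file order, the second extracts the single dict shown in the expected output (valid only because the sample has one distinct `name`).

```python
import io
import json

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for 'data.csv'.
csv_text = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
"""
df = pd.read_csv(io.StringIO(csv_text))

# Outer groupby builds one dict per table name, inner groupby one per column.
# sort=False keeps groups in order of first appearance instead of sorting keys.
out = [
    {
        "name": name,
        "columns": [
            {"name": column, "tests": g2["tests"].tolist()}
            for column, g2 in g1.groupby("columns", sort=False)
        ],
    }
    for name, g1 in df.groupby("name", sort=False)
]

# The list holds one dict per distinct 'name'; the sample data has exactly one.
result = out[0]
print(json.dumps(result, indent=2))
```

If the CSV contains several distinct `name` values, keep the whole list `out` rather than indexing into it.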
Since the processing occurs by pairs of columns, you could also use a recursive approach:
def group(df, keys):
    if len(keys) > 1:
        # group by the first key and recurse on the remaining keys
        key1, key2 = keys[:2]
        return [{key1: k, key2: group(g, keys[1:])}
                for k, g in df.groupby(key1)]
    else:
        # last key: return its values as a plain list
        return df[keys[0]].tolist()

out = group(df, ['name', 'columns', 'tests'])
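Output of the nested groupby version (the recursive version labels each inner dict with the grouping key itself, so it gives 'columns': 'cntquota' instead of 'name': 'cntquota'):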
[{'name': 'ABC_ESTIMATE_REFINED',
  'columns': [{'name': 'cntquota',
               'tests': ['dbt_expectations.expect_column_to_exist', 'not_null']},
              {'name': 'is_purged',
               'tests': ['dbt_expectations.expect_column_to_exist', 'not_null']}]}]
Demo of the recursive approach with a different order of the keys:
group(df, ['name', 'tests', 'columns'])
[{'name': 'ABC_ESTIMATE_REFINED',
  'tests': [{'tests': 'dbt_expectations.expect_column_to_exist',
             'columns': ['cntquota', 'is_purged']},
            {'tests': 'not_null', 'columns': ['cntquota', 'is_purged']}]}]
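As the `'tests': 'not_null'` entries in the demo show, the recursive `group` labels each inner dict with the grouping key itself. If every level should instead carry a fixed `'name'` label, as in the question's expected output, one possible variant is sketched below. The function `group_named` and its `label` parameter are hypothetical names introduced here for illustration, and `io.StringIO` again stands in for `data.csv`.

```python
import io

import pandas as pd


def group_named(df, keys, label="name"):
    """Hypothetical variant of group(): every level stores its group value
    under `label` and its children under the name of the next key."""
    if len(keys) == 1:
        # Last key: return its values as a plain list.
        return df[keys[0]].tolist()
    key, child = keys[0], keys[1]
    # Caveat: if `child` equals `label`, the two dict entries would collide.
    return [
        {label: k, child: group_named(g, keys[1:], label)}
        for k, g in df.groupby(key, sort=False)
    ]


# Sample rows copied from the question.
csv_text = """name,columns,tests
ABC_ESTIMATE_REFINED,cntquota,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,cntquota,not_null
ABC_ESTIMATE_REFINED,is_purged,dbt_expectations.expect_column_to_exist
ABC_ESTIMATE_REFINED,is_purged,not_null
"""
df = pd.read_csv(io.StringIO(csv_text))

out = group_named(df, ["name", "columns", "tests"])
print(out)
```

With the key order `['name', 'columns', 'tests']` this should produce the same structure as the nested groupby version, while still accepting any number of keys.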