Tags: python, pandas, dataframe, yaml

Pandas dataframe to yaml preserving hierarchy


I have a pandas dataframe that looks like this:

import pandas as pd
import yaml

df = pd.DataFrame({
    'title': ['dashboard1', 'dashboard1', 'dashboard2'],
    'srv': ['ods', 'ods', 'ods'],
    'db': ['db1', 'db1', 'db2'],
    'sch': ['S1', 'S2', 'S4'],
    'nm': ['Name1', 'Name2', 'Name3'],
})
df

I want to convert this dataframe into the following YAML documents, one per title:

title: dashboard1
dataSources:
  dataObjects:
    - srv: ods
      db: db1
      sch: S1
      nm: Name1
    - srv: ods
      db: db1
      sch: S2
      nm: Name2

title: dashboard2
dataSources:
  dataObjects:
    - srv: ods
      db: db2
      sch: S4
      nm: Name3

I tried

yaml.dump({'dataObjects': df.groupby('title')['nm'].apply(list).to_dict()}, allow_unicode=True)

but it only gives me a list of nm values per title, without the other columns or the title/dataSources nesting.

Could someone point me to the right method for this transformation?


Solution

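  • With groupby/to_dict("records") and yaml.dump_all (the include_groups argument requires pandas 2.2 or later):

    import yaml

    print(
        yaml.dump_all(
            [
                {"title": t, "dataSources": {"dataObjects": ele}}
                # one list of row dicts per title, without the title column itself
                for t, ele in df.groupby("title", sort=False)
                .apply(lambda g: g.to_dict("records"), include_groups=False)
                .to_dict()
                .items()
            ],
            sort_keys=False,  # keep the key order as written, not alphabetical
        ).replace("---", "")  # drop the document separators
    )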

    Output :

    title: dashboard1
    dataSources:
      dataObjects:
      - srv: ods
        db: db1
        sch: S1
        nm: Name1
      - srv: ods
        db: db1
        sch: S2
        nm: Name2
    
    title: dashboard2
    dataSources:
      dataObjects:
      - srv: ods
        db: db2
        sch: S4
        nm: Name3
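
  • If include_groups is not available in your pandas version, or you would rather keep each dashboard as a separate YAML document (for example to write one file per title), a plain loop over the groups is one possible alternative. This is only a sketch: the helper name dashboards_to_yaml is made up for illustration, and it assumes PyYAML 5.1 or later for sort_keys.

```python
import pandas as pd
import yaml

df = pd.DataFrame({
    "title": ["dashboard1", "dashboard1", "dashboard2"],
    "srv": ["ods", "ods", "ods"],
    "db": ["db1", "db1", "db2"],
    "sch": ["S1", "S2", "S4"],
    "nm": ["Name1", "Name2", "Name3"],
})


def dashboards_to_yaml(frame):
    """Hypothetical helper: map each title to its own YAML document string."""
    docs = {}
    for title, group in frame.groupby("title", sort=False):
        # Drop the grouping column so it only appears once, at the top level.
        records = group.drop(columns="title").to_dict("records")
        doc = {"title": title, "dataSources": {"dataObjects": records}}
        # sort_keys=False keeps the key order as written instead of sorting it.
        docs[title] = yaml.safe_dump(doc, sort_keys=False)
    return docs


docs = dashboards_to_yaml(df)
print("\n".join(docs.values()))
```

    Each value in docs parses on its own with yaml.safe_load, so nothing has to be stripped out of the dumped text afterwards.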