
Dumping a CSV file to a YAML file in Python


I'm trying to dump a .csv file into a .yml file and have succeeded. The only problem is that the syntax in the .yml file is not what I want.

My .csv file:

NAME,KEYWORDS
Adam,Football Hockey

Here is where I read the .csv file and dump it into a .yml file:

import pandas
import yaml

# Read the whole CSV file with the pandas library
df = pandas.read_csv('keywords.csv')

# Dump the DataFrame into getData.yml as YAML
with open('getData.yml', 'w') as outfile:
    yaml.dump(
        df.to_dict(orient='records'),
        outfile,
        sort_keys=False,
        width=72,
        indent=4
    )

How the .yml output looks:

-   NAME: Adam
    KEYWORDS: Football Hockey

How I want it to look:

-   NAME: Adam
    KEYWORDS: Football, Hockey

I want a comma between Football and Hockey. But if I put that in the .csv file, it gets treated as a column separator, because commas already delimit the fields. How can I do this?


Solution

  • You have two options:

    In a CSV file, a comma inside double quotes is not treated as a delimiter during parsing. Your CSV file would then look as follows:

    NAME,KEYWORDS
    Adam,"Football, Hockey"
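
To check the quoting behaviour without touching your real file, here is a minimal sketch using only the standard library `csv` module. The `csv_text` string is a hypothetical in-memory stand-in for keywords.csv, not something from your project.

```python
import csv
import io

# Hypothetical in-memory stand-in for keywords.csv, with the quoted field.
csv_text = 'NAME,KEYWORDS\nAdam,"Football, Hockey"\n'

# DictReader uses the header row as keys and honours double-quoted fields.
rows = list(csv.DictReader(io.StringIO(csv_text)))

# The quoted comma stays inside the KEYWORDS value instead of starting a new column.
print(rows[0]["KEYWORDS"])
```

pandas.read_csv should treat the quoted field the same way by default, since its default quote character is also the double quote.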
    

    Alternatively, you can process the KEYWORDS column after reading it. This would add the following to your code:

    df = pandas.read_csv('keywords.csv')
    df["KEYWORDS"] = df["KEYWORDS"].apply(lambda x: ", ".join(x.split()))
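
One thing to watch with the lambda: pandas reads an empty CSV cell as NaN, which is a float and has no `.split()`, so a row without keywords would raise an error. A small sketch of a more defensive version follows. The helper name `join_keywords` is made up for illustration and is not part of pandas.

```python
def join_keywords(value):
    """Turn 'Football Hockey' into 'Football, Hockey'; pass non-strings through."""
    if not isinstance(value, str):
        # An empty CSV cell arrives as NaN (a float), so return it unchanged.
        return value
    # split() with no arguments splits on any run of whitespace.
    return ", ".join(value.split())


print(join_keywords("Football Hockey"))
```

It can be used in place of the lambda, as in df["KEYWORDS"] = df["KEYWORDS"].apply(join_keywords).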