python-2.7, encoding, pyyaml

Encoding a Unicode string and writing it to a YAML file in Python


I have a Unicode string that was read from a CSV file:

In [41]: df.iloc[0,1]
Out[41]: u'EU-repr\xe6sentant udpeget'

In [42]: type(df_translated.iloc[0,1])
Out[42]: unicode

I would like it to come out as EU-repræsentant udpeget. The end goal is to put the string into a dictionary and then save that dict to a YAML file with PyYAML using safe_dump. However, I am struggling with the encoding.


Solution

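  • If you really need to use PyYAML, pass the arguments encoding='utf-8' and allow_unicode=True to the safe_dump() routine. Without allow_unicode=True, PyYAML escapes non-ASCII characters, so the æ is written as "\xE6" inside a double-quoted scalar.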

    If you ever intend to upgrade to YAML 1.2 and use ruamel.yaml (disclaimer: I am the author of that package), those are the (much more sensible) defaults:

    import sys
    import ruamel.yaml
    
    yaml = ruamel.yaml.YAML()
    
    data = [u'EU-repr\xe6sentant udpeget']
    yaml.dump(data, sys.stdout)
    

    which gives:

    - EU-repræsentant udpeget
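
For the PyYAML route mentioned at the top of this answer, a minimal sketch could look like the following. The dictionary key `label` and the file name `translated.yaml` are made up for illustration and do not come from the question. `default_flow_style=False` is added only to get block-style output regardless of the PyYAML version's default.

```python
# -*- coding: utf-8 -*-
import yaml

# Hypothetical dict; the key name 'label' is an assumption for illustration.
data = {u'label': u'EU-repr\xe6sentant udpeget'}

# With no stream argument and encoding='utf-8', safe_dump() returns the
# document as UTF-8 encoded bytes. allow_unicode=True keeps the "æ" as-is
# instead of escaping it.
encoded = yaml.safe_dump(
    data,
    encoding='utf-8',
    allow_unicode=True,
    default_flow_style=False,
)

# Write the bytes in binary mode so no second encoding step is applied.
# 'translated.yaml' is a placeholder file name.
with open('translated.yaml', 'wb') as f:
    f.write(encoded)
```

Opening the file in binary mode avoids a second, implicit encoding step, which is a common source of UnicodeEncodeError on Python 2.7.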