python-2.7 pyyaml

Reading multiple files in a directory with pyyaml


I'm trying to read all YAML files in a directory, but I am having trouble. I am using Python 2.7 (and cannot switch to 3), and all of my files are UTF-8 (and need to stay that way).

import os
import yaml
import codecs


def yaml_reader(filepath):
    with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
        data = yaml.load_all(file_descriptor)
        return data

def yaml_dump(filepath, data):
    with open(filepath, 'w') as file_descriptor:
        yaml.dump(data, file_descriptor)

if __name__ == "__main__":
    filepath = os.listdir(os.getcwd())
    data = yaml_reader(filepath)
    print data

When I run this code, python gives me the message:

TypeError: coercing to Unicode: need string or buffer, list found.

I want this program to show the content of the files. Can anyone help me?


Solution

  • There are multiple problems with your code, apart from it being invalid Python the way you formatted it. Properly indented, your reader is:

    def yaml_reader(filepath):
        with codecs.open(filepath, "r", encoding='utf-8') as file_descriptor:
            data = yaml.load_all(file_descriptor)
            return data
    

    However, it is not necessary to do the decoding yourself; PyYAML is perfectly capable of processing UTF-8:

    def yaml_reader(filepath):
        with open(filepath, "rb") as file_descriptor:
            # load_all is lazy, so read every document before the file is closed
            data = list(yaml.load_all(file_descriptor))
            return data
    

    I hope you realise you're trying to load multiple documents: load_all() produces one entry per document, so data is always a sequence of documents, even if your file contains only one.
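
To make the multi-document behaviour concrete, here is a minimal sketch. The sample strings and the helper name load_documents are made up for illustration, and it uses safe_load_all, the variant of load_all that only builds plain Python objects.

```python
from __future__ import print_function, unicode_literals

import yaml


def load_documents(text):
    """Return every YAML document found in `text` as a list (hypothetical helper)."""
    # safe_load_all is lazy: it yields one Python object per document.
    # Wrapping it in list() reads all of them right away.
    return list(yaml.safe_load_all(text))


one_document = "a: 1\n"
two_documents = "a: 1\n---\nb: 2\n"

# Even a single-document input comes back wrapped in a list.
print(load_documents(one_document))
print(load_documents(two_documents))
```

If you know a file holds exactly one document, yaml.safe_load returns that document directly instead of a sequence.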

    Then the line:

    filepath = os.listdir(os.getcwd())
    

    gives you a list of files, so you need to do:

    filepath = os.listdir(os.getcwd())[0]
    

    or decide in some other way which of the files you want to open. If you want to combine all files (assuming they are YAML) into one big YAML file, you need to do:

    if __name__ == "__main__":
        data = []
        for filepath in os.listdir(os.getcwd()):
            data.extend(yaml_reader(filepath))
        print data
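
Since os.listdir() returns every entry in the directory (including the script itself and any subdirectories), one way to "decide which files to open" is to filter by extension first. This is only a sketch using the standard library; the helper name list_yaml_files and the choice of the .yaml/.yml extensions are assumptions, not part of the code above.

```python
from __future__ import print_function

import os


def list_yaml_files(directory, extensions=(".yaml", ".yml")):
    """Return sorted paths of regular files in `directory` with a YAML extension."""
    paths = []
    for name in os.listdir(directory):
        path = os.path.join(directory, name)
        # Skip subdirectories and anything that does not look like YAML.
        if os.path.isfile(path) and name.lower().endswith(extensions):
            paths.append(path)
    return sorted(paths)


if __name__ == "__main__":
    for path in list_yaml_files(os.getcwd()):
        print(path)
```

Each returned path is a single string, which is what open() and codecs.open() expect; passing the whole list is what triggered the TypeError in the question.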
    

    And your dump routine would need to change to:

    def yaml_dump(filepath, data):
        with open(filepath, 'wb') as file_descriptor:
            yaml.dump(data, file_descriptor, allow_unicode=True, encoding='utf-8')
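
The effect of allow_unicode can be seen without touching any files. A small sketch, assuming PyYAML is installed; the sample data is invented, and safe_dump is used here in place of dump so that only plain YAML tags are written.

```python
# -*- coding: utf-8 -*-
from __future__ import print_function, unicode_literals

import yaml

data = {"name": "café"}

# Default settings: non-ASCII characters are written as escape sequences.
escaped = yaml.safe_dump(data)

# allow_unicode=True keeps the characters as they are; with encoding='utf-8'
# and no stream given, the result comes back as UTF-8 encoded bytes.
readable = yaml.safe_dump(data, allow_unicode=True, encoding="utf-8").decode("utf-8")

print(escaped)
print(readable)
```

Both outputs load back to the same data; the difference is only in how readable the file is for someone opening it in an editor.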
    

    However, this all brings you to the biggest problem: you are using PyYAML, which will mangle your YAML, dropping flow style, comments, anchor names, special int/float representations, quotes around scalars, etc. Apart from that, PyYAML has not been updated to support YAML 1.2 documents (YAML 1.2 has been the standard since 2009). I recommend you switch to ruamel.yaml (disclaimer: I am the author of that package), which supports YAML 1.2 and leaves comments etc. in place.
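
The comment loss is easy to reproduce with a load/dump round trip. A minimal sketch, assuming PyYAML is installed; the sample document is made up. Whether the flow-style list survives depends on the PyYAML version and its default_flow_style setting, so the sketch only checks for the comments.

```python
from __future__ import print_function, unicode_literals

import yaml

source = (
    "# connection settings\n"
    "host: example.org  # primary\n"
    "ports: [80, 443]\n"
)

# Load into plain Python objects, then dump again: the data survives,
# but the comments are not part of the loaded data and so are not written back.
round_tripped = yaml.safe_dump(yaml.safe_load(source))

print(round_tripped)
```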

    And even if you are bound to Python 2, you should use Python 3-like syntax, e.g. for print, which you can get with from __future__ imports.

    So I recommend you do:

    pip install pathlib2 ruamel.yaml
    

    and then use:

    from __future__ import absolute_import, unicode_literals, print_function
    
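    try:
        from pathlib import Path   # Python 3 standard library
    except ImportError:
        from pathlib2 import Path  # Python 2 backport installed above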
    from ruamel.yaml import YAML
    
    if __name__ == "__main__":
        data = []
        yaml = YAML()
        yaml.preserve_quotes = True
        for filepath in Path('.').glob('*.yaml'):
            data.extend(yaml.load_all(filepath))
        print(data)
        yaml.dump(data, Path('your_output.yaml'))