
PyYAML shows "ScannerError: mapping values are not allowed here" in my unittest


I am trying to test a number of Python 2.7 classes using unittest.

Here is the exception:

ScannerError: mapping values are not allowed here
    in "<unicode string>", line 3, column 32:
            ... file1_with_path: '../../testdata/concat1.csv'

Here is the example the error message relates to:

class TestConcatTransform(unittest.TestCase):

    def setUp(self):
        filename1 = os.path.dirname(os.path.realpath(__file__)) + '/../../testdata/concat1.pkl'
        self.df1 = pd.read_pickle(filename1)
        filename2 = os.path.dirname(os.path.realpath(__file__)) + '/../../testdata/concat2.pkl'
        self.df2 = pd.read_pickle(filename2)

        self.yamlconfig = u'''
        --- !ConcatTransform
        file1_with_path: '../../testdata/concat1.csv'
        file2_with_path: '../../testdata/concat2.csv'
        skip_header_lines: [0]
        duplicates: ['%allcolumns']
        outtype: 'dataframe'
        client: 'testdata'
        addcolumn: []
        '''
        self.testconcat = yaml.load(self.yamlconfig)

What is the problem?

Something else is not clear to me. The directory structure I have is:

app
app/etl
app/tests

The ConcatTransform is in app/etl/concattransform.py and TestConcatTransform is in app/tests. I import ConcatTransform into the TestConcatTransform unittest with this import:

from app.etl import concattransform

How does PyYAML associate that class with the one defined in yamlconfig?


Solution

  • A YAML document can start with a document start marker (---), but the marker has to be at the beginning of a line, and yours is indented eight positions on the second line of the input. Because of that, the --- is interpreted as the beginning of a multi-line plain (i.e. non-quoted) scalar, and within such a scalar you cannot have ": " (colon + space); that sequence is only allowed inside quoted scalars. And if your document does not have a mapping or sequence at the root level, as is now the case for yours, the whole document can only consist of a single scalar.
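The difference can be checked without constructing any objects. The sketch below assumes PyYAML is installed and uses yaml.compose, which only scans and parses the input into a node tree. The variable names are illustrative and the document is shortened to two lines.

```python
import yaml
from yaml.scanner import ScannerError

# "---" is indented, so it is scanned as the start of a multi-line plain scalar.
indented = u'''
        --- !ConcatTransform
        file1_with_path: '../../testdata/concat1.csv'
'''

# "---" is at the beginning of a line, so it is a real document start marker.
flush_left = u'''
--- !ConcatTransform
file1_with_path: '../../testdata/concat1.csv'
'''

try:
    yaml.compose(indented)
    error_message = None
except ScannerError as exc:
    # The scanner hits ": " while it is still inside the plain scalar.
    error_message = exc.problem

# Here the tag applies to a mapping at the root of the document.
root = yaml.compose(flush_left)
```

In the first case error_message holds the text from the traceback in the question; in the second, root is a mapping node that carries the !ConcatTransform tag.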

    If you want to keep your sources nicely indented like you have now, I recommend you use dedent from textwrap.

    The following runs without error:

    import ruamel.yaml
    from textwrap import dedent
    
    yaml_config = dedent(u'''\
            --- !ConcatTransform
            file1_with_path: '../../testdata/concat1.csv'
            file2_with_path: '../../testdata/concat2.csv'
            skip_header_lines: [0]
            duplicates: ['%allcolumns']
            outtype: 'dataframe'
            client: 'testdata'
            addcolumn: []
    ''')
    
    yaml = ruamel.yaml.YAML()
    data = yaml.load(yaml_config)
    

    You should get into the habit of putting a backslash (\) directly after your opening triple quotes, so that your YAML document does not start with an empty line. Had you done that, the error would have indicated line 2 instead of line 3.
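The effect of the backslash can be seen with the standard library alone. This is a small sketch with a shortened document; the variable names are only for illustration.

```python
from textwrap import dedent

# No backslash: the string begins with a newline, so "---" ends up on line 2.
without_backslash = dedent(u'''
    --- !ConcatTransform
    client: 'testdata'
''')

# Backslash after the opening quotes: "---" is the very first line.
with_backslash = dedent(u'''\
    --- !ConcatTransform
    client: 'testdata'
''')

first_line_without = without_backslash.splitlines()[0]  # empty line
first_line_with = with_backslash.splitlines()[0]        # starts at column 0
```

In both cases dedent removes the common leading whitespace, so the marker sits at the beginning of its line; the backslash only decides whether an empty line precedes it.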


    During loading, the YAML parser encounters the tag !ConcatTransform. A constructor for that tag is probably registered with the PyYAML loader during the import of concattransform, using PyYAML's add_constructor, which associates the tag with the class.

    Unfortunately, they registered their constructor with the default, non-safe loader. That is not necessary: they could have registered it with the SafeLoader instead, and thereby not have forced users to risk problems with uncontrolled input.
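As a rough sketch of what such a registration could look like with a safe loader: the ConcatTransform class below is a hypothetical stand-in (the real one in app/etl/concattransform.py is not shown in the question), and ConfigLoader and construct_concat_transform are names made up for this example. Only add_constructor, construct_mapping, SafeLoader and yaml.load are PyYAML's own.

```python
import yaml
from textwrap import dedent


class ConcatTransform(object):
    """Hypothetical stand-in for the class in app/etl/concattransform.py."""

    def __init__(self, **settings):
        self.settings = settings


def construct_concat_transform(loader, node):
    # Build the mapping under the tag and pass it on as keyword arguments.
    return ConcatTransform(**loader.construct_mapping(node, deep=True))


class ConfigLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the registration stays local to this loader."""


# Associate the tag with the constructor function on the safe loader only.
ConfigLoader.add_constructor(u'!ConcatTransform', construct_concat_transform)

yaml_config = dedent(u'''\
    --- !ConcatTransform
    file1_with_path: '../../testdata/concat1.csv'
    skip_header_lines: [0]
    outtype: 'dataframe'
''')

transform = yaml.load(yaml_config, Loader=ConfigLoader)
```

With this setup the tag is resolved when loading through ConfigLoader, while arbitrary Python object tags in the input are still rejected, as with any SafeLoader.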