c++ parsing yaml yaml-cpp

yaml-cpp parsing fails when the space after a colon is missing


I have encountered a problem with the yaml-cpp parser. When I try to load the following definition:

DsUniversity:
  university_typ: {type: enum, values:[Fachhochschule, Universitat, Berufsakademie]}
  students_at_university: {type: string(50)}

I get the following error:

Error: yaml-cpp: error at line 2, column 39: end of map flow not found

I checked the YAML on http://yaml-online-parser.appspot.com/ and http://yamllint.com/, and both services report it as valid.

The problem is caused by the missing space after "values:". When the YAML is changed to the following:

DsUniversity:
  university_typ: {type: enum, values: [Fachhochschule, Universitat, Berufsakademie]}
  students_at_university: {type: string(50)}

everything works as expected.

Is there any way to configure, update, or fix the yaml-cpp parser so that it also accepts YAML with a missing space after the colon?

Added: It seems the problem is caused by the requirement for a whitespace character as a separator. When I simplified the test snippet to

DsUniversity:[Fachhochschule, Universitat, Berufsakademie]

the yaml-cpp parser reads it as a single scalar value "DsUniversity:[Fachhochschule, Universitat, Berufsakademie]". When a space is added after the colon, yaml-cpp correctly loads it as a map entry whose value is a sequence.


Solution

  • yaml-cpp is correct here, and those online validators are incorrect. From the YAML 1.2 spec:

    7.4.2. Flow Mappings

    Normally, YAML insists the “:” mapping value indicator be separated from the value by white space. A benefit of this restriction is that the “:” character can be used inside plain scalars, as long as it is not followed by white space. This allows for unquoted URLs and timestamps. It is also a potential source for confusion as “a:1” is a plain scalar and not a key: value pair.

    ...

    To ensure JSON compatibility, if a key inside a flow mapping is JSON-like, YAML allows the following value to be specified adjacent to the “:”. This causes no ambiguity, as all JSON-like keys are surrounded by indicators. However, as this greatly reduces readability, YAML processors should separate the value from the “:” on output, even in this case.

    In your example, you're in a flow mapping (meaning a map surrounded by {}), but your key is not JSON-like: you just have a plain scalar (values is unquoted). To be JSON-like, the key needs to be either single- or double-quoted, or it can be a nested flow sequence or map itself.

    In your simplified example,

    DsUniversity:[Fachhochschule, Universitat, Berufsakademie]
    

    both yaml-cpp and the online validators parse this correctly as a single scalar. For it to be a map, as you intend, you need a space after the :.
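
If the input cannot be fixed at its source, one possible workaround is to normalize the text before handing it to yaml-cpp. The sketch below is only an illustration under stated assumptions: addSpaceAfterColon is a hypothetical helper (not part of yaml-cpp), it only handles a ':' directly followed by '[' or '{', and its quote tracking is naive (an apostrophe inside an unquoted scalar, or a '#' comment, would confuse it).

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not part of yaml-cpp: inserts a space after any ':'
// that is directly followed by '[' or '{' and is outside a quoted string.
std::string addSpaceAfterColon(const std::string& text) {
    std::string out;
    out.reserve(text.size() + 8);
    char quote = '\0';  // '\0' while outside quotes, else the opening quote
    for (std::size_t i = 0; i < text.size(); ++i) {
        const char c = text[i];
        out.push_back(c);
        if (quote != '\0') {
            if (quote == '"' && c == '\\' && i + 1 < text.size()) {
                out.push_back(text[++i]);  // copy the escaped character as-is
            } else if (c == quote) {
                quote = '\0';  // closing quote reached
            }
            continue;
        }
        if (c == '"' || c == '\'') {
            quote = c;  // entering a quoted string
        } else if (c == ':' && i + 1 < text.size() &&
                   (text[i + 1] == '[' || text[i + 1] == '{')) {
            out.push_back(' ');  // supply the separator the parser expects
        }
    }
    return out;
}
```

The returned string could then be passed to the parser in place of the original text. Note that this changes the meaning of input such as DsUniversity:[...] from a scalar to a map, which is what the question intends but is not what the YAML text actually says.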

    Why does YAML require that space?

    In the simple plain scalar case:

    a:b
    

    could be ambiguous: it could be read as either a scalar a:b, or a map {a: b}. YAML chooses to read this as a scalar so that URLs can be easily embedded in YAML without quoting:

    http://stackoverflow.com
    

    is a scalar (like you'd expect), not a map {http: //stackoverflow.com}!
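
The plain scalar rule described above can be illustrated with a few lines of C++. This is a simplified sketch, not how yaml-cpp is implemented: findMappingColon is a hypothetical name, and the function looks at a single unquoted line only, ignoring quoting, comments, and flow context.

```cpp
#include <cstddef>
#include <string>

// Simplified rule for an unquoted (plain) scalar on one line: a ':' only
// acts as the mapping value indicator when whitespace or the end of the
// line follows it. Returns the index of that ':' or std::string::npos.
std::size_t findMappingColon(const std::string& line) {
    for (std::size_t i = 0; i < line.size(); ++i) {
        if (line[i] != ':') continue;
        const bool atEnd = (i + 1 == line.size());
        if (atEnd || line[i + 1] == ' ' || line[i + 1] == '\t') return i;
    }
    return std::string::npos;  // no indicator found: the whole line is a scalar
}
```

Under this rule, "a: b" splits at index 1, while "a:b" and "http://stackoverflow.com" contain no mapping indicator and stay single scalars.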

    In a flow context, there's one case where this isn't ambiguous: when the key is quoted, e.g.:

    {"a":b}
    

    This is called JSON-like because it's similar to JSON, which requires quotes around all scalars. In this case, YAML knows that the key ends at the end-quote, and so it can be sure that the value starts immediately.

    This behavior is explicitly allowed because JSON itself allows things like

    {"a":"b"}
    

    Since YAML 1.2 is a strict superset of JSON, this must be legal in YAML.
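
The JSON-like key condition can be sketched in the same spirit. The helper below is hypothetical and deliberately crude: it only inspects the first character of an already isolated key, on the assumption that a key starting with a quote, '[' or '{' is one of the JSON-like forms listed above.

```cpp
#include <string>

// Hypothetical check: inside a flow mapping, a value may sit directly after
// the ':' only when the key is JSON-like, i.e. single- or double-quoted, or
// itself a flow sequence or flow map. Only the first character is examined.
bool isJsonLikeKey(const std::string& key) {
    if (key.empty()) return false;
    const char first = key.front();
    return first == '"' || first == '\'' || first == '[' || first == '{';
}
```

With this check, the key "a" (including its quotes) in {"a":b} qualifies, so the adjacent value is allowed, whereas the unquoted key values from the question does not, which is why a space is needed after its colon.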