python parsing yaml pyyaml

Python yaml package (PyYAML) adds new lines when not needed


I've been searching for days, trying to find out why my YAML parser (PyYAML) does not save the YAML back in its original form.

The original line in YAML is:

healthcheck:
  test: ["CMD-SHELL", "[ x\"`curl -k --silent -w '%{http_code}' https://localhost:4433 | grep 401`\" = x\"\" ] && exit 1 || exit 0"]       
  interval: 30s

But after just loading the file and saving it back again, it looks like this:

    healthcheck:
      interval: 30s
      test:
      - CMD-SHELL
      - '[ x"`curl -k --silent -w ''%{http_code}'' https://localhost:4433 | grep 401`"
      = x"" ] && exit 1 || exit 0'

There are two problems here: 1) the "test" value becomes a multi-line list instead of a one-line key-value pair. 2) there are actually 3 new lines here:

a) - CMD-SHELL
b) - '[ x"`curl -k --silent -w ''%{http_code}'' https://localhost:4433 | grep 401`"
c) = x"" ] && exit 1 || exit 0'

So the other question is: why was the third line split off from the second? (With whitespace shown, you can see that the second line ends with an LF and then the third line starts.)


Solution

  • I think you may have some misunderstandings about YAML syntax. This:

    test: ["this", "is", "a", "list"]
    

    Is exactly equivalent to this:

    test:
      - this
      - is
      - a
      - list
    

    And this:

    - "This is a string value"
    

    Is exactly equivalent to:

    - "This is a
      string value"
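
    Both equivalences can be checked directly. The following is a minimal sketch, assuming PyYAML is installed; the variable names are made up for illustration, and safe_load is used simply because the documents contain only plain data:

```python
import yaml

# The same list written in flow style and in block style.
flow_doc = 'test: ["this", "is", "a", "list"]\n'
block_doc = "test:\n  - this\n  - is\n  - a\n  - list\n"

# The same double-quoted string on one line and folded over two lines.
one_line_doc = '- "This is a string value"\n'
folded_doc = '- "This is a\n  string value"\n'

flow_data = yaml.safe_load(flow_doc)
block_data = yaml.safe_load(block_doc)
one_line_data = yaml.safe_load(one_line_doc)
folded_data = yaml.safe_load(folded_doc)

print(flow_data == block_data)       # True: both are {'test': ['this', 'is', 'a', 'list']}
print(one_line_data == folded_data)  # True: the line break folds into a single space
```

    In a double-quoted scalar, a single line break plus the indentation that follows it is read back as one space, which is why the folded form loses nothing.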
    

    If I drop your example into a file data.yml:

    $ cat data.yml
    healthcheck:
      test: ["CMD-SHELL", "[ x\"`curl -k --silent -w '%{http_code}' https://localhost:4433 | grep 401`\" = x\"\" ] && exit 1 || exit 0"]
      interval: 30s
    

    And then parse it with PyYAML:

    >>> import pprint
    >>> import yaml
    >>> with open('data.yml') as fd:
    ...   data = yaml.safe_load(fd)
    ... 
    

    I get the following Python data structure:

    >>> pprint.pprint(data)
    {'healthcheck': {'interval': '30s',
                     'test': ['CMD-SHELL',
                              '[ x"`curl -k --silent -w \'%{http_code}\' https://localhost:4433 | grep 401`" = x"" ] && exit 1 || exit 0']}}
    

    And if I dump that using PyYAML (with default_flow_style=None, which was the default before PyYAML 5.1), I get:

    >>> print(yaml.dump(data, default_flow_style=None))
    healthcheck:
      interval: 30s
      test: [CMD-SHELL, '[ x"`curl -k --silent -w ''%{http_code}'' https://localhost:4433
          | grep 401`" = x"" ] && exit 1 || exit 0']
    

    ...which seems just fine. I can request the more verbose list syntax, in which case I get what you show in your example:

    >>> print(yaml.dump(data, default_flow_style=False))
    healthcheck:
      interval: 30s
      test:
      - CMD-SHELL
      - '[ x"`curl -k --silent -w ''%{http_code}'' https://localhost:4433 | grep 401`"
        = x"" ] && exit 1 || exit 0'
    

    ...which will parse to exactly the same Python data structure as the original document. Other than "looking different", the actual data is identical.
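
    The round trip can be verified in code rather than by eye. This is a sketch under a few assumptions: the document is inlined as a string instead of being read from data.yml, and the last step relies on the width argument of yaml.dump, where a very large value is a common way to stop the emitter from folding long strings (the default width is around 80 columns, which is where the extra line break comes from):

```python
import yaml

# The document from the question, inlined as a raw string so the
# backslash escapes reach the YAML parser unchanged.
original = r'''healthcheck:
  test: ["CMD-SHELL", "[ x\"`curl -k --silent -w '%{http_code}' https://localhost:4433 | grep 401`\" = x\"\" ] && exit 1 || exit 0"]
  interval: 30s
'''

data = yaml.safe_load(original)

# Dump in block style (as in the question) and parse the result again.
dumped = yaml.dump(data, default_flow_style=False)
print(yaml.safe_load(dumped) == data)  # True: same data, different layout

# Assumption: an effectively unlimited width keeps the long string on one line.
unfolded = yaml.dump(data, default_flow_style=False, width=float("inf"))
print(unfolded)
print(yaml.safe_load(unfolded) == data)  # True
```

    Neither option restores the original quoting or key order; PyYAML only keeps the data, not the formatting of the file it was loaded from.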