Emit Python embedded object as native JSON in YAML document


I'm importing webservice tests from Excel and serialising them as YAML.

But since YAML is a superset of JSON, I'd like the request part of the test to be valid JSON, i.e. to have delimiters, quotes and commas.

This will allow us to cut and paste requests between the automated test suite and manual test tools (e.g. Postman).

So here's how I'd like a test to look (simplified):

- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
  request:
    {
        "unitTypeCode": "",
        "unitNumber": "15",
        "levelTypeCode": "L",
        "roadNumber1": "810",
        "roadName": "HAY",
        "roadTypeCode": "ST",
        "localityName": "PERTH",
        "postcode": "6000",
        "stateTerritoryCode": "WA"
    }
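
To illustrate why this layout is convenient: because the request block is plain JSON, it can be lifted out of the YAML text and parsed with the standard json module alone. The following is only a sketch. The helper name extract_json_after_key and the trimmed sample document are made up for this example, and it assumes the value after the key is a JSON object.

```python
import json

# Trimmed, hypothetical sample in the desired layout
yaml_text = '''\
- properties:
    METHOD: GET
    testCaseId: TC2
  request:
    {
        "unitNumber": "15",
        "postcode": "6000"
    }
'''


def extract_json_after_key(text, key):
    """Hypothetical helper: decode the first JSON object that follows `key:`."""
    key_pos = text.index(key + ':')
    start = text.index('{', key_pos)
    # raw_decode parses one JSON value starting at `start` and ignores
    # whatever YAML follows it
    obj, _end = json.JSONDecoder().raw_decode(text, start)
    return obj


request = extract_json_after_key(yaml_text, 'request')
print(request)
```

The same text between the braces is what would be pasted into a manual tool such as Postman.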

In Python, my request object has a dict attribute called fields which is the part of the object to be serialised as JSON. This is what I tried:

import json
import yaml

def request_presenter(dumper, request):
    # Serialise only the fields dict; represent_str emits it as a single YAML string scalar
    json_string = json.dumps(request.fields, indent=8)
    return dumper.represent_str(json_string)

yaml.add_representer(Request, request_presenter)

test = Test(...)  # including embedded request object
serialised_test = yaml.dump(test)

I'm getting:

- properties:
    METHOD: GET
    TYPE: ADDRESS
    Request URL: /addresses
    testCaseId: TC2
  request: "{
    \"unitTypeCode\": \"\",\n
    \"unitNumber\": \"15\",\n
    \"levelTypeCode\": \"L\",\n
    \"roadNumber1\": \"810\",\n
    \"roadName\": \"HAY\",\n
    \"roadTypeCode\": \"ST\",\n
    \"localityName\": \"PERTH\",\n
    \"postcode\": \"6000\",\n
    \"stateTerritoryCode\": \"WA\"\n
  }"

...only worse because it's all on one line and has white space all over the place.

I tried using the | style for literal multi-line strings, which helps with the line breaks and escaped quotes (it's more involved, but this answer was helpful). However, escaped or multiline, the result is still a string that will need to be parsed separately.
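
For reference, the literal-block attempt described above might look roughly like the sketch below. The Request class here is a hypothetical stand-in with only a fields attribute, and the representer is registered on SafeDumper purely to keep the example self-contained.

```python
import json
import yaml


class Request:
    """Hypothetical stand-in for the request class in the question."""

    def __init__(self, fields):
        self.fields = fields


def literal_json_presenter(dumper, request):
    json_string = json.dumps(request.fields, indent=4)
    # style='|' asks the emitter for a literal block scalar
    return dumper.represent_scalar('tag:yaml.org,2002:str', json_string, style='|')


yaml.add_representer(Request, literal_json_presenter, Dumper=yaml.SafeDumper)

document = yaml.safe_dump({'request': Request({'unitNumber': '15', 'postcode': '6000'})})
print(document)

# Loading gives back a string, so the JSON needs a second parse
loaded = yaml.safe_load(document)
fields = json.loads(loaded['request'])
```

The line breaks and quotes survive unescaped, but the value is still a YAML string rather than a mapping, which is the limitation noted above.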

How can I stop PyYAML treating the JSON block as a string and make it just accept a block of text as part of the emitted YAML? I'm guessing it has something to do with overriding the emitter, but I could use some help. If possible, I'd like to avoid post-processing the serialised test to achieve this.


Solution

  • OK, so this was the solution I came up with: generate the YAML ahead of time with a placeholder (the key set to null). The placeholder marks where the JSON should be inserted, and its indentation defines the root-level indentation of the JSON block.

    import json
    
    
    def insert_json_in_yaml(pre_insert_yaml, key, obj_to_serialise, json_indent=2):
        """
        replace the first '<key>: null' placeholder with obj_to_serialise as JSON
        """
        marker = '%s: null' % key
        marker_line = line_of_first_occurrence(pre_insert_yaml, marker)
        marker_indent = string_indent(marker_line)
        serialised = json.dumps(obj_to_serialise, indent=json_indent)
        # json.dumps indents relative to column 0, so shift every continuation
        # line right by the placeholder's own indentation
        serialised = serialised.replace('\n', '\n' + ' ' * marker_indent)
        key_with_json = '%s: %s' % (key, serialised)
        # replace only the occurrence whose indentation was measured
        return pre_insert_yaml.replace(marker, key_with_json, 1)
    
    
    def line_of_first_occurrence(text, substring):
        """
        return the line containing the first occurrence of substring
        """
        lineno = lineno_of_first_occurrence(text, substring)
        # Python strings use '\n' internally, whatever os.linesep is
        return text.split('\n')[lineno]
    
    
    def string_indent(s):
        """
        return indentation of a string (no of spaces before a nonspace)
        """
        return len(s) - len(s.lstrip(' '))
    
    
    def lineno_of_first_occurrence(text, substring):
        """
        return zero-based line number of first occurrence of substring
        """
        return text[:text.index(substring)].count('\n')
    
    
    embedded_object = {
        "unitTypeCode": "",
        "unitNumber": "15",
        "levelTypeCode": "L",
        "roadNumber1": "810",
        "roadName": "HAY",
        "roadTypeCode": "ST",
        "localityName": "PERTH",
        "postcode": "6000",
        "stateTerritoryCode": "WA"
    }
    yaml_string = """
    ---
    
    - properties:
        METHOD: GET
        TYPE: ADDRESS
        Request URL: /addresses
        testCaseId: TC2
      request: null
      after_request: another value
    """
    
    >>> print(insert_json_in_yaml(yaml_string, 'request', embedded_object))
    
    ---
    
    - properties:
        METHOD: GET
        TYPE: ADDRESS
        Request URL: /addresses
        testCaseId: TC2
      request: {
        "unitTypeCode": "",
        "unitNumber": "15",
        "levelTypeCode": "L",
        "roadNumber1": "810",
        "roadName": "HAY",
        "roadTypeCode": "ST",
        "localityName": "PERTH",
        "postcode": "6000",
        "stateTerritoryCode": "WA"
      }
      after_request: another value
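
As a possible alternative that avoids post-processing, one could have the representer return a flow-style mapping whose string scalars are forced to double quotes. This is only a sketch and not something confirmed in this thread. The Request class is a hypothetical stand-in, and the example assumes PyYAML 5.1 or later (for sort_keys), a flat dict, and strings made of ordinary printable characters, since YAML's double-quoted escapes are not all valid JSON.

```python
import json
import yaml

STR_TAG = 'tag:yaml.org,2002:str'


class Request:
    """Hypothetical stand-in for the request class in the question."""

    def __init__(self, fields):
        self.fields = fields


def json_flow_representer(dumper, request):
    # Flow style supplies the {...} delimiters and the commas
    node = dumper.represent_mapping('tag:yaml.org,2002:map', request.fields,
                                    flow_style=True)
    for key_node, value_node in node.value:
        for scalar in (key_node, value_node):
            # Force double quotes on strings only; numbers, booleans and
            # null stay plain
            if isinstance(scalar, yaml.nodes.ScalarNode) and scalar.tag == STR_TAG:
                scalar.style = '"'
    return node


yaml.add_representer(Request, json_flow_representer, Dumper=yaml.SafeDumper)

fields = {"unitTypeCode": "", "unitNumber": "15", "postcode": "6000"}
document = yaml.safe_dump(
    {'request': Request(fields)},
    sort_keys=False,     # keep insertion order
    width=10000,         # avoid line folding inside quoted strings
    allow_unicode=True,  # avoid YAML-only \x escapes for non-ASCII text
)
print(document)

# The text after the key parses as JSON and as YAML
json_text = document.split('request: ', 1)[1]
parsed_as_json = json.loads(json_text)
parsed_as_yaml = yaml.safe_load(document)
```

The trade-off is that the JSON comes out on a single line rather than pretty-printed, so the placeholder approach above is still the one to use if the indented layout matters.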