Parse YAML and assume a certain path is always a string


I am using the YAML parser from http://pyyaml.org and I want it to always interpret certain fields as string, but I can't figure out how add_path_resolver() works.

For example: The parser assumes that "version" is a float:

network:
- name: apple
- name: orange
version: 2.3
site: banana

Some files have "version: 2" (which is interpreted as an int) or "version: 2.3 alpha" (which is interpreted as a str).

I want them to always be interpreted as a str.

It seems that yaml.add_path_resolver() should let me specify, "When you see version:, always interpret it as a str", but it is not documented very well. My best guess is:

yaml.add_path_resolver(u'!root', ['version'], kind=str)

But that doesn't work.

Suggestions on how to get my version field to always be a string?

P.S. Here are some examples of different "version" strings and how they are interpreted:

(Pdb) import yaml
(Pdb) import pprint
(Pdb) pprint.pprint(yaml.load("---\nnetwork:\n- name: apple\n- name: orange\nversion: 2\nsite: banana"))
{'network': [{'name': 'apple'}, {'name': 'orange'}],
 'site': 'banana',
 'version': 2}
(Pdb) pprint.pprint(yaml.load("---\nnetwork:\n- name: apple\n- name: orange\nversion: 2.3\nsite: banana"))
{'network': [{'name': 'apple'}, {'name': 'orange'}],
 'site': 'banana',
 'version': 2.2999999999999998}
(Pdb) pprint.pprint(yaml.load("---\nnetwork:\n- name: apple\n- name: orange\nversion: 2.3 alpha\nsite: banana"))
{'network': [{'name': 'apple'}, {'name': 'orange'}],
 'site': 'banana',
 'version': '2.3 alpha'}

Solution

  • By far the easiest solution is not to use the basic .load() (which is unsafe anyway), but to call it with Loader=BaseLoader, which loads every scalar as a string:

    import yaml
    
    yaml_str = """\
    network:
    - name: apple
    - name: orange
    version: 2.3
    old: 2
    site: banana
    """
    
    data = yaml.load(yaml_str, Loader=yaml.BaseLoader)
    print(data)
    

    gives:

    {'network': [{'name': 'apple'}, {'name': 'orange'}], 'version': '2.3', 'old': '2', 'site': 'banana'}
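BaseLoader turns every scalar into a string, including "old" above. If only "version" should stay a string while the other fields keep their usual types, one possible approach is a small SafeLoader subclass that retags the value of any "version" key as a string before it is constructed. This is a minimal sketch, not the library's documented way of doing it: the class name VersionAsStrLoader is made up for this example, and it assumes the pure-Python SafeLoader, whose mapping construction goes through construct_mapping().

```python
import yaml


class VersionAsStrLoader(yaml.SafeLoader):
    """Hypothetical loader: keeps the raw text of scalars stored under a 'version' key."""

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            # A mapping node holds a list of (key_node, value_node) pairs.
            for key_node, value_node in node.value:
                if (isinstance(key_node, yaml.ScalarNode)
                        and key_node.value == "version"
                        and isinstance(value_node, yaml.ScalarNode)):
                    # Override the implicitly resolved tag (int, float, ...)
                    # so the value is constructed from its original text.
                    value_node.tag = "tag:yaml.org,2002:str"
        return super().construct_mapping(node, deep=deep)


yaml_str = """\
network:
- name: apple
- name: orange
version: 2.3
old: 2
site: banana
"""

data = yaml.load(yaml_str, Loader=VersionAsStrLoader)
print(data)
```

Here "version" comes back as the string '2.3', while "old" is still loaded as an int. Note that the check applies to a "version" key at any nesting level, not just at the top of the document.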