
Pyyaml dump does not produce anchors for the same objects


I was experimenting a bit with PyYAML and wanted to refer to a value that appears earlier in the YAML. To give an example:

import yaml
a=25
dict_to_dump={'a':a,'b':a}
yaml.dump(dict_to_dump)

From what I understood from the specification, PyYAML should add an anchor to each object that it has already encountered. In my case, I would expect the YAML file to contain:

a:&id 25
b:*id

as the objects passed are exactly the same, but instead I find:

a:25
b:25

How can I obtain the desired behaviour?


Solution

  • First of all, your expectation is incorrect. What you could expect is

    a: &id 25
    b: *id
    

    with a space after the value indicator (:).

    You will also need to do yaml.dump(dict_to_dump, sys.stdout) to get any output from your program, because without a stream argument yaml.dump() only returns the YAML as a string. And what you indicate is not what you get (again, the spaces after the value indicator are missing).
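    A minimal sketch of the difference: without a stream, yaml.dump returns the document as a string (which a script silently discards); with sys.stdout as the stream, it writes the document and returns None. It also shows what the question's data really produces.

```python
import sys
import yaml

a = 25
dict_to_dump = {'a': a, 'b': a}

# No stream argument: the YAML document is returned as a str
text = yaml.dump(dict_to_dump)
print(repr(text))

# Stream argument given: the document is written to it and None is returned
result = yaml.dump(dict_to_dump, sys.stdout)
print(result)
```

    Note the space after each colon, and that there is no anchor in either case.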


    You normally only get an alias if you have two objects a and b with the same value for id(a) and id(b), i.e. they are the same object. Simple objects like small integers and strings can be reused by CPython (small ints are cached, identical string constants can be shared), so they have the same id() even if assigned in different places in the source. Mutable structures like a dict or list, or instances of Python classes, do not usually have the same id() unless you reuse the same object.
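    A small sketch of this identity rule, using lists because for them PyYAML decides on aliasing purely by id(): the same list referenced twice gets an anchor and an alias, while two equal but distinct lists are written out in full.

```python
import yaml

shared = [1, 2]
same_object = {'x': shared, 'y': shared}     # one list object, referenced twice
equal_copies = {'x': [1, 2], 'y': [1, 2]}    # two distinct lists with equal content

print(id(same_object['x']) == id(same_object['y']))    # True
print(id(equal_copies['x']) == id(equal_copies['y']))  # False

out_shared = yaml.dump(same_object)   # contains &id001 and *id001
out_copies = yaml.dump(equal_copies)  # no anchor, no alias
print(out_shared)
print(out_copies)
```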

    PyYAML is aware that such simple values get shared, and handles some types of objects differently: for scalars such as int, str and float it never emits an alias, even if the id() is the same.

    import sys
    import yaml
    import datetime
    
    a = 25
    b = 25
    c = 'some string specified twice in the source'
    d = 'some string specified twice in the source'
    e = datetime.date(2023, 1, 11)
    f = datetime.date(2023, 1, 11)
    
    print('a-b', id(a) == id(b))
    print('c-d', id(c) == id(d))
    print('e-f', id(e) == id(f))
    print('=====')
    
    dict_to_dump = dict(e=e, x=e, f=f)
    yaml.dump(dict_to_dump, sys.stdout)
    

    which gives:

    a-b True
    c-d True
    e-f False
    =====
    e: &id001 2023-01-11
    f: 2023-01-11
    x: *id001
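    In PyYAML this exclusion lives in the representer's ignore_aliases() method, which returns true for None, the empty tuple, str, bytes, bool, int and float, so those values are never anchored. The sketch below checks that and then, as an alternative to the Int class approach rather than the answer's own method, overrides the method in a Dumper subclass (AliasingIntDumper is a made-up name) so that plain ints are anchored by id() as well:

```python
import io
import datetime
import yaml

# ignore_aliases() is consulted for every value before it can be anchored
dumper = yaml.Dumper(io.StringIO())
print(bool(dumper.ignore_aliases(25)))                          # True
print(bool(dumper.ignore_aliases('text')))                      # True
print(bool(dumper.ignore_aliases(datetime.date(2023, 1, 11))))  # False


class AliasingIntDumper(yaml.Dumper):
    """Hypothetical Dumper that also anchors/aliases plain ints by id()."""

    def ignore_aliases(self, data):
        # type(...) is int deliberately excludes bool (an int subclass)
        if type(data) is int:
            return False
        return super().ignore_aliases(data)


a = 25
out = yaml.dump({'a': a, 'b': a}, Dumper=AliasingIntDumper)
print(out)
```

    Because the check is by id(), this also aliases ints that merely happen to be shared, e.g. small ints that CPython caches, so it is a blunt tool compared with marking individual values.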
    

    If you want to get the expected output, one way is to make a Python class Int that behaves like an integer. Then, when you do a = Int(25), you will get your anchor and alias.

    This is what my library ruamel.yaml does. When loading in the default round-trip mode, it also preserves the actual anchor/alias names used:

    import sys
    import ruamel.yaml
    
    yaml_str = """\
    a: &my_special_id 25
    b: *my_special_id
    """
    
    yaml = ruamel.yaml.YAML()
    data = yaml.load(yaml_str)
    print(f'{data["a"] * 4  =}')
    print(f'{data["b"] + 75 =}')
    print('=====')
    yaml.dump(data, sys.stdout)
    

    which gives:

    data["a"] * 4  =100
    data["b"] + 75 =100
    =====
    a: &my_special_id 25
    b: *my_special_id
    

    Creating the data from scratch is also possible:

    import sys
    import ruamel.yaml
    
    Int = ruamel.yaml.scalarint.ScalarInt
    
    a = Int(25, anchor='id')
    data = dict(a=a, b=a)
    
    yaml = ruamel.yaml.YAML()
    yaml.dump(data, sys.stdout)
    

    which gives what you expected in the first place:

    a: &id 25
    b: *id
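    For comparison, a rough sketch of the same "class Int" idea in plain PyYAML; Int and IntAnchorDumper are hypothetical names, not part of PyYAML. Since an int subclass still passes the isinstance check in ignore_aliases(), the Dumper subclass has to opt it back in, and a representer is registered so that Int is written as an ordinary integer instead of a python/object tag. Unlike ruamel.yaml's ScalarInt with anchor='id', PyYAML still picks its own anchor names (id001, ...).

```python
import sys
import yaml


class Int(int):
    """Hypothetical int subclass whose instances should get anchors/aliases."""


class IntAnchorDumper(yaml.Dumper):
    def ignore_aliases(self, data):
        # Allow anchors for Int instances, keep PyYAML's defaults otherwise
        if isinstance(data, Int):
            return False
        return super().ignore_aliases(data)


# Write Int values as plain YAML integers, not as python/object tags
IntAnchorDumper.add_representer(Int, yaml.Dumper.represent_int)

a = Int(25)
data = dict(a=a, b=a, c=25)  # c is a plain int, so it is not aliased
out = yaml.dump(data, Dumper=IntAnchorDumper)
sys.stdout.write(out)
```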