
Preserving Anchors for Numeric Value 0 in ruamel.yaml


I'm having trouble preserving YAML anchors on numeric values when using ruamel.yaml. The problem only occurs with the number 0; all other numeric values work fine. Here's what's happening:

Context: I'm using ruamel.yaml to parse and manipulate YAML files in Python. I need to keep anchors for numeric values intact, but here's the problem:

from ruamel.yaml import YAML, ScalarInt, PlainScalarString

# Custom loader to attempt to preserve anchors for numeric values
class CustomLoader(YAML):
    def __init__(self):
        super().__init__(typ='rt')
        self.preserve_quotes = True
        self.explicit_start = True
        self.default_flow_style = False

    def construct_yaml_int(self, node):
        value = super().construct_yaml_int(node)
        if node.anchor:
            # Preserve the anchor for numeric values
            if value == 0:
                return PlainScalarString("0", anchor=node.anchor.value)
            else:
                return ScalarInt(value, anchor=node.anchor.value)
        return value

yaml = CustomLoader()

# Load the YAML file
with open('current.yaml', 'r') as current_file:
    current_data = yaml.load(current_file)
    print("Debug: current_data after load:", current_data)
    for key, value in current_data.items():
        print(f"Debug: Key '{key}', value type: {type(value)}, has anchor: {hasattr(value, 'anchor')}, anchor value: {getattr(value, 'anchor', None)}")

current.yaml:

person: &person_age 0
person: &person_age 1 # this works

Expected Behavior: The anchor &person_age should be preserved for the person key with the value 0.

Actual Behavior: The anchor is not preserved; hasattr(value, 'anchor') returns False, and the value type is <class 'int'> rather than ScalarInt or PlainScalarString with an anchor.

What I've tried: I've tried to override construct_yaml_int in a custom loader to manually preserve anchors for integers, but it doesn't seem to work. I've ensured that ruamel.yaml is configured with typ='rt' for round-trip preservation. I've experimented with quoting the 0 in the YAML file (person: &person_age "0"), which does preserve the anchor, but this isn't a feasible solution for my use case where users might not quote their numeric values.

Question: How can I ensure that anchors are preserved for the numeric value 0 when using ruamel.yaml? Is there a way to force ruamel.yaml to handle anchors for numbers without needing them to be quoted in the source YAML?

Any insights or alternative approaches would be greatly appreciated.

Versions: Python 3.12.5, ruamel.yaml 0.18.6


Solution

  • Loading your current.yaml should raise an error, because YAML requires unique keys in a mapping. After fixing that, you should get a warning that you redefine the anchor person_age.

    But that is not why 0 loses its anchor. The constructor for integers has quite a bit of special code for handling integers that start with the character '0' (and different code for handling octals in YAML 1.1 and 1.2). That code still had a shortcut for a scalar consisting only of the string "0", so it never reached the code that properly handles anchored integer scalars (a later addition, not tested with 0).

    This will be solved in the next release of ruamel.yaml, but in the meantime you should be able to do something like:

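    import sys
    import ruamel.yaml
    # integer subclasses used unqualified by the constructor below
    from ruamel.yaml.scalarint import BinaryInt, HexCapsInt, HexInt, OctalInt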
    
    yaml_str = """\
    person1: &person_age1 0
    person2: &person_age2 1 # this works
    """
    
    yaml = ruamel.yaml.YAML()
    
    if ruamel.yaml.version_info < (0, 18, 7):
    
        class MyConstructor(ruamel.yaml.constructor.RoundTripConstructor):
            def construct_yaml_int(self, node):
                width = None
                value_su = self.construct_scalar(node)
                try:
                    sx = value_su.rstrip('_')
                    underscore = [len(sx) - sx.rindex('_') - 1, False, False]
                except (ValueError, IndexError):
                    underscore = None
                value_s = value_su.replace('_', "")
                sign = +1
                if value_s[0] == '-':
                    sign = -1
                if value_s[0] in '+-':
                    value_s = value_s[1:]
                if value_s.startswith('0b'):
                    if self.resolver.processing_version > (1, 1) and value_s[2] == '0':
                        width = len(value_s[2:])
                    if underscore is not None:
                        underscore[1] = value_su[2] == '_'
                        underscore[2] = len(value_su[2:]) > 1 and value_su[-1] == '_'
                    return BinaryInt(
                        sign * int(value_s[2:], 2),
                        width=width,
                        underscore=underscore,
                        anchor=node.anchor,
                    )
                elif value_s.startswith('0x'):
                    # default to lower-case if no a-fA-F in string
                    if self.resolver.processing_version > (1, 1) and value_s[2] == '0':
                        width = len(value_s[2:])
                    hex_fun = HexInt
                    for ch in value_s[2:]:
                        if ch in 'ABCDEF':  # first non-digit is capital
                            hex_fun = HexCapsInt
                            break
                        if ch in 'abcdef':
                            break
                    if underscore is not None:
                        underscore[1] = value_su[2] == '_'
                        underscore[2] = len(value_su[2:]) > 1 and value_su[-1] == '_'
                    return hex_fun(
                        sign * int(value_s[2:], 16),
                        width=width,
                        underscore=underscore,
                        anchor=node.anchor,
                    )
                elif value_s.startswith('0o'):
                    if self.resolver.processing_version > (1, 1) and value_s[2] == '0':
                        width = len(value_s[2:])
                    if underscore is not None:
                        underscore[1] = value_su[2] == '_'
                        underscore[2] = len(value_su[2:]) > 1 and value_su[-1] == '_'
                    return OctalInt(
                        sign * int(value_s[2:], 8),
                        width=width,
                        underscore=underscore,
                        anchor=node.anchor,
                    )
                elif self.resolver.processing_version != (1, 2) and value_s[0] == '0':
                    return OctalInt(
                        sign * int(value_s, 8), width=width, underscore=underscore, anchor=node.anchor,
                    )
                elif self.resolver.processing_version != (1, 2) and ':' in value_s:
                    digits = [int(part) for part in value_s.split(':')]
                    digits.reverse()
                    base = 1
                    value = 0
                    for digit in digits:
                        value += digit * base
                        base *= 60
                    return sign * value
                elif self.resolver.processing_version > (1, 1) and value_s[0] == '0':
                    # not an octal, an integer with leading zero(s)
                    if underscore is not None:
                        # cannot have a leading underscore
                        underscore[2] = len(value_su) > 1 and value_su[-1] == '_'
                    return ruamel.yaml.scalarint.ScalarInt(sign * int(value_s), width=len(value_s), underscore=underscore, anchor=node.anchor)
                elif underscore:
                    # cannot have a leading underscore
                    underscore[2] = len(value_su) > 1 and value_su[-1] == '_'
                    return ruamel.yaml.scalarint.ScalarInt(
                        sign * int(value_s), width=None, underscore=underscore, anchor=node.anchor,
                    )
                elif node.anchor:
                    return ruamel.yaml.scalarint.ScalarInt(sign * int(value_s), width=None, anchor=node.anchor)
                else:
                    return sign * int(value_s)
    
    
        MyConstructor.add_default_constructor('int')
    
        yaml.Constructor = MyConstructor
    
    
    
    data = yaml.load(yaml_str)
    yaml.dump(data, sys.stdout)
    

    which gives:

    person1: &person_age1 0
    person2: &person_age2 1 # this works
    

    From looking at the code I also noticed that anchored sexagesimals (references to which were dropped in the 1.2 spec) lose their anchor, but sexagesimals are not preserved anyway.
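
The sexagesimal branch in the constructor above returns a plain int, which has nowhere to store an anchor. As a rough illustration of what that branch computes, here is a minimal stand-alone sketch of the base-60 conversion. The helper name parse_sexagesimal is hypothetical and not part of ruamel.yaml; the arithmetic is rewritten in an equivalent left-to-right form instead of the reversed loop used above.

```python
def parse_sexagesimal(text: str) -> int:
    """Convert a YAML 1.1 style base-60 integer such as '1:30:00' to an int.

    Hypothetical helper for illustration only; it mirrors the arithmetic of
    the sexagesimal branch but is not ruamel.yaml API.
    """
    sign = 1
    if text[0] in "+-":
        if text[0] == "-":
            sign = -1
        text = text[1:]
    value = 0
    # Fold the colon-separated parts from most to least significant.
    for part in text.replace("_", "").split(":"):
        value = value * 60 + int(part)
    # The result is a built-in int, so an anchor cannot be attached to it.
    return sign * value


print(parse_sexagesimal("1:30:00"))  # 5400
print(parse_sexagesimal("-2:05"))    # -125
```

Because the result is a built-in int rather than a ScalarInt, both the original 1:30:00 notation and any anchor on it are gone by the time the value is dumped again.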