I have a YAML document like this:
steps:
- !<!entry>
  id: Entry-1
  actions: []
- !<!replybuttons>
  id: ReplyButtons-langcheck
  footer: ''
- !<!input>
  id: Input-langcheck
  var: Input-1
- !<!logic>
  id: LangCheck-Logic
  entries:
  - condition: !<!equals>
      var: Input-langcheck
      isCaseSensitive: false
And I try to read it:
import yaml
yaml.safe_load(yaml_text)
But I get an error:
yaml.constructor.ConstructorError: could not determine a constructor for the tag '!entry'
How can I parse YAML with such tags?
This attempt also doesn't work:
def construct_entry(loader, node):
    value = loader.construct_scalar(node)
    return value
yaml.SafeLoader.add_constructor('!<!entry>', construct_entry)
result = yaml.safe_load(yaml_text)
If I use ruamel.yaml, I can read the YAML document, but I still don't understand how to find out about the tags in the Python data.
import sys
from ruamel.yaml import YAML
class Entry:
    yaml_tag = '!<!entry>'

    def __init__(self, value, style=None):
        self.value = value
        self.style = style

    @classmethod
    def to_yaml(cls, representer, node):
        return representer.represent_scalar(cls.yaml_tag,
                                            u'{.value}'.format(node), node.style)

    @classmethod
    def from_yaml(cls, constructor, node):
        return cls(node.value, node.style)
yaml_text = """\
steps:
- !<!entry>
  id: 1
  action: 2
- !<!entry>
  id: 2
  action: 3
"""
yaml1 = YAML(typ='rt')
data1 = yaml1.load(yaml_text)
print(f'{data1=}')
yaml1.dump(data1, sys.stdout)
yaml2 = YAML(typ='rt')
yaml2.register_class(Entry)
data2 = yaml2.load(yaml_text)
print(f'{data2=}')
yaml1.dump(data2, sys.stdout)
The result is exactly the same in both cases:
data1=ordereddict([('steps', [ordereddict([('id', 1), ('action', 2)]), ordereddict([('id', 2), ('action', 3)])])])
steps:
- !entry
  id: 1
  action: 2
- !entry
  id: 2
  action: 3
data2=ordereddict([('steps', [ordereddict([('id', 1), ('action', 2)]), ordereddict([('id', 2), ('action', 3)])])])
steps:
- !entry
  id: 1
  action: 2
- !entry
  id: 2
  action: 3
If you just need to inspect the tags, the corresponding loaded dict and list subclasses preserve their tag in the .tag attribute (this might change, so pin the version of ruamel.yaml you use):
import sys
import ruamel.yaml
yaml_str = """\
steps:
- !<!entry>
  id: Entry-1
  actions: []
- !<!replybuttons>
  id: ReplyButtons-langcheck
  footer: ''
- !<!input>
  id: Input-langcheck
  var: Input-1
- !<!logic>
  id: LangCheck-Logic
  entries:
  - condition: !<!equals>
      var: Input-langcheck
      isCaseSensitive: false
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
print('id', data['steps'][1]['id'])
print('tag', data['steps'][1].tag.value)
which gives:
id ReplyButtons-langcheck
tag !replybuttons
Your first attempt didn't work because your tags are special: the <> marks them as verbatim tags, which in this case is necessary to allow a tag starting with an exclamation mark. So when the YAML contains !<abc> you register !abc with add_constructor (and I think you can leave out the !), and when your YAML contains !<!abc> you register !abc.
The parser strips the <> for these verbatim tags, which is why the printed tag doesn't contain them after loading.
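Applied to the PyYAML attempt from the question, this means registering the tag without the <>. The following is a minimal sketch, not the only way to do it: TaggedDict, TagLoader and construct_tagged_mapping are hypothetical names introduced for illustration, and it assumes every tagged node in the document is a mapping (a tagged scalar or sequence would need its own constructor). It uses add_multi_constructor with the prefix '!' so that all local tags are handled by a single function instead of one add_constructor call per tag.

```python
import yaml  # PyYAML

yaml_text = """\
steps:
- !<!entry>
  id: Entry-1
  actions: []
- !<!replybuttons>
  id: ReplyButtons-langcheck
  footer: ''
- !<!logic>
  id: LangCheck-Logic
  entries:
  - condition: !<!equals>
      var: Input-langcheck
      isCaseSensitive: false
"""


class TaggedDict(dict):
    """Hypothetical helper: a dict that remembers the tag it was loaded with."""
    tag = None


class TagLoader(yaml.SafeLoader):
    """Subclass, so the constructor is not added to SafeLoader globally."""


def construct_tagged_mapping(loader, tag_suffix, node):
    # The parser has already stripped the <>, so node.tag is e.g. '!entry'.
    # Assumption: the tagged node is a mapping; construct_mapping raises otherwise.
    data = TaggedDict(loader.construct_mapping(node, deep=True))
    data.tag = node.tag
    return data


# One multi-constructor for the prefix '!' covers every local tag.
TagLoader.add_multi_constructor('!', construct_tagged_mapping)

data = yaml.load(yaml_text, Loader=TagLoader)
for step in data['steps']:
    print(step.tag, step['id'])
```

Each step should then print its tag without the <>, next to its id. If you only care about one tag, TagLoader.add_constructor('!entry', ...) with a two-argument function (loader, node) works along the same lines.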
Writing this, I noticed that the round-trip parser doesn't check whether a tag needs to be written verbatim. So if you dump the loaded data, you get non-verbatim tags, which don't load the same way. So if you need to update these files, you should get the classes registered (let me know if that doesn't work out).
Recursively walking over the data structure and rewriting the tags to compensate for this bug will not work, as the <> gets escaped while dumping.