I am able to dump YAML code with long strings in folded form with this code:
import yaml
class folded_str(str): pass
def folded_str_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
yaml.add_representer(folded_str, folded_str_representer)
data = {
    'foo': folded_str(('abcdefghi ' * 10) + 'end\n'),
}
print(yaml.dump(data))
The output for the above code is:
foo: >
  abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi
  abcdefghi abcdefghi end
Is it possible to control the length after which the folds should occur? For example, if I want the lines to fold after 70 characters, then the output would look like this:
foo: >
  abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi
  abcdefghi abcdefghi abcdefghi end
Is there a way to make PyYAML do this?
The easy way to control the length of the lines that PyYAML emits when
folding is to pass the (global) line length via the width parameter:
import sys
import yaml
class folded_str(str): pass
def folded_str_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
yaml.add_representer(folded_str, folded_str_representer)
data = {
    'foo': folded_str(('abcdefghi ' * 10) + 'end\n'),
}
yaml.dump(data, sys.stdout, width=70)
which gives:
foo: >
  abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi
  abcdefghi abcdefghi abcdefghi end
As you can see, I removed your call to print. PyYAML has a streaming
interface, and if you do not stream directly to the output it has to build
the complete output in memory first, which is both unnecessarily slow and
memory inefficient.
Of course this also affects any other lines that get dumped (long non-folded scalars, flow-style lists, deeply nested data structures).
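To illustrate that width is a setting for the whole dump and not just for folded scalars, here is a minimal sketch that dumps a flow-style list at two widths. The sample data and the dump_to_string helper are made up for this illustration, and the io.StringIO buffer is only used so that the two results can be compared; for real output, stream to sys.stdout or a file as above.

```python
import io
import yaml

# Hypothetical sample data: a list that is long enough to need wrapping.
items = {'items': ['abcdefghi'] * 10}

def dump_to_string(data, width):
    # Illustration-only helper: stream into an in-memory buffer so the
    # result can be inspected afterwards.
    buf = io.StringIO()
    yaml.dump(data, buf, default_flow_style=True, width=width)
    return buf.getvalue()

narrow = dump_to_string(items, 30)    # the flow-style list gets wrapped
wide = dump_to_string(items, 1000)    # everything fits on one line

print(narrow)
print(wide)
```

Both outputs load back to the same data; only the line breaks differ.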
The non-easy way is not to call the represent_scalar routine, but instead to
adapt PyYAML's ScalarNode (or create your own Node type) so that it
outputs a newline in the appropriate position when emitting.
My ruamel.yaml has this functionality built in, to allow such output to
round-trip with the fold position preserved (even though the default output
width is the same as PyYAML's):
import sys
import ruamel.yaml
yaml_str = """\
[long, scalar]: "This is just a filler to show that the default width is 80 chars"
foo: >
  abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi
  abcdefghi abcdefghi abcdefghi end
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
which gives:
[long, scalar]: This is just a filler to show that the default width is 80 chars
foo: >
  abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi
  abcdefghi abcdefghi abcdefghi end
Although you can create such a folded string from scratch, it is not trivial (there is
no API, and the internal representation might change). What I recommend is
just creating the folded string data and then loading it, by defining your folded_str
differently:
import sys
import ruamel.yaml
yaml = ruamel.yaml.YAML()
def folded_str(s, pos=70):
    parts = []
    r = ""
    for part in s.split(' '):
        if not r:
            r = part
        elif len(r) + len(part) >= pos:
            parts.append(r + '\n')
            r = part
        else:
            r += ' ' + part
    parts.append(r)
    return yaml.load(">\n" + "".join(parts))
data = {
    'foo': folded_str(('abcdefghi ' * 10) + 'end\n'),
}
}
yaml.dump(data, sys.stdout)
which gives:
foo: >
  abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi abcdefghi
  abcdefghi abcdefghi abcdefghi end
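The greedy line splitting inside folded_str can probably also be left to the standard library. The following is only a sketch: fold_text is a hypothetical helper name, and it assumes the input is a single paragraph, because textwrap turns embedded newlines into spaces and strips trailing whitespace (so the final newline is added back by hand).

```python
import textwrap

def fold_text(s, pos=70):
    """Hypothetical helper: wrap s greedily so no line is longer than pos."""
    wrapped = textwrap.fill(
        s,
        width=pos,
        break_long_words=False,   # never split a word in the middle
        break_on_hyphens=False,   # only break at whitespace
    )
    return wrapped + "\n"

text = fold_text(("abcdefghi " * 10) + "end\n")
print(text, end="")
```

For the sample string this breaks after the seventh word, the same position as in the output above. The result could take the place of "".join(parts) in the string that is handed to yaml.load.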