Tags: python, yaml, pyyaml

Python: loading a YAML file changes values (numbers?) that contain a colon followed by fewer than 3 digits


A simple example.yml file:

Base:
    StartTime: 645:0
    EndTimes: 645:023
    MidTimes: 645:02
    mac: 99:19:b9:fa:37:99
    MissionStartTimestamp: -2037:14522
    MissionEndTimestamp: -2037:14522

When it is loaded into Python:

import yaml

with open("example.yml", 'r') as file:
    example_ = yaml.safe_load(file)
print(yaml.dump(example_, default_flow_style=False))

the result is:

Base:
  EndTimes: 645:023
  MidTimes: 38702
  MissionEndTimestamp: -2037:14522
  MissionStartTimestamp: -2037:14522
  StartTime: 38700
  mac: 99:19:b9:fa:37:99

For whatever reason, any "number" value with a single colon followed by 2 or fewer digits gets converted to a different "number"...
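The two changed values are consistent with base-60 (sexagesimal) integers: 645 × 60 + 0 = 38700 and 645 × 60 + 2 = 38702. The sketch below assumes the relevant rule is the base-60 alternative of PyYAML's YAML 1.1 int pattern; it checks each value from example.yml against that pattern and converts the matches. The rule turns out to be slightly narrower than "2 or fewer digits": each group after a colon must be 0 to 59, so 645:60 would also stay a string. SEXAGESIMAL_INT and sexagesimal_to_int are hypothetical helpers for illustration, not PyYAML API.

```python
import re

# Assumed to mirror the base-60 alternative of PyYAML's YAML 1.1 int resolver:
# an integer part followed by one or more ":N"/":NN" groups, each 0..59.
SEXAGESIMAL_INT = re.compile(r"^[-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+$")


def sexagesimal_to_int(value: str) -> int:
    """Hypothetical helper: convert a base-60 value such as '645:02' to an int."""
    sign = 1
    if value[0] in "+-":
        sign = -1 if value[0] == "-" else 1
        value = value[1:]
    result = 0
    for part in value.replace("_", "").split(":"):
        # Shift the running total one base-60 position, then add the next group.
        result = result * 60 + int(part)
    return sign * result


example = {
    "StartTime": "645:0",
    "EndTimes": "645:023",
    "MidTimes": "645:02",
    "mac": "99:19:b9:fa:37:99",
    "MissionStartTimestamp": "-2037:14522",
}

for key, raw in example.items():
    if SEXAGESIMAL_INT.match(raw):
        print(f"{key}: {raw} -> {sexagesimal_to_int(raw)}")
    else:
        print(f"{key}: {raw} -> stays a string")
# StartTime: 645:0 -> 38700
# EndTimes: 645:023 -> stays a string
# MidTimes: 645:02 -> 38702
# mac: 99:19:b9:fa:37:99 -> stays a string
# MissionStartTimestamp: -2037:14522 -> stays a string
```

Only StartTime and MidTimes match, which lines up with the two values that changed in the PyYAML output.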

I also tried:

import yaml

with open("example.yml", 'r') as file:
    example_ = yaml.load(file, Loader=yaml.CLoader)
print(yaml.dump(example_, default_flow_style=False))

Same results (and the same with Loader=yaml.CSafeLoader, CFullLoader and CUnsafeLoader).

The other loader, CBaseLoader, gives different results: it turns these values into single-quoted strings:

Base:
  EndTimes: 645:023
  MidTimes: '645:02'
  MissionEndTimestamp: -2037:14522
  MissionStartTimestamp: -2037:14522
  StartTime: '645:0'
  mac: 99:19:b9:fa:37:99

CBaseLoader looks like the best option, but the added single quotes aren't great: I would now have to add another step to strip them. Is there any way around this, so that these values load the same way the other values do?
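The quotes are probably not part of the loaded data. The check below uses the pure-Python yaml.BaseLoader, assumed to behave like CBaseLoader here without needing libyaml. It suggests the values come back as ordinary str objects, and that the single quotes only appear when yaml.dump writes a string which the default resolver would otherwise read back as an int.

```python
import yaml

TEXT = """\
Base:
    StartTime: 645:0
    EndTimes: 645:023
    MidTimes: 645:02
"""

# BaseLoader does no implicit type resolution: every scalar is loaded as a str.
data = yaml.load(TEXT, Loader=yaml.BaseLoader)
print(repr(data["Base"]["StartTime"]))  # '645:0' (these quotes come from repr)
print(type(data["Base"]["MidTimes"]))   # <class 'str'>

# The default Dumper quotes 645:0 and 645:02 because, written plain, they
# would be resolved as base-60 ints on the next load; 645:023 stays plain.
dumped = yaml.dump(data, default_flow_style=False)
print(dumped)
```

So no stripping step should be needed on the loaded values; the quoting only affects the text that yaml.dump emits.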

UPDATE #1

Based on @ubaumann's answer, I've added this follow-up.

Install ruamel.yaml: conda install -c conda-forge ruamel.yaml or pip install ruamel.yaml

Change the imports at the top of the script:

import sys
from ruamel.yaml import YAML
yaml=YAML(typ="rt")

and the open/dump calls:

with open("example.yml", 'r') as file:
    example_ = yaml.load(file)
yaml.dump(example_, sys.stdout)

Result:


Base:
  StartTime: 645:0000
  EndTimes: 645:023
  MidTimes: 645:02
  mac: 99:19:b9:fa:37:99
  MissionStartTimestamp: -2037:14522
  MissionEndTimestamp: -2037:14522

If you change the line yaml=YAML(typ="rt") to yaml=YAML(typ="safe"), you get all of them as strings:


Base: {EndTimes: '645:023', MidTimes: '645:02', MissionEndTimestamp: '-2037:14522',
  MissionStartTimestamp: '-2037:14522', StartTime: '645:0000', mac: '99:19:b9:fa:37:99'}


Solution

  • PyYAML parses a subset of YAML 1.1, and that specification includes sexagesimal (base-60) numbers, intended essentially for processing values with minutes and seconds (like times and arcs). Since this led to a lot of confusion, it was quickly dropped from the YAML 1.2 specification, but PyYAML has never been upgraded to YAML 1.2 since that spec came out in 2009.

    You can switch to a YAML 1.2 parser, e.g. my ruamel.yaml, and get the result you expect.
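If switching parsers is not an option, one possible PyYAML-only workaround is a SafeLoader subclass whose int resolver leaves out the base-60 alternative. This is a sketch under stated assumptions, not part of PyYAML: the class name NoSexagesimalLoader is hypothetical, the code relies on the yaml_implicit_resolvers class attribute (an internal detail of PyYAML's resolver that could change between versions), and it leaves the base-60 float form (values like 1:30.5) untouched.

```python
import re

import yaml

INT_TAG = "tag:yaml.org,2002:int"

TEXT = """\
Base:
    StartTime: 645:0
    EndTimes: 645:023
    MidTimes: 645:02
    mac: 99:19:b9:fa:37:99
    MissionStartTimestamp: -2037:14522
    MissionEndTimestamp: -2037:14522
"""


class NoSexagesimalLoader(yaml.SafeLoader):
    """Hypothetical SafeLoader variant that does not read 645:02 as an int."""


# Give the subclass its own copy of the implicit resolver table, minus the
# int entries, so that yaml.SafeLoader itself is left unchanged.
NoSexagesimalLoader.yaml_implicit_resolvers = {
    first: [(tag, regexp) for tag, regexp in resolvers if tag != INT_TAG]
    for first, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

# Re-register int resolution with PyYAML's YAML 1.1 int pattern, leaving out
# only the base-60 alternative [-+]?[1-9][0-9_]*(?::[0-5]?[0-9])+.
NoSexagesimalLoader.add_implicit_resolver(
    INT_TAG,
    re.compile(
        r"""^(?:[-+]?0b[0-1_]+
            |[-+]?0[0-7_]+
            |[-+]?(?:0|[1-9][0-9_]*)
            |[-+]?0x[0-9a-fA-F_]+)$""",
        re.X,
    ),
    list("-+0123456789"),
)

example_ = yaml.load(TEXT, Loader=NoSexagesimalLoader)
print(example_["Base"])
```

Ordinary integers such as 42 or 0x1A still load as ints with this loader. Note that yaml.dump with the default Dumper will still quote 645:0 and 645:02, because the dumper keeps the YAML 1.1 rule; the loaded values themselves are plain strings.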