python python-3.x serialization yaml pyyaml

Python yaml writes empty file if dump fails


I have the following snippet:

if file:
    try:
        # Opening with 'w' truncates the file immediately, before dump runs.
        with open(file, 'w') as outfile:
            try:
                yaml.dump(self.dataset_configuration, outfile, default_flow_style=False)
                self.dataset_configuration_file = file
            except yaml.YAMLError as ex:
                # %s placeholders let the logger format the exception into the message.
                _logger.error("An error has occurred while trying to save dataset configuration:\n%s", ex)
    except OSError as ex:
        # Report the path that failed to open, not the previously saved one.
        _logger.error("Unable to create file '%s' to persist the dataset configuration:\n%s", file, ex)

I've noticed that sometimes, if the object cannot be pickled/serialized, the dump method fails and leaves an empty file behind. Is there a quick option to avoid overwriting the file if it already exists, or do I have to make a copy of the file first and restore it if the dump operation fails?

The above, as per @martineau's suggestion, can also be rewritten as:

if file:
    serialized_object_string = ''
    try:
        # Serialize to a string first, so the file is not touched if this fails.
        serialized_object_string = yaml.dump(self.dataset_configuration, default_flow_style=False)
    except yaml.YAMLError as ex:
        _logger.error("An error has occurred while trying to save dataset configuration:\n%s", ex)
    if serialized_object_string:
        try:
            with open(file, 'w') as output_file:
                output_file.write(serialized_object_string)
                self.dataset_configuration_file = file
        except OSError as ex:
            # Report the path that failed to open, not the previously saved one.
            _logger.error("Unable to create file '%s' to persist the dataset configuration:\n%s", file, ex)

In this case, a string representation of the object is created first, and it is written to the file only if serialization succeeded and the result is not empty. Would there be a significant penalty for not going through a Stream object? What happens if the string representation is very big? Is there a "native" way to do the above in one operation?


Solution

  • As per @martineau's suggestion, the above can be rewritten so that the object is serialized to a string first and the file is opened only if that succeeds:

    if file:
        serialized_object_string = ''
        try:
            # Serialize to a string first, so the file is not touched if this fails.
            serialized_object_string = yaml.dump(self.dataset_configuration, default_flow_style=False)
        except yaml.YAMLError as ex:
            _logger.error("An error has occurred while trying to save dataset configuration:\n%s", ex)
        if serialized_object_string:
            try:
                with open(file, 'w') as output_file:
                    output_file.write(serialized_object_string)
                    self.dataset_configuration_file = file
            except OSError as ex:
                # Report the path that failed to open, not the previously saved one.
                _logger.error("Unable to create file '%s' to persist the dataset configuration:\n%s", file, ex)
    

    It works, although I was hoping for more of an "integrated" solution.
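
For something closer to an "integrated" approach, one common pattern is to dump into a temporary file in the same directory and swap it into place only when the dump succeeds. The following is a minimal sketch and not part of the original answer: `write_atomically` and its `write_fn` parameter are hypothetical names, and the swap relies on `os.replace`. Because it still writes through a stream, a very large document never has to be held in memory as one string.

```python
import os
import tempfile


def write_atomically(path, write_fn):
    """Call write_fn(stream) on a temporary file, then swap it into place.

    Hypothetical helper: if write_fn raises, the temporary file is deleted
    and the file at `path` (if any) is left untouched.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file next to the target so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "w") as tmp_file:
            write_fn(tmp_file)
        # Only reached when write_fn succeeded and the temp file is closed.
        os.replace(tmp_path, path)
    except BaseException:
        # Remove the partial temp file, then let the caller handle the error.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

With PyYAML this could be called as `write_atomically(file, lambda stream: yaml.dump(self.dataset_configuration, stream, default_flow_style=False))`, wrapped in the same `try`/`except` handling as above. Note that `tempfile.mkstemp` creates the file readable and writable only by its owner, so the permissions of the resulting file may differ from those of a file created by a plain `open`.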