
Delete empty object from yaml file


I have a yaml file with multiple objects including some empty ones. For example:

apiVersion: v1
kind: Secret
metadata:
  name: secret1
type: Opaque
data:
  password: "password1234"

---

---

apiVersion: v1
kind: Secret
metadata:
  name: secret2
type: Opaque
data:
  password: "password5678"

---

---

I want to delete the empty objects from the file using ruamel.yaml, so that the file looks like this:

apiVersion: v1
kind: Secret
metadata:
  name: secret1
type: Opaque
data:
  password: "password1234"

---

apiVersion: v1
kind: Secret
metadata:
  name: secret2
type: Opaque
data:
  password: "password5678"

I tried the code below, but it doesn't work:

for y in yaml_objects:
    if y == None:
        yaml_objects.remove(y)

Instead, it generates a file like this:

kind: Secret
metadata:
  name: secret1
type: Opaque
data:
  password: "password1234"

---

apiVersion: v1
kind: Secret
metadata:
  name: secret2
type: Opaque
data:
  password: "password5678"

--- null
...

How can I achieve this? Thanks!


Solution

  • You don't have a YAML file with multiple objects; you have a file with multiple YAML documents. The way to load such a file is with load_all(), which returns a generator that yields the data structure for each document in turn. An empty document loads as None.

    You can turn the output of that generator into a list, which is probably how you created yaml_objects (your code is incomplete, so you might be doing something else). But removing items from a list while iterating over that same list leads to problems: each removal shifts the remaining elements one position to the left, so the iteration skips the element that follows the removed one. That happens in your case as well: one of the empty documents is never removed and is dumped as --- null\n.... In situations where I have to delete potentially multiple elements from a list, I normally gather the indices of those elements first and then delete by index in reverse order, so that the indices still to be processed stay valid.
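The index-gathering approach described above can be sketched in plain Python as follows. This is only an illustration: the list contents are a made-up stand-in for the loaded documents (empty documents as None), and the name empty_indices is hypothetical.

```python
# Stand-in for the loaded documents: empty YAML documents load as None.
yaml_objects = [{"name": "secret1"}, None, {"name": "secret2"}, None, None]

# First pass: only record the positions of the empty documents.
empty_indices = [i for i, doc in enumerate(yaml_objects) if doc is None]

# Second pass: delete from the back, so the indices still to be
# processed are not shifted by earlier deletions.
for i in reversed(empty_indices):
    del yaml_objects[i]

print(yaml_objects)
```

Deleting in forward order would invalidate the later indices after the first deletion, which is the same shifting problem that affects the loop in the question.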

    However, in this case, it is much easier to discard the empty documents in the first place, while iterating over the generator:

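    import sys
    from pathlib import Path

    import ruamel.yaml

    file_in = Path('input.yaml')
    file_out = Path('output.yaml')

    # used as a context manager, YAML(output=...) writes every dump() to file_out
    with ruamel.yaml.YAML(output=file_out) as yaml:
        yaml.preserve_quotes = True
        for data in yaml.load_all(file_in):
            # an empty document loads as None, so skip it
            if data is not None:
                yaml.dump(data)

    print(file_out.read_text())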
    

    which gives:

    apiVersion: v1
    kind: Secret
    metadata:
      name: secret1
    type: Opaque
    data:
      password: "password1234"
    
    ---
    
    apiVersion: v1
    kind: Secret
    metadata:
      name: secret2
    type: Opaque
    data:
      password: "password5678"
    

    If you need to further work on the loaded data, you can also append data to a list but only if it is not None:

    yaml = ruamel.yaml.YAML()
    yaml.preserve_quotes = True
    
    documents = []
    for data in yaml.load_all(file_in):
        if data is not None:
            documents.append(data)
    for doc in documents:
        doc['apiVersion'] = 'v2'
    yaml.dump_all(documents, sys.stdout)
    

    which gives:

    apiVersion: v2
    kind: Secret
    metadata:
      name: secret1
    type: Opaque
    data:
      password: "password1234"
    
    ---
    
    apiVersion: v2
    kind: Secret
    metadata:
      name: secret2
    type: Opaque
    data:
      password: "password5678"
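
If the same filtering is needed in several places, it could be wrapped in a small generator. The sketch below assumes a hypothetical helper name, non_empty_documents, and uses a plain iterator as a stand-in for yaml.load_all(file_in) so that it runs without an input file.

```python
from typing import Any, Iterable, Iterator


def non_empty_documents(documents: Iterable[Any]) -> Iterator[Any]:
    """Yield only the documents that are not None (hypothetical helper)."""
    for data in documents:
        if data is not None:
            yield data


# Stand-in for yaml.load_all(file_in): empty documents load as None.
loaded = iter([{"kind": "Secret"}, None, {"kind": "Secret"}, None, None])
documents = list(non_empty_documents(loaded))
print(documents)
```

The check is deliberately "is not None" rather than a truthiness test, so a document that consists of an empty mapping or an empty sequence is kept.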