python regex yaml

Extract and replace text matching multi-line pattern after keyword


I am trying to transform a YAML document to make it easier to parse later in my project.

I want to use regex to find all occurrences of:

  Export:
    Name: xxxxx

But only those occurrences that appear after "Outputs:".

My document looks something like this:

Resources: 
  MyResource: Resource 1
  Export: 
    Name: Some undesired export name
Outputs:
  ExportOne:
    Description: Notes about export #1 here
    Export:
      Name: Desired export #1
  ExportTwo: 
    Description: Notes about export #2 here
    Export:
      Name: Desired export #2

So far, no matter what I have tried, I only capture one of the "Export: Name:" occurrences.

After finding the aforementioned text my goal is to change it to

ExportName: <existing value>

I am using Python for this project. So far, the regular expressions I have tried have all been variations of:

(?s)(?<=Outputs:)(.+(Export:.*?Name:)+)

Any advice on how to do this... or a better way to perform this transformation?

Thanks!


Solution

  • Judging by your tags, the input file is YAML. It would be better to use a YAML library to parse and modify the file. Regular expressions can also work, but in my opinion they are not the best solution.

    Your input file was missing a space after the colon; the line should read "MyResource: Res1".

    Here is an example:

    pip install pyyaml
    
    import yaml
    
    # Read from a file
    def read_yaml(filename):
        with open(filename, 'r') as file:
            yaml_data = yaml.safe_load(file)
            return yaml_data
    
    
    # Find and replace values
    def modify_data(yaml_data):
        for output in yaml_data['Outputs'].values():
            if 'Export' in output and 'Name' in output['Export']:
                name = output['Export']['Name']
                output['ExportName'] = name
                del output['Export']
        return yaml_data
    
    # Write to a file
    def save_yaml(filename, yaml_data):
        with open(filename, 'w') as file:
            yaml.dump(yaml_data, file, sort_keys=False)
    
    if __name__ == '__main__':
        data = read_yaml('input.yaml')
        modified_data = modify_data(data)
        save_yaml('output.yaml', modified_data)
    
    python main.py && cat output.yaml
    Resources:
      MyResource: Res1
      Export:
        Name: SomeName
    Outputs:
      ExportOne:
        ExportName: abc
      ExportTwo:
        ExportName: def
    

    If you need to parse YAML 1.2 or later, see @Anthon's answer.

    Other YAML parsers that can be used: ruamel.yaml, strictyaml, poyo.
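For completeness, the regex-only route the question asked about might look like the sketch below. It is not the PyYAML approach shown above, and it rests on assumptions: "Outputs:" is a top-level key on its own line, "Name:" sits on the line directly beneath "Export:" with deeper indentation, and lines end in "\n". The names flatten_exports, EXPORT_NAME, OUTPUTS_KEY, TOP_LEVEL_KEY and SAMPLE are made up for illustration. The pattern from the question most likely yields a single result because the greedy ".+" under (?s) swallows the rest of the document, and a repeated capture group only keeps its last repetition. Cutting out the Outputs section first and then substituting line by line avoids both problems.

```python
import re

# An "Export:" line followed immediately by a more deeply indented "Name:" line.
EXPORT_NAME = re.compile(
    r"^(?P<indent>[ \t]*)Export:[ \t]*\n"                  # "Export:" and its indentation
    r"(?P=indent)[ \t]+Name:[ \t]*(?P<value>.*?)[ \t]*$",  # nested "Name: <value>"
    re.MULTILINE,
)
OUTPUTS_KEY = re.compile(r"^Outputs:[ \t]*$", re.MULTILINE)
TOP_LEVEL_KEY = re.compile(r"^\S", re.MULTILINE)  # any line starting at column 0


def flatten_exports(text):
    """Rewrite Export/Name pairs as ExportName, but only inside the Outputs section."""
    marker = OUTPUTS_KEY.search(text)
    if marker is None:
        return text  # no Outputs section, nothing to change

    start = marker.end()
    # The section ends at the next unindented line, or at the end of the text.
    following = TOP_LEVEL_KEY.search(text, start)
    end = following.start() if following else len(text)

    section = EXPORT_NAME.sub(r"\g<indent>ExportName: \g<value>", text[start:end])
    return text[:start] + section + text[end:]


SAMPLE = """\
Resources:
  MyResource: Resource 1
  Export:
    Name: Some undesired export name
Outputs:
  ExportOne:
    Description: Notes about export #1 here
    Export:
      Name: Desired export #1
  ExportTwo:
    Description: Notes about export #2 here
    Export:
      Name: Desired export #2
"""

result = flatten_exports(SAMPLE)
print(result)
```

Because this works on the raw text, comments, key order and the " #1" suffixes are left exactly as written. A YAML parser treats an unquoted " #" as the start of a comment, so values such as "Desired export #1" would need quoting in the input if you go the library route.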