Python: how to find duplicated values in a YAML file for a specific key


I have a yaml file like this:

-
    ip: 1.1.1.1
    status: Active
    type: 'typeA'
-
    ip: 1.1.1.1
    status: Disabled
    type: 'typeA'
-
    ip: 2.2.2.2
    status: Active
    type: 'typeC'
-
    ip: 3.3.3.3
    status: Active
    type: 'typeB'
-
    ip: 3.3.3.3
    status: Active
    type: 'typeC'
-
    ip: 2.2.2.2
    status: Active
    type: 'typeC'
-

I want to find any IP that appears more than once with the same type.

For example, IP 1.1.1.1 has two entries and both are typeA, so it should be reported. IP 3.3.3.3 also has two entries, but their types differ (typeB and typeC), so it should not be.

Expected output:

IP 1.1.1.1, typeA duplicate
IP 2.2.2.2, typeC duplicate

Solution

  • Install PyYAML with pip install pyyaml, then run the Python script below, replacing myyaml.yaml with the path to your YAML file.

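    import yaml
    
    with open('myyaml.yaml', 'r') as file:
        data = yaml.safe_load(file)
    
    seen = set()      # (ip, type) pairs encountered so far
    reported = set()  # pairs already printed, so each duplicate is reported once
    
    for entry in data:
        # A trailing "-" in the YAML loads as None and is reported as invalid.
        if not isinstance(entry, dict) or 'ip' not in entry or 'type' not in entry:
            print("Invalid entry in YAML data.")
            continue
        # Track the pair rather than one type per IP, so an IP that
        # appears with several types is still matched correctly.
        pair = (entry['ip'], entry['type'])
        if pair in seen and pair not in reported:
            print(f"IP {pair[0]}, {pair[1]} duplicate")
            reported.add(pair)
        seen.add(pair)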
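
  • As an alternative, the same check can be written as a small function that counts (ip, type) pairs with collections.Counter. This is only a sketch: the function name find_duplicates and the inlined sample list are assumptions for illustration, and the list stands in for what yaml.safe_load would return for the file in the question (the trailing "-" becomes None).

```python
from collections import Counter


def find_duplicates(entries):
    """Return the (ip, type) pairs that occur more than once, in first-seen order.

    `find_duplicates` is a hypothetical helper name, not part of PyYAML.
    """
    pairs = [
        (entry['ip'], entry['type'])
        for entry in entries or []
        # Skip None (from a trailing "-") and entries missing either key.
        if isinstance(entry, dict) and 'ip' in entry and 'type' in entry
    ]
    counts = Counter(pairs)  # Counter keeps keys in insertion order
    return [pair for pair, count in counts.items() if count > 1]


# Stand-in for yaml.safe_load(file) on the question's data (assumed shape).
entries = [
    {'ip': '1.1.1.1', 'status': 'Active', 'type': 'typeA'},
    {'ip': '1.1.1.1', 'status': 'Disabled', 'type': 'typeA'},
    {'ip': '2.2.2.2', 'status': 'Active', 'type': 'typeC'},
    {'ip': '3.3.3.3', 'status': 'Active', 'type': 'typeB'},
    {'ip': '3.3.3.3', 'status': 'Active', 'type': 'typeC'},
    {'ip': '2.2.2.2', 'status': 'Active', 'type': 'typeC'},
    None,
]

for ip, entry_type in find_duplicates(entries):
    print(f"IP {ip}, {entry_type} duplicate")
```

    Returning the pairs instead of printing inside the loop keeps the detection separate from the output format, which makes it easier to reuse if the duplicates need to go into a log or report.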