Check if a value in a dictionary is a substring of another key-value pair in Python


I have a dictionary dis_dict whose values are lists. I would like to fetch the key and value for specific keys, then check whether each of those values occurs as a substring in the values of other keys, and fetch all matching key --> value pairs.

For example, given the dictionary below, I would like to check whether 'Stroke' or 'stroke' exists as a key, and then check whether any value of that key is a substring of a value under another key (for example, 'C10.228.140.300.775' occurs in 'C10.228.140.300.775.600').

'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'], 'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855.600']

I have the following lines of code for fetching the key and value for a specific term.

#extract all child terms
for k, v in dis_dict.items():
    if (k in ['Glaucoma', 'Stroke']) or (k in ['glaucoma', 'stroke']):
        disease = k
        tree_id = v
        print (disease, tree_id)
    else:
        disease = ''
        tree_id = ''
        continue

Any help is highly appreciated!


Solution

  • The code below should do what you want to achieve:

    dis_dict = {
        'Stroke':          ['C10.228.140.300.775', 'C14.907.253.855'], 
        'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855']
    }
    
    already_printed = set()  # output lines printed so far, to avoid duplicates
    for disease, tree_id in dis_dict.items():
        # Case-insensitive match on the disease name
        if disease.lower() not in ('glaucoma', 'stroke'):
            continue
        output = None
        for c_code_1 in tree_id:
            for key, value in dis_dict.items():
                if key == disease:
                    continue  # do not compare a disease with itself
                for c_code_2 in value:
                    # Plain substring test against the other disease's code
                    if c_code_1 in c_code_2:
                        tmp_output = f'{disease} {tree_id}, other: {key} {value}'
                        if tmp_output not in already_printed:
                            output = tmp_output
                            print(output)
                            already_printed.add(output)
        if output is None:
            # No other disease shares a code: print the matched disease alone
            output = f'{disease} {tree_id}'
            print(output)
    

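    Test it with other dictionary data to check that it works as expected. A line is printed only when a code of the matched disease occurs inside a code of another disease, and each such pair is printed once. For the dictionary above: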

    Stroke ['C10.228.140.300.775', 'C14.907.253.855'], other: Stroke, Lacunar ['C10.228.140.300.275.800', 'C10.228.140.300.775.600', 'C14.907.253.329.800', 'C14.907.253.855']
    

    If no other disease is found (with the dictionary values changed to avoid a match), only the matched disease is printed:

    Stroke ['C10.228.140.300.775', 'C14.907.253.855']
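
If the goal is to collect the matches rather than print them, the same idea can be sketched as a function. This is only an illustration, not part of the original answer: `find_related_terms` is a hypothetical helper name, and it assumes the codes are hierarchical, so that a child code is the parent code followed by `.` and more digits. Under that assumption a prefix test is used instead of a plain substring test, which avoids a false match such as 'C1.2' inside 'C1.25'. The data is the dictionary from the question.

```python
def find_related_terms(dis_dict, names):
    """Return {matched disease: {other disease: [codes equal to or below its codes]}}.

    Hypothetical helper; assumes a child code is the parent code plus '.<suffix>'.
    """
    wanted = {name.lower() for name in names}  # case-insensitive key lookup
    related = {}
    for disease, codes in dis_dict.items():
        if disease.lower() not in wanted:
            continue
        matches = {}
        for other, other_codes in dis_dict.items():
            if other == disease:
                continue  # skip the matched disease itself
            # Keep codes that equal a parent code or start with "<parent>."
            hits = [
                other_code
                for other_code in other_codes
                if any(other_code == code or other_code.startswith(code + '.')
                       for code in codes)
            ]
            if hits:
                matches[other] = hits
        related[disease] = matches
    return related


dis_dict = {
    'Stroke': ['C10.228.140.300.775', 'C14.907.253.855'],
    'Stroke, Lacunar': ['C10.228.140.300.275.800', 'C10.228.140.300.775.600',
                        'C14.907.253.329.800', 'C14.907.253.855.600'],
}

result = find_related_terms(dis_dict, ['stroke', 'glaucoma'])
print(result)
# {'Stroke': {'Stroke, Lacunar': ['C10.228.140.300.775.600', 'C14.907.253.855.600']}}
```

A matched disease with no related entries maps to an empty dictionary, so the caller can tell "found but no children" apart from "not found at all".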