python, string, list, any

Return indices of string and substring matches in lists


I have two lists: one containing people's last names and another containing similar data. I have used any() to match the two lists and output the matches.

Example data provided, real lists consist of thousands of entries.

matchers = ['Balle', 'Jobson', 'Watts', 'Dallow', 'Watkins']
full_name = ['Balle S & R', 'Donald D & S', 'Watkins LTD', 'Balle R & R', 'Dallow K & C']

matching = [s for s in full_name if any(xs in s for xs in matchers)]

print(matching)

I want to return the indices of each match. For the above example, the ideal output would be:

[0, 0], [4, 2], [0, 3], [3, 4] 

I have tried:

print([[i for i in range(len(full_name)) if item1 == full_name[i]] for item1 in matchers])

But this returns a list of empty lists. In reality my lists consist of thousands of entries. Is it possible to find the matched indices when the match is on data that is not exactly the same?


Solution

  • You can use "matcher in name" instead of "==". The == operator is true only when the two strings are identical, so it never matches a last name against a longer entry such as "Balle S & R"; the in operator tests for a substring.

    Explanation: enumerate() goes through a list and returns an (index, value) pair for each value. So "index1" stores the index of "matcher" in the list "matchers". Similarly, "index2" is the index of "name" in full_name.

    Then, I check whether "matcher" is a substring of "name". If it is, I add the matcher index and the name index to the final list.

    Dry run: when index1=0, matcher="Balle", and I loop through all the values in full_name. When index2=0, name="Balle S & R". The if check is true because "Balle" is a substring of "Balle S & R", so I append [index1, index2], which is [0, 0], to the final list. If matcher is not a substring, I ignore the pair and move on.

    Here is working code using loops:

    matches = []
    # Loop through each value in matchers, keeping (index, value)
    for index1, matcher in enumerate(matchers):

        # Loop through each value in full_name, keeping (index, value)
        for index2, name in enumerate(full_name):

            # Check if matcher is a substring of name
            if matcher in name:

                # If true, add both indices to the list
                matches.append([index1, index2])
    

    Here is a shorter, more Pythonic version:

    matches = [[i1, i2] for i1, matcher in enumerate(matchers) for i2, name in enumerate(full_name) if matcher in name]
    

    Output for both: [[0, 0], [0, 3], [3, 4], [4, 2]]
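
The output above is grouped by matcher, while the order asked for in the question ([0, 0], [4, 2], [0, 3], [3, 4]) follows the entries of full_name. As a minimal sketch, assuming that ordering matters, swapping the two loops produces it. The function name find_match_indices is made up for illustration and is not part of the original answer.

```python
matchers = ['Balle', 'Jobson', 'Watts', 'Dallow', 'Watkins']
full_name = ['Balle S & R', 'Donald D & S', 'Watkins LTD', 'Balle R & R', 'Dallow K & C']


def find_match_indices(matchers, full_name):
    """Return [matcher_index, name_index] pairs, ordered by name index.

    Hypothetical helper: the outer loop walks full_name so that the
    pairs come out in the order the names appear.
    """
    return [
        [i1, i2]
        for i2, name in enumerate(full_name)    # outer loop: names
        for i1, matcher in enumerate(matchers)  # inner loop: matchers
        if matcher in name                      # substring test
    ]


matches = find_match_indices(matchers, full_name)
print(matches)  # [[0, 0], [4, 2], [0, 3], [3, 4]]
```

Like the versions above, this compares every matcher with every name, so the work grows with len(matchers) * len(full_name).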