python regex list match sublist

How to return the sublists in lstb if they exist in lsta, while excluding certain words in the match?


How do I return only the sublists in lstb when only part of each sublist's first item is present in lsta? Is it possible to get a match when only 80% of one string matches 80% of the other?

If this isn't possible, how would I exclude certain words like 'Company', 'Inc' or 'The' from the match, so that an item would still be returned even if one string had "The" or "Inc" and the other didn't?

For example:

lsta = ['The Fake Company','Fake Company Inc.','The Fake Company Store','Another.','Irrelevant','Not Included']
lstb = [['Fake','PersonA'], ['BCompany','PersonB'],['Another','PersonC'],['DCompany','PersonC'],['The Another Inc.','PersonC']]

I want to return only the sublists in lstb whose first item matches a string in lsta, while ignoring words like "Company" or "Inc.", since those could prevent a match.

Desired_ListA = [['Fake','PersonA'],['The Another Inc.','PersonC']]

I'd also like to know which items in lsta were not matched by anything in lstb:

Desired_ListB = ['Irrelevant','Not Included']

What I have so far:

Desired_ListA = []
for sublist in lstb:
    if re.search(sublist[0],lsta):
        Desired_ListA.extend(sublist)

The issue here is that neither "in" nor "re.search" does the trick, since the first item of a sublist in lstb can be a longer string than the corresponding item in lsta.


Solution

  • re.search succeeds when the pattern matches any part of the string, e.g. 'Fake' will match 'The Fake Company', 'Fake Company Inc.', etc.

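    import re

    lsta = ['The Fake Company','Fake Company Inc.','The Fake Company Store','Another.','Irrelevant','Not Included']
    lstb = [['Fake','PersonA'], ['BCompany','PersonB'],['Another','PersonC'],['DCompany','PersonC'],['The Another Inc.','PersonC'], ['thisisareallylongstringandwontmatch', 'yeaaaaaaaah']]

    Desired_ListA = []
    matched = set()  # lsta entries hit by at least one sublist
    for sublist in lstb:
        # re.escape makes characters such as '.' match literally
        pattern = re.escape(sublist[0])
        hits = [company for company in lsta if re.search(pattern, company)]
        if hits:
            Desired_ListA.append(sublist)  # append once, not once per hit
            matched.update(hits)

    # lsta entries that no sublist matched
    Desired_ListB = [company for company in lsta if company not in matched]
    print(Desired_ListA)
    print(Desired_ListB)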
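
Note that a plain re.search still won't match 'The Another Inc.' against 'Another.', because the pattern is longer than the string being searched. For the "ignore certain words" and "80% match" parts of the question, one possible approach is to normalize both strings first and then compare them with difflib.SequenceMatcher from the standard library. This is only a rough sketch: the STOP_WORDS set (including 'store'), the normalize and is_match helpers, and the 0.8 threshold are all assumptions made for illustration, so tune them for your own data.

```python
import re
from difflib import SequenceMatcher

# Assumed stop-word list (not from the question apart from the first three).
STOP_WORDS = {"the", "company", "inc", "store"}

def normalize(name):
    """Lowercase the name, drop punctuation, and remove stop words."""
    words = re.findall(r"[a-z0-9]+", name.lower())
    return " ".join(w for w in words if w not in STOP_WORDS)

def is_match(a, b, threshold=0.8):
    """Return True if the normalized names are at least `threshold` similar."""
    a, b = normalize(a), normalize(b)
    if not a or not b:
        return False  # nothing left to compare after removing stop words
    return SequenceMatcher(None, a, b).ratio() >= threshold

lsta = ['The Fake Company', 'Fake Company Inc.', 'The Fake Company Store',
        'Another.', 'Irrelevant', 'Not Included']
lstb = [['Fake', 'PersonA'], ['BCompany', 'PersonB'], ['Another', 'PersonC'],
        ['DCompany', 'PersonC'], ['The Another Inc.', 'PersonC']]

desired_a = []      # sublists of lstb whose first item matches something in lsta
matched_a = set()   # lsta entries matched by at least one sublist
for sublist in lstb:
    hits = [company for company in lsta if is_match(sublist[0], company)]
    if hits:
        desired_a.append(sublist)
        matched_a.update(hits)

# lsta entries that nothing in lstb matched
desired_b = [company for company in lsta if company not in matched_a]

print(desired_a)
print(desired_b)
```

With the sample data this gives [['Fake', 'PersonA'], ['Another', 'PersonC'], ['The Another Inc.', 'PersonC']] and ['Irrelevant', 'Not Included']. Both ['Another', 'PersonC'] and ['The Another Inc.', 'PersonC'] are returned because they normalize to the same string. SequenceMatcher.ratio() is only one way to score similarity, so a different measure or threshold may suit real company names better.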