
Inverse Match with Python


I've been trying to work with two lists in Python 2.7. I've come part way, but spending some time searching hasn't brought up much in the way of results.

List1: A list of specific number sequences that I am searching for within List2, e.g. ['209583', '185372', '684392', '995423']

List2: Contains variations of the numbers from List1, e.g. ['209583_345829', '57185372', '853921864']

Now I can match and pull with what I found below... But I was also looking for the inverse; set a variable to all the numbers in List1 that are not in List2.

matching = [s for s in list2 if any(xs in s for xs in list1)]

So what should be left in a non matching variable would be '995423'. I've tried reworking the code above but I feel like it's right under my nose.

Also, would it not be beneficial to just use an If/Else statement for performance reasons? E.g. If matching do this, else not matching do this... That way it is only running once vs twice.

This is a simple example, but each list could exceed 10,000 entries.

Thanks!


Solution

  • First things first: The list comprehension you have at hand does not do what you describe. To build a list of the items in List1 that have matches in List2, you want to use this:

    All items FROM List1 WITH matches in List2

    matches = [item for item in List1 if any(item in compared for compared in List2)]
    

    To explain:
    [s for s in List1 if any(xs in s for xs in List2)] - Your original algorithm was pulling elements s from List1 and elements xs from List2, and trying to see if xs was contained in s, which is inherently the opposite of what we want to do.

    [s for s in list2 if any(xs in s for xs in list1)] - Your new algorithm has inverted the wrong variables. Now it is pulling s from List2 and xs from List1 and checking if xs is in s - which is closer to the original idea. The only problem is, the way your algorithm is set up, it will place the items from List2 into the list if they have a match in List1 (which might be what you want after all?)

    [item for item in List1 if any(item in compared for compared in List2)] - Made a bit more verbose for easy reading, this algorithm will pull out items from List1, check if they have a 'container' in List2, and add them to the list if they do. (Side note: the nested form [item for item in List1 for compared in List2 if item in compared] looks similar, but it adds an item once for every entry of List2 that contains it, so it returns the same results only when each item has at most one container.)
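To make the difference between the any() form and the nested form concrete, here is a minimal sketch using the sample lists from the question. The fourth List2 entry, '185372_209583', is not in the question; it is a hypothetical value added only so that some items have more than one container.

```python
List1 = ['209583', '185372', '684392', '995423']
# The last entry is a made-up addition so that two items each have two containers.
List2 = ['209583_345829', '57185372', '853921864', '185372_209583']

# any() stops at the first container it finds, so each item appears at most once.
matches = [item for item in List1 if any(item in compared for compared in List2)]

# The nested form adds the item once per container, so repeats are possible.
nested = [item for item in List1 for compared in List2 if item in compared]

print(matches)  # ['209583', '185372']
print(nested)   # ['209583', '209583', '185372', '185372']
```

If every item can have at most one container, the two forms agree; otherwise wrapping the nested result in set() removes the repeats at the cost of losing the order.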

    With that out of the way: If you want to get every item from List1 that does not have a match in List2, you can use the comprehension above to build the matches list, and then, as Ali SAID OMAR stated in a comment, use set operations:

    All items IN List1 WITHOUT matches in List2 - Set operation

    nomatches = set(List1) - set(matches)
    

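    This will take all unique elements of List1, remove the matched elements, and return a set object with all the unmatched elements remaining. Note that a set does not keep the original order of List1 and drops duplicate entries. Alternatively, if you want a solution in one statement that keeps the order: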

    All items IN List1 WITHOUT matches in List2 - List comprehension

    nomatches = [item for item in List1 if not any(item in compared for compared in List2)]
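As a quick check, here is a small sketch that runs both variants on the sample lists exactly as posted in the question. The names nomatches_set and nomatches_list are chosen here only to keep the two results apart. Note that with this data '684392' has no container in List2 either (it is not a substring of '853921864'), so it shows up as unmatched alongside '995423'.

```python
List1 = ['209583', '185372', '684392', '995423']
List2 = ['209583_345829', '57185372', '853921864']

matches = [item for item in List1 if any(item in compared for compared in List2)]

# Set operation: order is not guaranteed, so sort before printing.
nomatches_set = set(List1) - set(matches)

# List comprehension: keeps the order of List1.
nomatches_list = [item for item in List1
                  if not any(item in compared for compared in List2)]

print(sorted(nomatches_set))  # ['684392', '995423']
print(nomatches_list)         # ['684392', '995423']
```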
    

    To give credit where credit is due, this is identical to yedpodtrzitko's solution in the post comments.

    Since it's hard to tell what you're asking, though, and in comments you have flip-flopped what you're asking at least once, I will add two more algorithms:

    All items IN List2 WITH matches in List1

    matches2 = [item for item in List2 if any(key in item for key in List1)]
    

    All items IN List2 WITHOUT matches in List1 - List Comprehension

    nomatches2 = [item for item in List2 if not any(key in item for key in List1)]
    

    All items IN List2 WITHOUT matches in List1 - Set Operation

    nomatches2 = set(List2) - set(matches2)
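On the if/else idea from the question: both List2 results can be collected in a single loop, so each item is tested once instead of once per comprehension. The following is a minimal sketch, not a tuned implementation; partition_by_keys is a hypothetical helper name, and it works in Python 2.7 as well as Python 3.

```python
def partition_by_keys(items, keys):
    """Split items into (matched, unmatched) lists in one pass.

    An item counts as matched if at least one key is a substring of it.
    """
    matched, unmatched = [], []
    for item in items:
        if any(key in item for key in keys):
            matched.append(item)
        else:
            unmatched.append(item)
    return matched, unmatched


List1 = ['209583', '185372', '684392', '995423']
List2 = ['209583_345829', '57185372', '853921864']

matches2, nomatches2 = partition_by_keys(List2, List1)
print(matches2)    # ['209583_345829', '57185372']
print(nomatches2)  # ['853921864']
```

This roughly halves the work compared with running two separate comprehensions, but each test is still a substring search against every key, so the total cost still grows with len(List1) * len(List2). Swapping the arguments, partition_by_keys(List1, List2), does not give the List1 results, because the substring test would then run in the wrong direction; that case needs `item in compared` rather than `key in item`.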
    

    Each of these has been tested against the test case described in your post, and returned the expected results. If these algorithms don't do what you need them to, please double-check that it isn't a problem on your end, and if this doesn't answer your question, please make sure you are clear about what you're asking. Thanks.