python, list, matching

How to find lists whose items are not in the output list and add them to the output list?


I have a list of lists of ids called total_list. I want to add a list from total_list to accepted_list only if none of its ids are found in accepted_list. Here is a sample data set and the code I used. Any help is appreciated. Thanks.

Sample of total_list = [[u'19'], [u'34', u'36'], [u'34', u'36'], [u'50', u'51'], [u'46', u'47'], [u'48', u'49'], [u'38', u'39', u'40'], [u'41', u'42', u'44'], [u'46', u'47', u'48'], [u'47', u'48', u'49'], [u'37', u'50', u'51'], [u'294', u'295', u'296'], [u'296', u'297', u'298'], [u'37', u'38', u'51'], [u'41', u'42', u'43', u'44'], [u'37', u'38', u'39', u'40'], [u'294', u'295', u'296', u'297'], [u'40', u'41', u'43', u'44'], [u'295', u'296', u'297', u'298'], [u'784', u'785', u'786', u'793'], [u'26', u'787', u'788', u'789', u'808'], [u'38', u'39', u'40', u'41', u'43'], [u'40', u'41', u'42', u'43', u'44'], [u'778', u'779', u'780', u'781', u'782'], [u'294', u'295', u'296', u'297', u'298'], [u'783', u'784', u'785', u'786', u'791', u'792'], [u'794', u'795', u'796', u'798', u'799', u'800'], [u'778', u'779', u'780', u'781', u'782', u'783'], [u'778', u'779', u'780', u'781', u'782', u'787', u'788'], [u'783', u'784', u'785', u'786', u'791', u'792', u'793'], [u'26', u'780', u'781', u'782', u'787', u'788', u'789'], [u'792', u'793', u'794', u'795', u'796', u'798', u'799'], [u'793', u'794', u'795', u'796', u'798', u'799', u'800'], [u'21', u'22', u'23', u'24', u'25', u'816', u'817'], [u'21', u'22', u'23', u'24', u'25', u'815', u'816', u'817'], [u'26', u'780', u'781', u'782', u'787', u'788', u'789', u'790'], [u'21', u'22', u'23', u'24', u'25', u'815', u'816', u'817'], [u'778', u'779', u'780', u'781', u'782', u'783', u'787', u'788', u'789'], [u'21', u'22', u'23', u'24', u'25', u'814', u'815', u'816', u'817'], [u'778', u'779', u'780', u'781', u'782', u'783', u'787', u'788', u'789'], [u'779', u'780', u'781', u'783', u'784', u'785', u'789', u'790', u'791'], [u'783', u'788', u'789', u'790', u'791', u'792', u'797', u'804', u'805'], [u'26', u'780', u'781', u'783', u'787', u'788', u'789', u'790', u'804'], [u'21', u'22', u'23', u'24', u'25', u'814', u'815', u'816', u'817'], [u'26', u'808', u'809', u'810', u'814', u'815', u'816', u'817', u'818'], [u'808', u'809', u'810', u'813', u'814', 
u'815', u'816', u'817', u'818'], [u'795', u'796', u'797', u'798', u'799', u'800', u'801', u'802', u'803', u'811'], [u'785', u'786', u'791', u'792', u'793', u'794', u'795', u'797', u'798', u'799'], [u'801', u'806', u'809', u'810', u'811', u'812', u'813', u'814', u'815', u'817', u'818'], [u'801', u'802', u'803', u'806', u'809', u'810', u'811', u'812', u'813', u'814', u'815'], [u'784', u'785', u'790', u'791', u'792', u'793', u'794', u'797', u'798', u'799', u'803', u'804'], [u'21', u'22', u'23', u'24', u'25', u'808', u'809', u'814', u'815', u'816', u'817', u'818'], [u'792', u'793', u'794', u'795', u'796', u'797', u'798', u'799', u'800', u'801', u'802', u'803'], [u'797', u'798', u'799', u'800', u'801', u'802', u'803', u'806', u'810', u'811', u'812', u'813'], [u'789', u'790', u'791', u'792', u'797', u'802', u'803', u'804', u'805', u'806', u'810', u'811'], [u'783', u'784', u'785', u'790', u'791', u'792', u'793', u'797', u'798', u'803', u'804', u'805'], [u'790', u'791', u'797', u'798', u'802', u'803', u'804', u'805', u'806', u'809', u'810', u'811'], [u'797', u'798', u'801', u'802', u'803', u'804', u'805', u'806', u'809', u'810', u'811', u'812', u'813'], [u'24', u'25', u'808', u'809', u'810', u'811', u'812', u'813', u'814', u'815', u'816', u'817', u'818'], [u'797', u'798', u'799', u'800', u'801', u'802', u'803', u'804', u'805', u'806', u'810', u'811', u'812'], [u'805', u'806', u'808', u'809', u'810', u'811', u'812', u'813', u'814', u'815', u'816', u'817', u'818'], [u'797', u'800', u'801', u'802', u'803', u'804', u'805', u'806', u'809', u'810', u'811', u'812', u'813', u'814'], [u'22', u'23', u'24', u'25', u'808', u'809', u'810', u'812', u'813', u'814', u'815', u'816', u'817', u'818'], [u'21', u'22', u'23', u'24', u'25', u'808', u'809', u'810', u'813', u'814', u'815', u'816', u'817', u'818'], [u'791', u'792', u'797', u'798', u'799', u'800', u'801', u'802', u'803', u'804', u'805', u'806', u'811', u'812'], [u'791', u'792', u'793', u'794', u'795', u'796', u'797', u'798', u'799', 
u'800', u'801', u'802', u'803', u'805', u'806'], [u'790', u'791', u'792', u'793', u'797', u'798', u'799', u'800', u'801', u'802', u'803', u'804', u'805', u'806', u'811'], [u'801', u'802', u'804', u'805', u'806', u'808', u'809', u'810', u'811', u'812', u'813', u'814', u'815', u'817', u'818']]

So when I go through total_list, the first list is [u'19'], whose id is not found in any list in accepted_list (which is still empty), so it is added. As soon as an item of a list in total_list is found in a list that is already in accepted_list, that list is not added to accepted_list.

accept_intersect = []

for i in total_list:
    # Reset the flag for every sublist; otherwise the first match
    # would block every sublist that comes after it.
    inaccept = False
    for p in i:
        for a in accept_intersect:
            if p in a:
                inaccept = True
                break
        if inaccept:
            break
    if not inaccept:
        accept_intersect.append(i)

for a in accept_intersect:
    print(a)
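
The rule described above (accept a sublist only when none of its ids already appears in an accepted sublist) can also be sketched with a set of accepted ids instead of nested loops. This is a minimal sketch and not the asker's code: the function name `accept_disjoint` and the short sample are assumptions made for illustration.

```python
def accept_disjoint(total_list):
    """Keep each sublist only if it shares no id with the sublists kept so far."""
    accepted = []
    seen_ids = set()  # every id that occurs in an accepted sublist
    for ids in total_list:
        if seen_ids.isdisjoint(ids):
            accepted.append(ids)
            seen_ids.update(ids)
    return accepted


# Short sample in the shape of total_list (hypothetical subset of the data)
sample = [['19'], ['34', '36'], ['34', '36'], ['50', '51'],
          ['37', '50', '51'], ['46', '47']]
result = accept_disjoint(sample)
print(result)
```

Tracking the ids in a set keeps each membership check cheap, so the accepted sublists do not have to be rescanned for every new candidate.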

Solution

  • I would not use a dict here, since a set is the native fit for either of the two use cases:

    1) flatten on output

    2) accept two-level output

    The code below covers both possibilities, as it is not clear to me which one the question asks for:

    #! /usr/bin/env python
    from __future__ import print_function
    
    total_list = [['19'], ['34', '36'], ['34', '36'], ['50', '51']]
    
    # Variant 1: flatten on output, keeping each id once
    accept_intersect = []
    seen = set()
    
    for item_seq in total_list:
        for item in item_seq:
            hashable_item = tuple(item)
            if hashable_item not in seen:
                accept_intersect.append(item)
                seen.add(hashable_item)
    
    print(accept_intersect)
    
    # Variant 2: keep the two-level structure, dropping duplicate lists
    accept_intersect = []
    seen = set()
    
    for item_seq in total_list:
        hashable_item = tuple(item_seq)
        if hashable_item not in seen:
            accept_intersect.append(item_seq)
            seen.add(hashable_item)
    
    print(accept_intersect)
    

    Running it on my machine yields:

    ['19', '34', '36', '50', '51']
    [['19'], ['34', '36'], ['50', '51']]
    

    So the main point that needs clarifying is which variant you want. With variant 1) (flattening), the items are strings and therefore already hashable, so they fit into dicts and sets as keys, and both other answers proposed to date work as well. If instead you want to keep lists as entries, filter out duplicate lists, and leave the unique ones intact (variant 2, no flattening), then the above trick of converting each list to a tuple before checking uniqueness is the way to go.

    I left the tuple() call inside the flattening variant (the first loop), as this may be more robust and keeps the focus on the algorithm (IMO).
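
The tuple conversion matters because a set can only hold hashable values, and a list is not hashable. A small illustration of that difference (the variable names `pair` and `list_was_accepted` are made up for this example):

```python
seen = set()
pair = ['34', '36']

try:
    seen.add(pair)  # lists are mutable and therefore unhashable
    list_was_accepted = True
except TypeError:
    list_was_accepted = False

seen.add(tuple(pair))  # the equivalent tuple is hashable

print(list_was_accepted)            # False
print(tuple(['34', '36']) in seen)  # True
```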