
How to inner-join a list of tuples with a list?


I have a list of tuples (>150k elements), each containing an ID and the length associated with that ID. However, this list should only contain IDs that also appear in a second list (>80k elements).

list_of_tuples = [('1', 31.46), ('10', 97.99), ('50', 71.19), ('100', 17.03), ...]
normal_list = ['1', '50', '100', ...]

The desired output is:

list_of_tuples = [('1', 31.46), ('50', 71.19), ('100', 17.03), ...]

Here's the code that I threw together for testing the concept, but as I am new to Python, it doesn't work. I also haven't found a solution online for this kind of issue.

    for whole_elem in list_of_tuples:
        for first_elem in whole_elem:
            for link in normal_list:
                if first_elem <> link
                list_of_tuples.pop(whole_elem)

I would appreciate your support a lot. Thank you very much!


Solution

  • You can probably solve this at the same conceptual level as asked ("inner join") with the pandas join function (look at this question).

    However, here I would just do the following:

    result = []
    # A set makes each membership check O(1) on average, instead of scanning
    # about len(normal_list)/2 elements of the list per lookup.
    normal_set = set(normal_list)
    for tpl in list_of_tuples:
        if tpl[0] in normal_set:
            result.append(tpl)
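
The same filter can also be written as a list comprehension. This is only a minimal sketch, assuming the sample data from the question with the elided `...` entries left out:

```python
# Sample data from the question (the truncated entries are omitted).
list_of_tuples = [('1', 31.46), ('10', 97.99), ('50', 71.19), ('100', 17.03)]
normal_list = ['1', '50', '100']

# Build the set once so that every lookup is a hash lookup, not a list scan.
normal_set = set(normal_list)

# Keep only the tuples whose first element (the ID) is in the set.
result = [tpl for tpl in list_of_tuples if tpl[0] in normal_set]

print(result)  # [('1', 31.46), ('50', 71.19), ('100', 17.03)]
```

The order of `list_of_tuples` is preserved, because the comprehension walks that list and uses the set only for the membership test.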
    

    On top of that, you can yield the matching elements instead of appending them if you do not need the result as a whole:

    def inner_join(normal_list, list_of_tuples):
        # Generator: yields matching tuples one at a time, no result list needed.
        normal_set = set(normal_list)
        for tpl in list_of_tuples:
            if tpl[0] in normal_set:
                yield tpl
    

    Which you use as:

    for el in inner_join(normal_list, list_of_tuples):
        ...
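
To illustrate the lazy behaviour, here is a self-contained sketch that repeats the generator and uses the question's sample data (shortened). Using `islice` from the standard library to stop after the first two matches is an illustrative choice made here, not part of the original answer:

```python
from itertools import islice


def inner_join(normal_list, list_of_tuples):
    """Yield the tuples whose first element appears in normal_list."""
    normal_set = set(normal_list)  # built once, on the first iteration
    for tpl in list_of_tuples:
        if tpl[0] in normal_set:
            yield tpl


# Sample data from the question (the truncated entries are omitted).
list_of_tuples = [('1', 31.46), ('10', 97.99), ('50', 71.19), ('100', 17.03)]
normal_list = ['1', '50', '100']

# Consume only the first two matches; the rest of the input is not scanned.
first_two = list(islice(inner_join(normal_list, list_of_tuples), 2))
print(first_two)  # [('1', 31.46), ('50', 71.19)]

# Materialize everything when the full result is needed after all.
everything = list(inner_join(normal_list, list_of_tuples))
print(everything)  # [('1', 31.46), ('50', 71.19), ('100', 17.03)]
```

Calling `inner_join(...)` again creates a fresh generator, so each loop or `list(...)` call starts from the beginning of the input.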