I have a list of tuples (>150k elements), each containing an ID and the length associated with that ID. However, this list should only keep the IDs that also appear in a second list (>80k elements).
list_of_tuples = [('1', 31.46), ('10', 97.99), ('50', 71.19), ('100', 17.03), ...]
normal_list = ['1', '50', '100', ...]
The desired output is:
list_of_tuples = [('1', 31.46), ('50', 71.19), ('100', 17.03), ...]
Here's the code that I threw together for testing the concept, but as I am new to Python, it doesn't work. I also haven't found a solution online for this kind of issue.
for whole_elem in list_of_tuples:
    for first_elem in whole_elem:
        for link in normal_list:
            if first_elem <> link
                list_of_tuples.pop(whole_elem)
I would appreciate your support a lot. Thank you very much!
You can probably solve this at the same conceptual level as asked (an "inner join") with the pandas join function (look at this question).
However, here I would just do the following:
result = []
# A set makes each 'in' check O(1) on average, instead of
# scanning about len(normal_list)/2 elements of the list.
normal_set = set(normal_list)
for tpl in list_of_tuples:
    if tpl[0] in normal_set:
        result.append(tpl)
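The same filter can also be written as a list comprehension. Here is a minimal sketch using the sample values from the question (the name filtered is only illustrative):

```python
list_of_tuples = [('1', 31.46), ('10', 97.99), ('50', 71.19), ('100', 17.03)]
normal_list = ['1', '50', '100']

# Build the set once so that each membership test is O(1) on average.
normal_set = set(normal_list)

# Keep only the tuples whose ID (the first element) is in normal_set.
filtered = [tpl for tpl in list_of_tuples if tpl[0] in normal_set]
print(filtered)
```

Note that this builds a new list rather than removing items from list_of_tuples while iterating over it, which is what the code in the question attempts.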
On top of that, you can yield the matching elements instead of appending them to a result list if you do not need to consume the result as a whole:
def inner_join(normal_list, list_of_tuples):
    normal_set = set(normal_list)
    for tpl in list_of_tuples:
        if tpl[0] in normal_set:
            yield tpl
Which you use as:
for el in inner_join(normal_list, list_of_tuples):
    ...
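Because a generator is lazy, you can also stop early without filtering the whole list. This is a small sketch, assuming the sample data from the question; taking only the first two matches with itertools.islice is just an illustration:

```python
from itertools import islice

def inner_join(normal_list, list_of_tuples):
    normal_set = set(normal_list)
    for tpl in list_of_tuples:
        if tpl[0] in normal_set:
            yield tpl

list_of_tuples = [('1', 31.46), ('10', 97.99), ('50', 71.19), ('100', 17.03)]
normal_list = ['1', '50', '100']

# Only as many tuples as needed are examined; the rest are never touched.
first_two = list(islice(inner_join(normal_list, list_of_tuples), 2))
print(first_two)
```

If you need the full filtered list after all, wrap the call in list(...).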