I used listdir to read the files in two folders:
from os import listdir
list_1 = [file for file in listdir("./folder1/") if file.endswith(".csv")]
list_2 = [file for file in listdir("./folder2/") if file.endswith(".json")]
and now I have two lists:
list_1 = ['12_a1_pp.csv', '32_a3_pp.csv', '45_a17_pp.csv', '81_a123_pp.csv']
list_2 = ['12_a1.json', '32_a3.json', '61_a54.json']
I want to find the two corresponding sublists containing only those files whose names share the same initial part. In other words:
list_1b = ['12_a1_pp.csv', '32_a3_pp.csv']
list_2b = ['12_a1.json', '32_a3.json']
How can I do that?
PS: note that the listdir part may not matter for answering the question. I only included it because, if the result of listdir is guaranteed to be in alphabetical order, that might help in traversing the two lists. Of course, in this simple example the lists are short, but in the real use case they contain hundreds of files.
A more pythonic approach would use the & (intersection) operator for sets:
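# Strip the fixed suffixes ('_pp.csv' is 7 characters, '.json' is 5) and intersect the stems.
common = set(x[:-7] for x in list_1) & set(x[:-5] for x in list_2)
# Sets are unordered, so sort the stems to get a reproducible order.
list_1b = [x + '_pp.csv' for x in sorted(common)]
list_2b = [x + '.json' for x in sorted(common)]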
EDIT: If you need to split on a specific character for each list instead (see comment), here is an updated version that cuts each name in list_1 at its last '_' and each name in list_2 at its last '.':
common = set(x[:x.rindex('_')] for x in list_1) & set(x[:x.rindex('.')] for x in list_2)
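Putting the pieces together, here is a minimal sketch of the whole thing, assuming every name in list_1 contains a '_' and every name in list_2 contains a '.' (otherwise rindex raises ValueError). The helper names stem_csv, stem_json and matching_files are made up for illustration. Instead of rebuilding the file names from the stems, it filters the original lists, so the order of each input list is preserved:

```python
def stem_csv(name):
    # '12_a1_pp.csv' -> '12_a1' (cut at the last underscore)
    return name[:name.rindex('_')]


def stem_json(name):
    # '12_a1.json' -> '12_a1' (cut at the last dot)
    return name[:name.rindex('.')]


def matching_files(list_1, list_2):
    # Stems that occur in both lists.
    common = {stem_csv(x) for x in list_1} & {stem_json(x) for x in list_2}
    # Filter the originals so each result keeps its input order.
    list_1b = [x for x in list_1 if stem_csv(x) in common]
    list_2b = [x for x in list_2 if stem_json(x) in common]
    return list_1b, list_2b


list_1 = ['12_a1_pp.csv', '32_a3_pp.csv', '45_a17_pp.csv', '81_a123_pp.csv']
list_2 = ['12_a1.json', '32_a3.json', '61_a54.json']

list_1b, list_2b = matching_files(list_1, list_2)
print(list_1b)  # ['12_a1_pp.csv', '32_a3_pp.csv']
print(list_2b)  # ['12_a1.json', '32_a3.json']
```

Set membership tests are constant time on average, so this should stay fast with hundreds of files. It also does not rely on any ordering of the inputs, which matters because the documentation for os.listdir says the list is returned in arbitrary order; wrap the inputs in sorted() if you want alphabetical results.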