Tags: python, regex, list, loops, string-matching

Partial match between two Python lists: find the items in one list that start with an item from the other


I have two lists.

files = ['26ZJ35_v1.4.doc', '2EPWW9_v1.1.pdf', '344D4Q_v1.8.ppt', '33ADNL_v3.0.pdf']

baseline_documents  = ['26ZJ35', '2EPWW9']

I want to find every entry in files that contains an exact match for a string from baseline_documents, and append those entries to a new list.

Output desired:

list3 = ['26ZJ35_v1.4.doc', '2EPWW9_v1.1.pdf']

Code so far:

import csv
import os
import re
metadata = []
with open('D:/meta_demo.csv', 'r') as f:
    rows = csv.reader(f)
    for i in rows:
        metadata.append(i)
        #print(i)    
baseline_documents = metadata[1:20]
DIR = 'D:/demo_files/'
files = [i for i in os.listdir(r"D:\demo_files")]

list3 = []
for i in files:
    if re.search(r"[^_]*", i) in baseline_documents:
        list3.append(files)

list3 = [i for i in baseline_documents if re.search(r"[^_]*", i) in files]

Solution
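
It may help to see first why the attempt in the question never matches. re.search returns a match object (or None), not the matched text, so the `in baseline_documents` test compares the wrong kind of value. In addition, list3.append(files) appends the whole list rather than the current item. Note also that csv.reader yields each row as a list of fields, so metadata[1:20] is a list of lists; if the ID sits in the first column (an assumption about the CSV layout), something like [row[0] for row in metadata[1:20]] would be needed. A minimal sketch of a corrected loop, using the sample data from the question and assuming the ID is everything before the first underscore:

```python
import re

files = ['26ZJ35_v1.4.doc', '2EPWW9_v1.1.pdf', '344D4Q_v1.8.ppt', '33ADNL_v3.0.pdf']
baseline_documents = ['26ZJ35', '2EPWW9']

list3 = []
for name in files:
    # [^_]* can match the empty string, so re.match always returns a match object here
    m = re.match(r"[^_]*", name)
    # .group() extracts the matched text; the match object itself is never "in" a list of strings
    if m.group() in baseline_documents:
        list3.append(name)  # append the current item, not the whole list

print(list3)
```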

  • You can use str.startswith, which accepts a tuple of prefixes and returns True if the string starts with any of them.

    Ex:

    files = ['26ZJ35_v1.4.doc', '2EPWW9_v1.1.pdf', '344D4Q_v1.8.ppt', '33ADNL_v3.0.pdf']
    baseline_documents  = ['26ZJ35', '2EPWW9']
    result = [i for i in files if i.startswith(tuple(baseline_documents))]
    print(result)
    

    If you need a regex, use re.match, which only matches at the start of the string.

    Ex:

    import re
    
    files = ['26ZJ35_v1.4.doc', '2EPWW9_v1.1.pdf', '344D4Q_v1.8.ppt', '33ADNL_v3.0.pdf']
    baseline_documents  = ['26ZJ35', '2EPWW9']
    # Escape each ID so any regex metacharacters in it are matched literally
    pattern = re.compile("|".join(map(re.escape, baseline_documents)))
    
    result = [i for i in files if pattern.match(i)]
    print(result)
    

    Output (both examples):

    ['26ZJ35_v1.4.doc', '2EPWW9_v1.1.pdf']
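
One caveat with prefix matching: startswith (and the regex above) would also accept a file whose ID merely begins with a baseline ID, for example a hypothetical '26ZJ35X_v1.0.doc'. If the ID has to match exactly, a sketch like the following compares the text before the first underscore against a set. The helper name match_exact_ids and the extra file name are illustrative assumptions, not part of the original question.

```python
def match_exact_ids(files, baseline_documents):
    """Return the files whose ID (text before the first '_') is in baseline_documents."""
    wanted = set(baseline_documents)  # set gives constant-time membership checks
    return [name for name in files if name.split('_', 1)[0] in wanted]


files = ['26ZJ35_v1.4.doc', '2EPWW9_v1.1.pdf', '344D4Q_v1.8.ppt',
         '33ADNL_v3.0.pdf', '26ZJ35X_v1.0.doc']  # last entry is a made-up near miss
baseline_documents = ['26ZJ35', '2EPWW9']

result = match_exact_ids(files, baseline_documents)
print(result)  # ['26ZJ35_v1.4.doc', '2EPWW9_v1.1.pdf']
```

With the same input, the startswith version would also return '26ZJ35X_v1.0.doc', so which variant fits depends on how strict the match needs to be.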