python comparison list-comprehension

Compare two folders, return full path of different file


I have a script that should compare the files in two folders, including their subfolders. The new files should be copied later. This is the function I use to create the file lists:

import os

def fullNames(source):
    # Collect the full path of every .xlsx file below source, including subfolders
    matches = []

    for root, dirnames, filenames in os.walk(source):
        for filename in filenames:
            if filename.endswith('.xlsx'):
                matches.append(os.path.join(root, filename))
    return matches

This function returns lists like this:

list1 =  ['C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18\\file1.xlsx',
          'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18\\file2.xlsx',
          'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18\\file3.xlsx',
          'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18\\file4.xlsx',
          'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18\\file5.xlsx']

list2 =  ['C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-17\\file1.xlsx',
          'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-17\\file2.xlsx',
          'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-17\\file3.xlsx',
          'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-17\\file4.xlsx']

To compare the two lists, I have to compare the basename of each file:

list1_short = [os.path.basename(file) for file in list1]
list2_short = [os.path.basename(file) for file in list2]

result = [item for item in list1_short if item not in list2_short]
result

Out[134]: ['file5.xlsx']

This works, but I need to return the full path of that file, not the basename. Does anyone have an idea how to solve this?

This would be the desired result:

['C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18\\file5.xlsx']

Solution

  • You could keep the full paths from list1 and compute each basename on the fly, so list1_short is no longer needed:

    import os

    list1 =  ['C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18/file1.xlsx',
              'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18/file2.xlsx',
              'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18/file3.xlsx',
              'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18/file4.xlsx',
              'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18/file5.xlsx']
    
    list2 =  ['C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-17/file1.xlsx',
              'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-17/file2.xlsx',
              'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-17/file3.xlsx',
              'C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-17/file4.xlsx']
    
    
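    # Basenames that already exist in the older folder (a set gives fast lookups)
    existing_names = {os.path.basename(item) for item in list2}
    # Keep the full path of every file in list1 whose basename is not in list2
    missing_files = [item for item in list1 if os.path.basename(item) not in existing_names]
    # missing_files == ['C:/Users/langma/Desktop/EDI/Downloadfolder/EDI_2020-05-18/file5.xlsx']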
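
The same idea can be wrapped into two small functions. This is only a minimal sketch: the names full_names and new_files, the suffix parameter, and the shortened sample paths are hypothetical and not part of the original posts. It assumes forward-slash paths like the lists in the answer, and it compares basenames only, so two files with the same name in different subfolders count as the same file.

```python
import os


def full_names(source, suffix=".xlsx"):
    """Return the full path of every file below `source` whose name ends with `suffix`."""
    matches = []
    for root, _dirnames, filenames in os.walk(source):
        for filename in filenames:
            if filename.endswith(suffix):
                matches.append(os.path.join(root, filename))
    return matches


def new_files(newer, older):
    """Return the full paths in `newer` whose basename does not occur in `older`."""
    # A set of basenames makes each membership test cheap.
    existing_names = {os.path.basename(path) for path in older}
    return [path for path in newer if os.path.basename(path) not in existing_names]


# Hypothetical, shortened sample paths for illustration only.
newer = ["EDI_2020-05-18/file1.xlsx", "EDI_2020-05-18/file5.xlsx"]
older = ["EDI_2020-05-17/file1.xlsx"]
print(new_files(newer, older))  # ['EDI_2020-05-18/file5.xlsx']
```

With real folders, the call would look like new_files(full_names(folder_a), full_names(folder_b)), where folder_a and folder_b stand for the two download folders.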