c# .net linq

Given two sorted string lists, keep only the elements whose serial-number substring appears in both


I successfully extracted the full path of each file in two different directories into separate string lists. Each list now holds the full path (directory plus file name) of every file in its folder.

Each element in the firstList looks something like:

"C:\users\me\path\anotherDirectory\firstFolder\actual_fileName_######.extension"

Similarly, in the secondList it looks something like:

"C:\users\me\path\anotherDirectory\secondFolder\actual_fileName_######.extension"

The "######" part corresponds to 6 digits that I call the serial number.

Ideally, each file in the firstFolder has a serial number that matches one in the secondFolder, but that's not always the case.

I want to remove the elements whose serial number has no match in the other list. Afterwards I should still have the same two lists, but every element in firstList should have a matching serial number in secondList (and vice versa). No serial number should be repeated within the same list, and both lists should remain sorted.

I had thought of some partial solutions, but they felt kind of convoluted and inefficient. I feel like this is something that can be done very concisely using LINQ, but I do not have the knowledge yet to do so.


Solution

  • Managed to solve it! I basically just took NetMage's idea and adapted it a bit, so huge thanks to them, else I wouldn't have been able to solve it.

    What worked for me was:

    // IntersectBy needs .NET 6 or later.
    // Namespaces used: System, System.IO, System.Linq, System.Text.RegularExpressions.
    var serialNumRE = new Regex(@"_(\d+)\.", RegexOptions.Compiled);
    
    // Key selector: the digits between an underscore and a dot in the file name.
    Func<string, string> serialOf = path => serialNumRE.Match(Path.GetFileName(path)).Groups[1].Value;
    
    var firstFolderSerialNums = firstFolderFiles.Select(serialOf);
    var secondFolderSerialNums = secondFolderFiles.Select(serialOf);
    
    // IntersectBy keeps the order of the source list and returns each key only once.
    var finalFirstFolderFiles = firstFolderFiles
                                .IntersectBy(secondFolderSerialNums, serialOf)
                                .ToList();
    
    var finalSecondFolderFiles = secondFolderFiles
                                .IntersectBy(firstFolderSerialNums, serialOf)
                                .ToList();
    

    If anybody has a more efficient solution or a way of doing it using better practices, let me know.
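
For anyone looking for a variant that runs the regex only once per path, here is a minimal sketch rather than a definitive implementation. The names `SerialMatcher`, `KeepMatching` and `Parse` are hypothetical and not part of the original answer. It also makes two assumptions that the code above does not: file names without a serial number are dropped (in the code above they would, as far as I can tell, all share the empty-string key), and when a serial number repeats within one list, the first occurrence wins.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class SerialMatcher
{
    // Same pattern as above: digits between an underscore and a dot.
    private static readonly Regex SerialNumRE =
        new Regex(@"_(\d+)\.", RegexOptions.Compiled);

    // Pairs each path with its serial number, in the original order.
    // Skips names without a serial and serials already seen in this list.
    private static List<(string FullPath, string Serial)> Parse(IEnumerable<string> files)
    {
        var seen = new HashSet<string>();
        var result = new List<(string FullPath, string Serial)>();
        foreach (var path in files)
        {
            var match = SerialNumRE.Match(Path.GetFileName(path));
            if (match.Success && seen.Add(match.Groups[1].Value))
            {
                result.Add((path, match.Groups[1].Value));
            }
        }
        return result;
    }

    // Returns both lists reduced to the serial numbers they have in common.
    public static (List<string> First, List<string> Second) KeepMatching(
        IEnumerable<string> firstFiles, IEnumerable<string> secondFiles)
    {
        var first = Parse(firstFiles);
        var second = Parse(secondFiles);

        // Serial numbers present in both lists.
        var common = new HashSet<string>(first.Select(f => f.Serial));
        common.IntersectWith(second.Select(s => s.Serial));

        // Where preserves input order, so sorted input stays sorted.
        return (
            first.Where(f => common.Contains(f.Serial)).Select(f => f.FullPath).ToList(),
            second.Where(s => common.Contains(s.Serial)).Select(s => s.FullPath).ToList());
    }
}
```

Calling `SerialMatcher.KeepMatching(firstFolderFiles, secondFolderFiles)` returns a tuple whose `First` and `Second` members play the role of `finalFirstFolderFiles` and `finalSecondFolderFiles`. Unlike `IntersectBy`, this does not need .NET 6, only value tuples (C# 7).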