Consider the following three example lists:
List<string> localPatientsIDs = new List<string> { "1550615", "1688", "1760654", "1940629", "34277", "48083" };
List<string> remotePatientsIDs = new List<string> { "000-007", "002443", "002446", "214", "34277", "48083" };
List<string> archivedFiles = new List<string>{
@"G:\Archive\000-007_20230526175817297.zip",
@"G:\Archive\002443_20230526183639562.zip",
@"G:\Archive\002446_20230526183334407.zip",
@"G:\Archive\14967_20240703150011899.zip",
@"G:\Archive\214_20231213150003676.zip",
@"G:\Archive\34277_20230526200048891.zip",
@"G:\Archive\48083_20240214150011919.zip" };
Please note that each element in `archivedFiles` is the full path of a ZIP file whose name begins with a patientID that is in either `localPatientsIDs` or `remotePatientsIDs`.
For example, in `@"G:\Archive\000-007_20230526175817297.zip"` the file name `000-007_20230526175817297.zip` begins with `000-007`, which is an element of the list `remotePatientsIDs`.
A patientID cannot be in `localPatientsIDs` and `archivedFiles` simultaneously; therefore, no duplicates are allowed between these two lists. However, `archivedFiles` can contain patientIDs that are also in `remotePatientsIDs`.
I need to get the elements in `archivedFiles` whose file names begin with an element that is present in `remotePatientsIDs` but not present in `localPatientsIDs`. The end goal is to unzip those files to the directory that contains the `localPatientsIDs` database.
For the given example, I would expect to have the following result:
archivedFilesToUnzip == {
@"G:\Archive\000-007_20230526175817297.zip",
@"G:\Archive\002443_20230526183639562.zip",
@"G:\Archive\002446_20230526183334407.zip",
@"G:\Archive\214_20231213150003676.zip" }
So, how can I use LINQ to do this?
With my limited knowledge, I would expect it to be as simple as:
List<string> archivedFilesToUnzip = archivedFiles.Where(name => name.Contains(remotePatients.Except(localPatients)))
I cannot even compile this, since `Contains` is probably unable to iterate over the List members, and I get the message:
CS1503: Argument 1: cannot convert from 'System.Collections.Generic.IEnumerable<string>' to 'string'
My best attempt so far is the following statement (I confess it seems a little messy to me). It always returns an empty list.
List<string> archivedFilesToUnzip = archivedFiles.Where(name => archivedFiles.Any(x => x.ToString().Contains(remotePatients.Except(localPatients).ToString()))).ToList();
I've found posts that helped me better understand the differences between `Where` and `Select`, and I've looked for directions on using LINQ in other places as well, but I still cannot find a working solution.
C# is a statically (and mostly strongly) typed language (see the question "What is the difference between a strongly typed language and a statically typed language?" and the article "The C# type system" if you want to dive deeper). This means that the compiler checks variable types and rejects many mistakes, such as comparing a string with a boolean.
`remotePatients.Except(localPatients)` is a collection of `string`s, while `name` in `archivedFiles.Where(name => name...` is "just" a `string`. `Contains` on `string` can accept either a `char` (a symbol in a `string`) or another `string`, not a collection of strings, hence the compilation error.
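To make the mismatch concrete, here is a minimal sketch. The class name `ContainsDemo`, the helper `ContainsAnyId`, and the sample values are illustrative only and not part of the question. It shows that testing one string against a whole collection needs a LINQ operator such as `Any` wrapped around the single-string `Contains`:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ContainsDemo
{
    // Hypothetical helper: true if fileName contains at least one of the ids.
    public static bool ContainsAnyId(string fileName, IEnumerable<string> ids)
    {
        // fileName.Contains(ids) would fail with CS1503, because Contains
        // expects a single string (or char), not a sequence of strings.
        // Any applies the single-string Contains to each id in turn.
        return ids.Any(id => fileName.Contains(id));
    }

    public static void Main()
    {
        var ids = new List<string> { "000-007", "214" };

        Console.WriteLine(ContainsAnyId("214_20231213150003676.zip", ids));   // True
        Console.WriteLine(ContainsAnyId("14967_20240703150011899.zip", ids)); // False
    }
}
```

Note that a plain substring test like this can give false positives when an id happens to appear inside the timestamp part of a file name, so extracting the id from the name first is the more robust approach.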
Your second attempt compiles, but will not achieve anything meaningful: if you assign `remotePatients.Except(localPatients).ToString()` to a variable and examine it or print the result to the console, you will see just the type name (``System.Linq.Enumerable+<ExceptIterator>d__99`1[System.String]`` to be exact), which obviously is not part of the file name.
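A small sketch of that behaviour, using made-up sample lists and a hypothetical `JoinIds` helper: `ToString()` on the result of `Except` describes the iterator object, while enumerating it (here through `string.Join`) yields the actual ids. The exact type name printed by the first call depends on the runtime version, so no output is shown for it.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ToStringDemo
{
    // Hypothetical helper: joins the elements of a sequence for display.
    public static string JoinIds(IEnumerable<string> ids)
    {
        return string.Join(", ", ids);
    }

    public static void Main()
    {
        var remote = new List<string> { "214", "34277" };
        var local = new List<string> { "34277" };

        IEnumerable<string> diff = remote.Except(local);

        // Prints a runtime-specific type name, not the ids themselves.
        Console.WriteLine(diff.ToString());

        // Enumerating the sequence gives the elements.
        Console.WriteLine(JoinIds(diff)); // 214
    }
}
```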
As for your question, I would suggest the following:
using System.Linq;
using System.Text.RegularExpressions;

// remotePatients and localPatients are the ID lists from the question
// (declared there as remotePatientsIDs and localPatientsIDs)

// build the diff hashset for quick lookup of the ids to add
// will improve performance if there are "many" ids
var missing = remotePatients.Except(localPatients)
    .ToHashSet();

// regular expression to extract the id from the file name
// you can implement this logic without regex if needed
var regex = new Regex(@"\\(?<id>[\d-]+)_\d+\.zip");

// the result
List<string> archivedFilesToUnzip = archivedFiles
    .Where(name =>
    {
        var match = regex.Match(name); // check the file name for an id
        if (match.Success) // id found
        {
            // extract the id from the file name
            var id = match.Groups["id"].Value;
            return missing.Contains(id); // check if it should be added
        }
        // failed to match the pattern for the id
        // you could throw here instead, to catch a wrong pattern or an unexpected file name
        return false;
    })
    .ToList();
This uses a regular expression to extract the id from the file name and then looks it up in the "missing" ids.
An explanation of this particular regular expression can be found at regex101.
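As the code comment mentions, the same logic can be written without a regular expression. The following is only a sketch under two assumptions: every file name has the form `<id>_<timestamp>.zip`, and an id never contains an underscore. The names `ArchiveFilter`, `TryExtractId`, and `FilesToUnzip` are hypothetical and introduced just for this example.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ArchiveFilter
{
    // Extracts the text between the last path separator and the first '_' after it.
    // Assumes the id itself contains no underscore.
    public static bool TryExtractId(string path, out string id)
    {
        int start = path.LastIndexOfAny(new[] { '\\', '/' }) + 1; // 0 if no separator
        int end = path.IndexOf('_', start);
        if (end > start)
        {
            id = path.Substring(start, end - start);
            return true;
        }
        id = string.Empty;
        return false;
    }

    public static List<string> FilesToUnzip(
        IEnumerable<string> archivedFiles,
        IEnumerable<string> remoteIds,
        IEnumerable<string> localIds)
    {
        // Ids that exist remotely but not locally, in a set for fast lookup.
        var missing = new HashSet<string>(remoteIds.Except(localIds));

        return archivedFiles
            .Where(path => TryExtractId(path, out var id) && missing.Contains(id))
            .ToList();
    }

    public static void Main()
    {
        var local = new List<string> { "1550615", "1688", "1760654", "1940629", "34277", "48083" };
        var remote = new List<string> { "000-007", "002443", "002446", "214", "34277", "48083" };
        var archived = new List<string>
        {
            @"G:\Archive\000-007_20230526175817297.zip",
            @"G:\Archive\002443_20230526183639562.zip",
            @"G:\Archive\002446_20230526183334407.zip",
            @"G:\Archive\14967_20240703150011899.zip",
            @"G:\Archive\214_20231213150003676.zip",
            @"G:\Archive\34277_20230526200048891.zip",
            @"G:\Archive\48083_20240214150011919.zip"
        };

        // Prints the four paths whose ids are 000-007, 002443, 002446 and 214.
        foreach (var file in FilesToUnzip(archived, remote, local))
        {
            Console.WriteLine(file);
        }
    }
}
```

Splitting on the separator by hand, rather than calling `Path.GetFileName`, keeps the sketch independent of the operating system, since a backslash is only treated as a directory separator on Windows.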