
Find a folder whose name contains a person's names in any order (C# Directory)


I am working on Excel add-ins that use an intranet server.

I have the names of employees, and each employee has a folder on the intranet. This folder may or may not contain a PowerPoint file, so I need to read the files for each name.

The problem is with the names. Each folder name has this pattern:

surname, firstname

The difficulty is with people who have more than one first name or surname.

For example: samy jack sammour. The first name is "samy jack" and the last name is "sammour",

so the folder would be: sammour, samy jack

But I only have the name as a single field, so I don't know which part is the last name and which is the first name (the folder could be "jack sammour, samy" or "sammour, samy jack"). So I tried this code to fix it:

string[] dirs = System.IO.Directory.GetFiles(@"/samy*jack*sammour/","*file*.pptx");
if (dirs.Length > 0)
{
    MessageBox.Show("true");
}

But it gave me an error:

file is not illegal

How can I fix this problem and search all the possibilities?


Solution

  • That should do the trick:

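    // Requires: using System; using System.Collections.Generic; using System.IO; using System.Linq;
    var path = @"C:\Users\";
    var name = "samy jack sammour";

    // Recursively builds every ordering of the items, joined by spaces
    // (no space is put in front of the comma token).
    Func<IEnumerable<string>, IEnumerable<string>> permutate = null;
    permutate = items =>
        items.Count() > 1 ?
            items.SelectMany(
                (_, ndx1) => permutate(items.Where((__, ndx2) => ndx1 != ndx2)),
                (item1, item2) => item1 + (item2.StartsWith(",") ? "" : " ") + item2) :
            items;

    // Treat the comma as one more token so that it takes part in the permutations.
    var names = name.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries).Concat(new[] { "," }).ToArray();
    // Keep only candidates with the comma in the middle; the set compares case-insensitively.
    var dirs = new HashSet<string>(permutate(names).Where(n => !n.StartsWith(",") && !n.EndsWith(",")), StringComparer.OrdinalIgnoreCase);
    // Report a match if any candidate folder exists and contains a .pptx file.
    if (new DirectoryInfo(path).EnumerateDirectories().Any(dir => dirs.Contains(dir.Name) && dir.EnumerateFiles("*.pptx").Any()))
        MessageBox.Show("true");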
    

    In my opinion, you shouldn't do this with a regex, because regexes can't match permutations very well. Instead, you can create a HashSet that contains all permutations matching your pattern and compares them case-insensitively:

    surname, firstname

    (Case sensitivity isn't required because the Windows file system doesn't care whether a directory or file name is upper or lower case.)

    For the sake of simplicity, I just add the comma to the permutation parts and, in a next step, filter out the items that start or end with a comma. If performance matters, or if the names can consist of many parts, I'm sure there's a way to rule these possibilities out sooner and avoid generating a large share of the unnecessary permutations.
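
    One possible way to rule those candidates out earlier, as a rough sketch rather than a tuned implementation: permute only the name parts and then place the comma at each interior position, so no candidate ever starts or ends with a comma. The names `FolderNameCandidates`, `Permutations` and `Build` are made up for this illustration and are not part of the code above.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, not part of the original answer.
static class FolderNameCandidates
{
    // Yields every ordering of the given parts (n! results for n parts).
    static IEnumerable<string[]> Permutations(string[] parts)
    {
        if (parts.Length <= 1)
        {
            yield return parts;
            yield break;
        }
        for (int i = 0; i < parts.Length; i++)
        {
            // Fix parts[i] as the first element and permute the remaining parts.
            var rest = parts.Where((_, ndx) => ndx != i).ToArray();
            foreach (var tail in Permutations(rest))
                yield return new[] { parts[i] }.Concat(tail).ToArray();
        }
    }

    // Builds all "surname, firstname" candidates for a full name.
    public static HashSet<string> Build(string fullName)
    {
        var parts = fullName.Split(new[] { ' ' }, StringSplitOptions.RemoveEmptyEntries);
        var result = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        foreach (var p in Permutations(parts))
        {
            // Put the comma after the first 'split' parts; it never lands at either end.
            for (int split = 1; split < p.Length; split++)
                result.Add(string.Join(" ", p.Take(split)) + ", " + string.Join(" ", p.Skip(split)));
        }
        return result;
    }
}
```

    For "samy jack sammour" this produces 12 candidates directly, whereas the comma-as-token approach generates 24 permutations and then discards half of them.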

    In the last step, you enumerate the directory names and check whether there's a match in this HashSet of all possible names. Once you've found a matching directory, you just need to search for all .pptx files in it. If necessary, replace "*.pptx" with your own file name pattern.
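
    If you need the matching files themselves rather than just a yes/no answer, the last step could look roughly like this. It is only a sketch: `PresentationFinder`, `Find` and their parameters are hypothetical names, and it assumes the candidate set was created with a case-insensitive comparer as described above.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

// Hypothetical helper, not part of the original answer.
static class PresentationFinder
{
    // Returns the full paths of all files matching filePattern that sit in a
    // direct subfolder of root whose name is one of the candidate folder names.
    public static List<string> Find(string root, ISet<string> candidates, string filePattern = "*.pptx")
    {
        return new DirectoryInfo(root)
            .EnumerateDirectories()                              // direct subfolders only
            .Where(dir => candidates.Contains(dir.Name))         // folder name is a known candidate
            .SelectMany(dir => dir.EnumerateFiles(filePattern))  // wildcards belong in the pattern, not the path
            .Select(file => file.FullName)
            .ToList();
    }
}
```

    A call such as `PresentationFinder.Find(path, dirs, "*file*.pptx")` would then return the paths to open, and an empty list means that no folder or file matched.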