
Algorithm to create batches of Files


I have one directory that contains all files along with their different versions, for example:

ABC.pdf ABC_1.pdf .......

XYZ.tif ..... XYZ_25.tif

MNO.tiff

I want to make n batches of m files each, as per the user's requirement.

Suppose the folder contains ABC.pdf to ABC_24.pdf and XYZ.tif to XYZ_24.tif, 50 files in total. I want to create two batches of 25 files each. So first, do I need to make sure that all files in my list are sorted (and if so, how)? Then I can apply some logic to divide the list into two proper batches:

1) ABC.pdf to ABC_24.pdf

2) XYZ.tif to XYZ_24.tif

But if each name has 26 files (up to _25, as described at the beginning), then it would be:

1) ABC.pdf to ABC_24.pdf

2) XYZ.tif to XYZ_24.tif

3) ABC_25.pdf and XYZ_25.tif

So I want a proper, meaningful allocation of files to batches. I would prefer to do this in as few lines as possible, so I tried a lambda expression as below:

List<string> strIPFiles = Directory.GetFiles(folderPath, "*.*")
    .Where(file => new[] { ".tiff", ".tif", ".pdf" }
        .Any(ext => file.EndsWith(ext, StringComparison.OrdinalIgnoreCase)))
    .ToList();

int batches = 2, filesPerBatch = 25; //for example

Do I need to call strIPFiles.Sort()? Will it be useful in any way, or will I always get a sorted list of files?

How do I create batches from the list, using a lambda expression like the one above?

Thanks for your help.


Solution

  • I am not sure I entirely understand your question. I assume you are looking for a way to divide the files into batches of a specified size (that is, number of files) while also keeping them grouped by file name.

    Let me know if this is helpful:

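        // Requires: using System.Collections.Generic; using System.IO;
        //           using System.Linq; using System.Text.RegularExpressions;
        public static List<List<string>> CreateBatch(int batchSize)
        {
            string sourcePath = @"C:\Users\hari\Desktop\test";

            // Collect every extension from the question (.pdf, .tif, .tiff).
            var pdfs = Directory.EnumerateFiles(sourcePath, "*.pdf", SearchOption.TopDirectoryOnly);
            var tifs = Directory.EnumerateFiles(sourcePath, "*.tif", SearchOption.TopDirectoryOnly);
            var tiffs = Directory.EnumerateFiles(sourcePath, "*.tiff", SearchOption.TopDirectoryOnly);

            // Union removes duplicates; on .NET Framework "*.tif" can also match ".tiff" files.
            var images = pdfs.Union(tifs).Union(tiffs);

            // Group versions by base name: "ABC" and "ABC_12" both fall under "ABC".
            var imageGroups = from image in images
                              group image by Regex.Replace(Path.GetFileNameWithoutExtension(image), @"_\d+$", "") into g
                              select new { GroupName = g.Key, Files = g.OrderBy(s => s) };

            List<List<string>> batches = new List<List<string>>();
            List<string> batch = new List<string>();

            foreach (var group in imageGroups)
            {
                batch.AddRange(group.Files);

                // Groups are kept whole, so a batch may end up larger than batchSize.
                if (batch.Count >= batchSize)
                {
                    batches.Add(batch);
                    batch = new List<string>();
                }
            }

            // Keep the files left over after the last full batch.
            if (batch.Count > 0)
                batches.Add(batch);

            return batches;
        }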
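
If you need batches of exactly m files, as in your ABC/XYZ example, the method above is not quite enough: it keeps each group whole, so a batch can grow past the requested size, and `OrderBy(s => s)` compares names as strings, which places ABC_10.pdf before ABC_2.pdf. As for `Sort()`: the documentation for `Directory.GetFiles` states that the order of the returned names is not guaranteed, so it is safer to order the list explicitly. Below is a minimal sketch of one way to do both. The names `FileBatcher`, `CreateBatches`, `BaseName` and `VersionNumber` are made up for this illustration, and the sketch assumes that versions are marked by a trailing `_<number>` and that a name without a suffix counts as version 0.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Text.RegularExpressions;

public static class FileBatcher
{
    // Assumed naming scheme: an optional "_<digits>" suffix before the extension.
    private static readonly Regex VersionSuffix = new Regex(@"_(\d+)$");

    // Base name shared by all versions: "ABC_12.pdf" -> "ABC".
    private static string BaseName(string path)
    {
        return VersionSuffix.Replace(Path.GetFileNameWithoutExtension(path), "");
    }

    // Numeric version: "ABC.pdf" -> 0, "ABC_12.pdf" -> 12.
    private static int VersionNumber(string path)
    {
        Match m = VersionSuffix.Match(Path.GetFileNameWithoutExtension(path));
        return m.Success ? int.Parse(m.Groups[1].Value) : 0;
    }

    public static List<List<string>> CreateBatches(IEnumerable<string> files, int filesPerBatch)
    {
        if (filesPerBatch <= 0)
            throw new ArgumentOutOfRangeException(nameof(filesPerBatch));

        var batches = new List<List<string>>();
        var leftovers = new List<string>();

        var groups = files
            .GroupBy(f => BaseName(f), StringComparer.OrdinalIgnoreCase)
            .OrderBy(g => g.Key, StringComparer.OrdinalIgnoreCase);

        foreach (var g in groups)
        {
            // Order by the numeric version so that _2 comes before _10.
            List<string> ordered = g.OrderBy(f => VersionNumber(f)).ToList();
            int fullCount = ordered.Count / filesPerBatch * filesPerBatch;

            // Full batches that contain files of a single base name only.
            for (int i = 0; i < fullCount; i += filesPerBatch)
                batches.Add(ordered.GetRange(i, filesPerBatch));

            // Files that do not fill a batch are pooled with other groups' remainders.
            leftovers.AddRange(ordered.Skip(fullCount));
        }

        // Mixed batches built from the pooled remainders; the last one may be short.
        for (int i = 0; i < leftovers.Count; i += filesPerBatch)
            batches.Add(leftovers.GetRange(i, Math.Min(filesPerBatch, leftovers.Count - i)));

        return batches;
    }
}
```

Calling `FileBatcher.CreateBatches(strIPFiles, 25)` on the 52 files from your second example gives three batches: ABC.pdf to ABC_24.pdf, XYZ.tif to XYZ_24.tif, and a final batch with ABC_25.pdf and XYZ_25.tif. Note that the number of batches follows from the file count and the batch size; this sketch does not enforce a fixed n. Files that share a base name but differ in extension (say ABC.pdf and ABC.tif) would land in the same group, so group by name plus extension if that is not what you want.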