Search code examples
c#recursionforeachrefparallel.foreach

Parallel.ForEach and foreach returning different reference vars


(EDIT: If the title is confusing I'm all ears for a better one)

I'm currently working on a small project for class in C#, and I've come into something strange. The purpose of the project is to count all folders, files, and the size of files in a given directory, both in foreach and Parallel.ForEach.

Initially I was making it a recursive return function, but was having issues with the base case given that it had to account for all possible code paths with returns. I then switched to using ref params. Below I have current code for the methods.

/*
* Calculate total directories, count of files, and size of all files from
* a given path using a singular foreach. Update passed reference parameters.
*/
static void singRecurse(DirectoryInfo di, ref int countFolder, ref int countFile,
    ref long countByte)
{
    try{
        DirectoryInfo[] directories = di.GetDirectories();
        foreach(DirectoryInfo d in directories){
            countFolder += 1;
            foreach(FileInfo f in d.GetFiles()){
                countFile += 1;
                countByte += f.Length;
            }
            singRecurse(d, ref countFolder, ref countFile, ref countByte);
        }
    } catch (UnauthorizedAccessException){
        Console.WriteLine("You do not have access to this directory");
    }
}

/*
* Calculate total directories, count of files, and size of all files from
* a given path using a parallel foreach. Update passed reference parameters.
*/
static void parRecurse(DirectoryInfo di, ref int countFolder, ref int countFile,
    ref long countByte)
{
    int countFolderinLambda = countFolder;
    int countFileinLambda = countFile;
    long countByteinLambda = countByte;

    try{
        DirectoryInfo[] directories = di.GetDirectories();
        Parallel.ForEach(directories, d => {
            countFolderinLambda += 1;
            foreach(FileInfo f in d.GetFiles()){
                countFileinLambda += 1;
                countByteinLambda += f.Length;
            }
            parRecurse(d, ref countFolderinLambda, ref countFileinLambda,
                ref countByteinLambda);
        });
    } catch (UnauthorizedAccessException){
        Console.WriteLine("You do not have access to this directory");
    }

    countFolder = countFolderinLambda;
    countFile = countFileinLambda;
    countByte = countByteinLambda;
}

Current output from running both as separate processes results in:

Parallel calculated in 44ms
6 folders, 20 files, 250498 bytes
    
Single calculated in 11ms
8 folders, 25 files, 405153 bytes

Why is there such a discrepancy?


Solution

  • The recursive function is assigning a relative total to the actual total.

    countFolder = countFolderinLambda;
    countFile = countFileinLambda;
    countByte = countByteinLambda;
    

    Consider this function is running in parallel for multiple directories, and they're all assigning their own number to countFile without the knowing anything about other parallel directories.

    For example,

    w/ has 5 files
    w/x/ has 3 files
    w/y/ has 10 files
    w/z/ has 7 files
    
    • The parallel branch for w/x/ is setting the total to 8 files
    • The parallel branch for w/y/ is setting the total to 15 files
    • The parallel branch for w/z/ is setting the total to 12 files

    Each parallel is undoing the work of the previous.

    Instead, make the recursive function count the relative files and then add to the totals instead of reassigning to them. See below - note that I've renamed your variables to relativeX and totalX for added clarity.

    void parRecurse(DirectoryInfo di, ref int totalFolders, ref int totalFiles, ref long totalBytes)
    {
        int relativeFolders = 0;
        int relativeFiles = 0;
        long relativeBytes = 0;
    
        try
        {
            DirectoryInfo[] directories = di.GetDirectories();
            Parallel.ForEach(directories, d =>
            {
                Interlocked.Increment(ref relativeFolders);
                foreach (FileInfo f in d.GetFiles())
                {
                    Interlocked.Increment(ref relativeFiles);
                    Interlocked.Add(ref relativeBytes, f.Length);
                }
                parRecurse(d, ref relativeFolders, ref relativeFiles, ref relativeBytes);
            });
        }
        catch (UnauthorizedAccessException)
        {
            Console.WriteLine("You do not have access to this directory");
        }
    
        Interlocked.Add(ref totalFolders, relativeFolders);
        Interlocked.Add(ref totalFiles, relativeFiles);
        Interlocked.Add(ref totalBytes, relativeBytes);
    }
    

    As an aside, after writing various recursive functions I tend to gravitate towards creating a container class for the data instead of passing refs. Classes are passed by reference, which simplifies things. Something like this:

    public class FileSystemCountContext
    {
        private int _directories = 0;
        private int _files = 0;
        private long _bytes = 0;
    
        public int Directories => _directories;
        public int Files => _files;
        public long Bytes => _bytes;
    
        public override string ToString()
        {
            return $"{Directories} directories, {Files} files, {Bytes} bytes";
        }
    
        public void IncrementDirectories()
        {
            Interlocked.Increment(ref _directories);
        }
    
        public void IncrementFiles()
        {
            Interlocked.Increment(ref _files);
        }
    
        public void IncrementBytes(long amount)
        {
            Interlocked.Add(ref _bytes, amount);
        }
    }
    
    
    FileSystemCountContext CountFileSystem(DirectoryInfo directory, bool parallel = false, FileSystemCountContext? context = null)
    {
        if (context == null)
        {
            context = new();
        }
    
        foreach (FileInfo fi in directory.GetFiles())
        {
            context.IncrementFiles();
            context.IncrementBytes(fi.Length);
        }
    
        if (parallel)
        {
            Parallel.ForEach(directory.GetDirectories(), di =>
            {
                context.IncrementDirectories();
                CountFileSystem(di, parallel, context);
            });
        }
        else
        {
            foreach (DirectoryInfo di in directory.GetDirectories())
            {
                context.IncrementDirectories();
                CountFileSystem(di, parallel, context);
            }
        }
    
        return context;
    }