vb.net multithreading task scheduled-tasks

I need help creating a TaskScheduler to prevent threading overload


I want to add workers into a queue, but only have the first N workers processing in parallel. All samples I find are in C#.

This is probably simple for a programmer, but I'm not one. I know enough about VB to write simple programs.

My first application runs fine until it suddenly hits 100% CPU and then crashes. Help, please. (Yes, I spent 5 hours of work time searching before posting this...)

More Context: Performing a recursive inventory of directory structures, files, and permissions across file servers with over 1 million directories/subdirectories.

The process runs serially, but will take months to complete. Management is already breathing down my neck. When I try using Tasks, it goes to about 1000 threads, then hits 100% CPU, stops responding, and crashes. This is on a 16-core server with 112 GB RAM.

With the sample provided on using Semaphores, this is what I've put in:

Public Class InvDir
    Private mSm As Semaphore
    Public Sub New(ByVal maxPrc As Integer)
        mSm = New Semaphore(maxPrc, maxPrc)
    End Sub

    Public Sub GetInventory(ByVal Path As String, ByRef Totals As Object, ByRef MyData As Object)
        mSm.WaitOne()
    
        Task.Factory.StartNew(Sub()
                Dim CurDir As New IO.DirectoryInfo(Path)
                Totals.SubDirectoryCount += CurDir.GetDirectories().Count
                Totals.FilesCount += CurDir.GetFiles().Count
                For Each CurFile As IO.FileInfo In CurDir.EnumerateFiles()
                    ' FileInfo exposes Name, not FileName.
                    MyData.AddFile(CurFile.Name, CurFile.Extension, CurFile.FullName, CurFile.Length)
                Next
                End Sub).ContinueWith(Function(x) mSm.Release())
    End Sub
End Class

Solution

  • You're attempting multithreading with disk I/O. Throwing more threads at it may actually make it slower: no matter how many threads there are, a single disk can physically only seek to one position at a time. (In fact, you mentioned that it works when run serially.)

    If you did want to limit the number of concurrent threads, you could use a Semaphore. A semaphore is like a SyncLock, except that you can specify how many threads are allowed to execute the code at a time. In the example below, the semaphore allows three threads to execute. Any more than that have to wait until one finishes. Some modified code from the MSDN page:

    Imports System.Threading

    Public Class Example

        ' A semaphore that simulates a limited resource pool.
        Private Shared _pool As Semaphore

        <MTAThread> _
        Public Shared Sub Main()
            ' Create a semaphore that can satisfy up to three
            ' concurrent requests. Use an initial count of three
            ' so that all three slots are available to workers
            ' right away. (With an initial count of zero, every
            ' WaitOne() would block until the main thread called
            ' _pool.Release(3).)
            _pool = New Semaphore(3, 3)

        End Sub

        Private Shared Sub SomeWorkerMethod()
            ' This is the method that would be called using a Task.
            _pool.WaitOne()
            Try
                ' Do whatever
            Finally
                ' Always give the slot back, even after an exception.
                _pool.Release()
            End Try
        End Sub
    End Class
    

    Every new thread must call _pool.WaitOne(). That tells it to wait its turn until there are fewer than three threads executing. Every thread blocks until the semaphore allows it to pass.

    Every thread must also call _pool.Release() to let the semaphore know that it can allow the next waiting thread to begin. That matters even when an exception is thrown, which is why the call sits in a Finally block. If a thread never calls Release(), its slot is never returned, and eventually every waiting thread blocks forever.
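
    As a rough illustration of how the semaphore pattern above could apply to the inventory in the question, here is a minimal sketch, not a drop-in solution. The module name InventorySketch, the function CountFiles, and its parameters are made up for this example. It uses SemaphoreSlim (a lighter, in-process relative of Semaphore) and only counts files rather than recording them. The calling thread walks the tree and waits on the semaphore before starting each task, so at most maxParallel tasks are running at any moment:

    ```vbnet
    Imports System
    Imports System.Collections.Generic
    Imports System.IO
    Imports System.Threading
    Imports System.Threading.Tasks

    ' Hypothetical helper for illustration only.
    Public Module InventorySketch

        ' Counts the files under root, listing files in at most
        ' maxParallel directories at the same time.
        Public Function CountFiles(root As String, maxParallel As Integer) As Long
            Dim fileCount As Long = 0
            Dim pending As New Stack(Of String)()
            pending.Push(root)

            Using gate As New SemaphoreSlim(maxParallel, maxParallel)
                While pending.Count > 0
                    Dim current As String = pending.Pop()

                    ' The calling thread discovers subdirectories itself.
                    Try
                        For Each subDir As String In Directory.GetDirectories(current)
                            pending.Push(subDir)
                        Next
                    Catch ex As Exception
                        ' Unreadable directory: skipped here. A real
                        ' inventory would log the failure instead.
                        Continue While
                    End Try

                    ' Block until one of the maxParallel slots is free,
                    ' then hand this directory to a task.
                    gate.Wait()
                    Task.Factory.StartNew(
                        Sub()
                            Try
                                Dim n As Long = CLng(Directory.GetFiles(current).Length)
                                ' Interlocked keeps the shared counter thread-safe.
                                Interlocked.Add(fileCount, n)
                            Catch ex As Exception
                                ' Skipped, as above.
                            Finally
                                ' Always return the slot.
                                gate.Release()
                            End Try
                        End Sub)
                End While

                ' Taking every slot back means all started tasks have finished.
                For i As Integer = 1 To maxParallel
                    gate.Wait()
                Next
            End Using

            Return fileCount
        End Function
    End Module
    ```

    Calling InventorySketch.CountFiles("D:\Shares", 8) (the path and the limit are placeholders) would list files in at most eight directories at a time. Because the disk is still the shared bottleneck, it is worth measuring a few small limits before assuming that more parallelism helps.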

    If it's really going to take months, what about cloning the drive and running the check on multiple copies of the same drive, each instance looking at a different section?