I'm currently working on a desktop tool in .NET Framework 4.8 that takes a list of images with potential cracks and uses a model trained with ML.NET (C#) to perform crack detection. Ideally, I'd like prediction to take less than 100ms for 10 images (note: a single image prediction takes between 36-41ms).
At first, I tried running multiple predictions on different threads using a list of PredictionEngines and a Parallel.For loop (one engine per thread, since there is no PredictionEnginePool implementation for .NET Framework); a minimal sketch of that approach is below. I later learned that using an ITransformer for predictions is the recommended, thread-safe approach on .NET Framework and moved to that, but neither approach gave me the performance I was hoping for.
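For reference, here's roughly what that first approach looked like, assuming input/output classes like the ones described further down (ModelInput, ImageClassificationPrediction), an already-loaded _LoadedModel, and inputs as the List<ModelInput> to score; each worker thread gets its own engine because PredictionEngine is not thread-safe:

using System;
using System.Threading.Tasks;
using Microsoft.ML;

var predictions = new ImageClassificationPrediction[inputs.Count];
Parallel.For(0, inputs.Count,
    // localInit: create one PredictionEngine per worker thread
    () => mlContext.Model.CreatePredictionEngine<ModelInput, ImageClassificationPrediction>(_LoadedModel),
    (i, state, engine) =>
    {
        predictions[i] = engine.Predict(inputs[i]);
        return engine; // reuse this worker's engine on its next iteration
    },
    engine => { }); // localFinally: nothing to clean up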
It takes around 255-281ms (267.1ms on average) to execute the following code:
MLContext mlContext = new MLContext();
// Wrap the input list in an IDataView for the model pipeline.
IDataView inputData = mlContext.Data.LoadFromEnumerable(inputDataEnumerable);
// Transform is lazy; this only sets up the prediction pipeline over the batch.
IDataView results = _LoadedModel.Transform(inputData);
// Materializing the enumerable with ToList() is what actually runs the predictions.
var imageClassificationPredictions = mlContext.Data.CreateEnumerable<ImageClassificationPrediction>(results, false).ToList();
Here _LoadedModel is an ITransformer representing the previously trained and loaded model, and inputDataEnumerable is a List<ModelInput>, where ModelInput contains two properties: ImageData (a byte[] of image data extracted from a PNG image) and Label (a string, set to null at prediction time).
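For context, the two classes look roughly like this (the shape of the prediction class is my assumption based on a typical ML.NET image classification output; adjust to your model's actual output columns):

public class ModelInput
{
    public byte[] ImageData { get; set; } // raw bytes read from a PNG file
    public string Label { get; set; }     // null at prediction time
}

public class ImageClassificationPrediction
{
    public string PredictedLabel { get; set; } // hypothetical output column name
    public float[] Score { get; set; }         // hypothetical per-class scores
}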
I tried to speed up this process by switching the TensorFlow package dependency from SciSharp.TensorFlow.Redist to SciSharp.TensorFlow.Redist-Windows-GPU, as described in this tutorial.
However, the execution time remained pretty much the same (an average of 262.4ms for 10 images). I also compared training times on a small data set of 5760 images and couldn't see much of a difference (both took about 7min 21s).
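For what it's worth, a minimal sketch of how such a batch can be timed (the Stopwatch wiring here is illustrative, not my exact benchmark code):

var stopwatch = System.Diagnostics.Stopwatch.StartNew();
IDataView results = _LoadedModel.Transform(inputData);
// ToList() forces the lazy pipeline to execute, so the elapsed time covers the real work.
var predictions = mlContext.Data.CreateEnumerable<ImageClassificationPrediction>(results, false).ToList();
stopwatch.Stop();
Console.WriteLine($"Predicted {predictions.Count} images in {stopwatch.ElapsedMilliseconds} ms");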
From these results it seemed like the GPU wasn't being used, so I first tried deleting the bin folders of my projects and removing the old CPU-oriented TensorFlow package (in case it was a simple build issue). When that didn't help, I reinstalled CUDA 10.0, following the instructions described here. I also double-checked that CUDA was working properly with my graphics card by running a few of the sample projects (deviceQuery, deviceQueryDrv, and bandwidthTest) just to be sure the card is actually compatible, and those ran just fine.
At this point it seems like I've either set something up wrong or the GPU just isn't applicable to my particular use case, but I can't pinpoint which. According to the tutorial I was following, GPU acceleration should be available for predictions, but I'm not seeing any significant difference in execution time after switching to the GPU package.
If anyone has suggestions for further troubleshooting steps, an idea of where I went wrong, or thinks this is the wrong use case for a GPU, I'd greatly appreciate any help/feedback.
If it helps, here are some system specs:
here are the ML packages (and their versions) I'm running:
and for GPU support I've installed CUDA v10.0 along with cuDNN v7.6.4.
Edit
The issue turned out not to be specific to ML.NET but rather related to TensorFlow.NET. After I updated SciSharp.TensorFlow.Redist-Windows-GPU to version 2.3.0 (released 8/31/2020), updated CUDA to 10.1, and followed the guidance on the TensorFlow.NET GitHub (which has slightly different steps for getting GPU support working), I can now get the 10 predictions in under 50ms, which is even better than my target.
It's likely a version mismatch.
TensorFlow supports CUDA® 10.1 (TensorFlow >= 2.1.0)
https://www.tensorflow.org/install/gpu
You can check your output window for messages explaining why it isn't connecting to your GPU.
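For example, when the native TensorFlow library does find the GPU, the debug output typically includes lines similar to the following (the exact DLL names depend on the CUDA/cuDNN versions; cudart64_101.dll corresponds to CUDA 10.1):

Successfully opened dynamic library cudart64_101.dll
Successfully opened dynamic library cudnn64_7.dll

whereas a missing or mismatched CUDA install shows up as something like:

Could not load dynamic library 'cudart64_101.dll'; dlerror: cudart64_101.dll not found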