Writting a C++ Dll and exposing it efficiently to .NET (not through C++/CLI)

I am building a C# project. This project is going to use NVidia's Tesla through CUDA. CUDA C native implementation is not exposed directly to C# and, in my opinion, the available C# wrappers (like Brahma, CUDAfy, Linq to GPU) are not mature enough for production.

I decided to go ahead and build my math logic in a C++ component that is going to access CUDA which is the official supported way. C++/CLI is not an option as I am using Intel C++ Compiler, for performance, which doesn't support CLR extensions.

My most important criteria is performance, so, I would try to minimise marshalling and copying arrays between C++ (where my business logic lives) and .NET (the rest of my applications).

I am aware that this question has been asked before, but mostly, the C++ library is already there and other times, C++/CLI is an option, but here, both situations are not the case.

Given that I am going to write the C++ library from scratch in C++, I am in the position to decide the best way to expose it to C#. Do you have any recommendations or best practices that I should follow to get the easiest and highest performing integration between C++ and .NET? Note that what I will be exchanging are mostly large arrays

Edit: clarifying that I am building my business logic (math) in C++ and not an infrastructure library to facilitate access to GPU.

Solution

While it is certainly possible to outperform the already existing libraries that you deemed not mature enough, the very fact that you are asking this question here should make you think twice about deciding to roll your own library/implementation!

Considerations beyond specific performance such as stability and reliability should be your primary concern if this is going to production. Generally unless you know what you're doing, duplicating the effort of the community, or other teams of developers, can be a slippery slope.

I know this answer doesn't really address your question but as it's formulated your question is in my opinion overly broad and there is no simple answer. Initially I was going to post this as a comment but decided it was too long to fit the format.

So, in closing, I recommend you try out the already existing libraries and if you find them not-fitting from a performance stand point, start asking specific questions.

UPDATE

If you're going to implement most of the logic in C++ and you're expecting to just be transferring some results back to your managed code in the form of arrays then there isn't much that you need to do. In general the automatic marshalling of arrays is as efficient as you're going to get.

The one thing I would recommend is though to read as much as possible about Marshalling and use a performance profiler before deciding to get "creative" in order to improve things.

And here's one last idea that might be interesting but again, you should profile before attempting to use this: you might try to use a Memory Mapped File as the backing store for your data and open the file from both ends. Ultimately this may or may not be useful so definitely profile before you buy ;)