I am rewriting a Python module, originally written in C using the Python C API, in Cython. The module also uses NumPy. A major challenge of the project is to keep the module's current speed while making it work for all NumPy data types. I am thinking of using fused types to make the code generic, but I am worried that they could become a performance bottleneck. Are there any other techniques I could use instead of fused types to get both speed and generic code?
Ignoring ali_m's perfectly valid comment about whether you've actually measured your performance issues...
http://docs.cython.org/src/userguide/fusedtypes.html#selecting-specializations
"For a cdef or cpdef function called from Cython this means that the specialization is figured out at compile time. For def functions the arguments are typechecked at runtime, and a best-effort approach is performed to figure out which specialization is needed."
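Essentially, if you're calling from Cython there should be no issue: a separate function is generated for each specialization and the right one is called directly, with no dispatch overhead. If you're calling from Python, the function has to inspect its arguments at runtime to decide which specialization to call. That check happens once per call, not once per element, so it mostly matters when you make many calls on small arrays.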
But measure your performance before worrying about it! (And read the manual, which answers your question quite clearly.)
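One way to follow that advice is Python's standard `timeit` module. The helper below (`seconds_per_call` is a hypothetical name, not part of any library) times any callable, so it can be pointed at both the original C extension and the Cython rewrite.

```python
import timeit


def seconds_per_call(func, *args, number=1000, repeat=5):
    """Best-of-`repeat` average time for one call of func(*args), in seconds."""
    timings = timeit.repeat(lambda: func(*args), number=number, repeat=repeat)
    # The minimum is the least noisy estimate; higher values mostly reflect
    # interference from other processes rather than the code being timed.
    return min(timings) / number
```

A comparison would then look like `seconds_per_call(old_ext.func, a)` versus `seconds_per_call(new_ext.func, a)`, where `old_ext` and `new_ext` are placeholder module names. Running it with both small and large arrays helps separate per-call overhead (such as fused-type dispatch) from per-element cost.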