c++ c parallel-processing vectorization scilab

Vectorized/vectorizing functions in C

For me, one of the most interesting features in languages such as R or Scilab is the possibility of parallelizing operations by vectorizing functions ("meaning that the function will operate on all elements of a vector without needing to loop through and act on each element one at a time", in the words of The Carpentries). This is supposed to make the code clearer and faster to execute.

My question is: Is this a possibility in C or C++? Can we create functions in C that can operate either on a scalar or a vector? Can we use standard C functions as if they were vectorized?

Maybe C is so fast that you don't need this feature, but I want to be sure about this subject, since this would affect the way I translate algorithms into code.

To be more concrete, if I want to apply a function to each element of a vector in C, should I use a loop, or are there other alternatives?

Solution

In C (prior to C11), a function name cannot be overloaded. If you want one function that operates on a vector and another that operates on a single element, those functions must have different names.

With C11, _Generic and macros let you dispatch based on argument type. See this SO answer. That would permit sin(x) to perform a scalar operation if x is a double, and a vector operation otherwise.

In C++, functions can be overloaded. The same function name (or operator) can perform scalar operations on single elements and vector operations on multiple elements. You can also store results in variables declared with auto, so the calling code can be agnostic to the return type.
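As a minimal sketch of that overloading, the name squared below is hypothetical and std::vector<double> is just one possible choice of vector type. The vector overload is the hand-written glue that loops over the elements and calls the scalar version.

```cpp
#include <vector>

// Scalar version (hypothetical name chosen for illustration).
double squared(double x) { return x * x; }

// Vector overload: same name, applies the scalar version to every element.
std::vector<double> squared(const std::vector<double>& xs) {
    std::vector<double> out;
    out.reserve(xs.size());
    for (double x : xs) {
        out.push_back(squared(x));  // resolves to the scalar overload
    }
    return out;
}
```

A caller can then write auto a = squared(3.0); and auto v = squared(std::vector<double>{1.0, 2.0, 3.0}); and let the compiler pick the overload and the result type.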

Writing the glue code to convert a scalar operation into a vector one still has to be done somewhere, and C++ has only limited ability to automate writing that glue code.

Now, you could write C-style tagged unions that can contain either vectors or scalars, and have the code that operates on them switch dynamically between the two modes.
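A rough sketch of the tagged-union idea might look like the following. The names Kind, Span, Value, and apply_sqrt are all made up for this example, and the vector case assumes a non-owning pointer plus length. It is written in a C-like style but compiled as C++; in idiomatic C++ one would more likely reach for std::variant.

```cpp
#include <cmath>
#include <cstddef>

// Tag saying which member of the union is currently in use.
enum class Kind { Scalar, Vector };

// Non-owning view of a contiguous array of doubles (assumed layout).
struct Span {
    double* data;
    std::size_t size;
};

struct Value {
    Kind kind;
    union {
        double scalar;
        Span span;
    };
};

// Takes the square root in place, choosing the mode at run time from the tag.
void apply_sqrt(Value& v) {
    switch (v.kind) {
    case Kind::Scalar:
        v.scalar = std::sqrt(v.scalar);
        break;
    case Kind::Vector:
        for (std::size_t i = 0; i < v.span.size; ++i) {
            v.span.data[i] = std::sqrt(v.span.data[i]);
        }
        break;
    }
}
```

The price of this approach is a run-time check of the tag on every call, plus the burden of keeping the tag and the active union member in sync by hand.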

In C++ you could write template code that statically switches between vector and scalar implementations.
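One way such a static switch could look, assuming C++17 for if constexpr, is sketched here. The function name root is hypothetical, and the non-scalar branch simply assumes the argument is a container of numbers with size() and begin()/end().

```cpp
#include <cmath>
#include <type_traits>
#include <vector>

// Chooses the scalar or the element-wise implementation at compile time.
template <typename T>
auto root(const T& x) {
    if constexpr (std::is_arithmetic_v<T>) {
        // Scalar branch: returns a double.
        return std::sqrt(static_cast<double>(x));
    } else {
        // Container branch: returns a std::vector<double> of the same length.
        std::vector<double> out;
        out.reserve(x.size());
        for (const auto& e : x) {
            out.push_back(std::sqrt(static_cast<double>(e)));
        }
        return out;
    }
}
```

Only the branch matching T is instantiated, so root(9.0) compiles to a plain scalar call and root(some_vector) to the loop, with no run-time tag to inspect.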

Neither solution is something a beginner in either language could be expected to implement successfully.

C++ has std::valarray, which does limited vectorization for you, but it isn't well supported by compilers, nor does it extend well.
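For a feel of what valarray offers, here is a small sketch. The function name hypotenuse is invented for the example; the element-wise operators and the std::sqrt overload for valarray come from the standard <valarray> header. Both arguments are assumed to have the same length.

```cpp
#include <cmath>
#include <valarray>

// Element-wise sqrt(a*a + b*b), written without an explicit loop.
// Assumes a and b have the same number of elements.
std::valarray<double> hypotenuse(const std::valarray<double>& a,
                                 const std::valarray<double>& b) {
    return std::sqrt(a * a + b * b);
}
```

This reads much like the R or Scilab equivalent, but only the operations valarray itself provides work this way; your own scalar functions still need a loop or valarray's apply member.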

Various libraries support efficient vectorization of a limited set of operations; any good matrix library, for example.

Most higher-level (than C/C++) languages end up implementing their low-level, high-speed code in C or C++ or (in some cases) more directly in assembly. Usually C/C++ augmented with assembly or "intrinsics" is enough to get most of the performance speedup they want.