Processing of integers on different CPUs

My task is to design a function that fulfils those requirements:

Function shall sum members of given one-dimensional array. However, it should sum only members whose number of ones in the binary representation is higher than defined threshold (e.g. if the threshold is 4, number 255 will be counted and 15 will not)
The array length is arbitrary
The function shall utilize as little memory as possible and shall be written in an efficient way
The production function code (‘sum_filtered(){..}’) shall not use any standard C library functions (or any other libraries)
The function shall return 0 on success and error code on error
The array elements are of a type 16-bit signed integer and an overflow during calculation shall be regarded as a failure
Use data types that ensure portability between different CPUs (so the calculations will be the same on 8/16/32-bit MCU)
The function code should contain a reasonable amount of comments in doxygen annotation

Here is my solution:

#include <iostream>
using namespace std;

int sum_filtered(short array[], int treshold)
{
    // return 1 if invalid input parameters
    if((treshold < 0) || (treshold > 16)){return(1);}

    int sum = 0;
    int bitcnt = 0;
    for(int i=0; i < sizeof(array); i++)
    {
        // Count one bits of integer
        bitcnt = 0;
        for (int pos = 0 ; pos < 16 ; pos++) {if (array[i] & (1 << pos)) {bitcnt++;}}

        // Add integer to sum if bitcnt>treshold
        if(bitcnt>treshold){sum += array[i];}
    }
    return(0);
}

int main()
{
 short array[5] = {15, 2652, 14, 1562, -115324};
 int result = sum_filtered(array, 14);
 cout << result << endl;

 short array2[5] = {15, 2652, 14, 1562, 15324};
 result = sum_filtered(array2, -2);
 cout << result << endl;
}

However I'm not sure whether this code is portable between different CPUs.

And I don't how can an overflow occur during calculation and what can be other errors during processing of arrays with this function.

Can somebody more experienced give me his opinion?

Solution

Well, I can foresee one problem:

for(int i=0; i < sizeof(array); i++)

array in this context is a pointer, so will likely be 4 on 32bit systems, or 8 on 64bit systems. You really do want to be passing a count variable (in this case 5) into the sum_filtered function (and then you can pass the count as sizeof(array) / sizeof(short)).

Anyhow, this code:

    // Count one bits of integer
    bitcnt = 0;
    for (int pos = 0 ; pos < 16 ; pos++) {if (array[i] & (1 << pos)) {bitcnt++;}}

Effectively you are doing a popcount here (which can be done using __builtin_popcount on gcc/clang, or __popcnt on MSVC. They are compiler specific, but usually boil down to a single popcount CPU instruction on most CPUs).

If you do want to do this the slow way, then an efficient approach is to treat the computation as a form of bitwise SIMD operation:

#include <cstdint> // or stdint.h if you have a rubbish compiler :)

uint16_t popcount(uint16_t s)
{
  // perform 8x 1bit adds
  uint16_t a0 =  s & 0x5555;
  uint16_t b0 = (s >> 1) & 0x5555;
  uint16_t s0 = a0 + b0;
  // perform 4x 2bit adds
  uint16_t a1 =  s0 & 0x3333;
  uint16_t b1 = (s0 >> 2) & 0x3333;
  uint16_t s1 = a1 + b1;
  // perform 2x 4bit adds
  uint16_t a2 =  s1 & 0x0F0F;
  uint16_t b2 = (s1 >> 4) & 0x0F0F;
  uint16_t s2 = a2 + b2;
  // perform 1x 8bit adds
  uint16_t a3 =  s2 & 0x00FF;
  uint16_t b3 = (s2 >> 8) & 0x00FF;
  return a3 + b3;
}

I know it says you can't use stdlib functions (your 4th point), but that shouldn't apply to the standardised integer types surely? (e.g. uint16_t) If it does, well then there is no way to guarantee portability across platforms. You're out of luck.

Personally I'd just use a 64bit integer for the sum. That should reduce the risk of any overflows *(i.e. if the threshold is zero, and all the values are -128, then you'd overflow if the array size exceeded 0x1FFFFFFFFFFFF elements (562,949,953,421,311 in decimal).

#include <cstdint>

int64_t sum_filtered(int16_t array[], uint16_t threshold, size_t array_length)
{
  // changing the type on threshold to be unsigned means we don't need to test
  // for negative numbers. 
  if(threshold > 16) { return 1; }

  int64_t sum = 0;
  for(size_t i=0; i < array_length; i++)
  {
    if (popcount(array[i]) > threshold) 
    {
      sum += array[i];
    }
  }
  return sum;
}