image-processing arm histogram simd neon

Methods to vectorise histogram in SIMD?


I am trying to implement a histogram in NEON. Is it possible to vectorise it?


Solution

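  • Histogramming is almost impossible to vectorise, unfortunately: each update is a read-modify-write at a data-dependent address, and NEON has no gather/scatter instructions.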

    You can probably optimise the scalar code somewhat, however. A common trick is to use two histograms and combine them at the end. Consecutive updates then go to different arrays, so their loads, increments, and stores can overlap, which hides some of the serial dependencies and associated latencies (for example, when two adjacent input values fall in the same bin). Pseudo code:

    init histogram 1 to all 0s
    init histogram 2 to all 0s
    loop
      get input value 1
      get input value 2
      load count for value 1 from histogram 1
      load count for value 2 from histogram 2
      increment count for histogram 1
      increment count for histogram 2
      store count for value 1 to histogram 1
      store count for value 2 to histogram 2
    until done
    combine histogram 1 and histogram 2
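
The pseudo code above could be sketched in plain C roughly as follows. This is a minimal illustration, not a tuned implementation: the function name `histogram_u8`, the `NUM_BINS` constant, and the assumption of 8-bit input samples with 32-bit counters are all hypothetical choices made for the example, and it also handles an odd number of inputs, which the pseudo code leaves out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NUM_BINS 256  /* assumption: 8-bit input samples, one bin per value */

/* Hypothetical helper: fills hist[] with the counts of each byte value in
   data[0..n-1], using two partial histograms so that consecutive updates
   go to different arrays and do not depend on each other. */
void histogram_u8(const uint8_t *data, size_t n, uint32_t hist[NUM_BINS])
{
    assert(data != NULL || n == 0);
    assert(hist != NULL);

    uint32_t h1[NUM_BINS] = {0};  /* init histogram 1 to all 0s */
    uint32_t h2[NUM_BINS] = {0};  /* init histogram 2 to all 0s */

    size_t i = 0;
    for (; i + 1 < n; i += 2) {
        uint8_t v1 = data[i];      /* get input value 1 */
        uint8_t v2 = data[i + 1];  /* get input value 2 */
        h1[v1]++;                  /* load/increment/store in histogram 1 */
        h2[v2]++;                  /* load/increment/store in histogram 2 */
    }

    /* Odd-length input: one value is left over. */
    if (i < n)
        h1[data[i]]++;

    /* Combine histogram 1 and histogram 2. */
    for (size_t b = 0; b < NUM_BINS; b++)
        hist[b] = h1[b] + h2[b];
}
```

Calling `histogram_u8(pixels, count, hist)` with a caller-provided `uint32_t hist[256]` gives the same counts as a single-histogram loop. Whether it is actually faster depends on the core and the input data, so it is worth measuring; the same idea can be extended to more than two partial histograms. The final combining loop is an ordinary element-wise add, which is the one part of this routine that maps naturally onto SIMD.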