I am trying to implement histogram in Neon. Is it possible to vectorise ?
Histogramming is almost impossible to vectorize, unfortunately.
You can probably optimise the scalar code somewhat however - a common trick is to use two histograms and then combine them at the end. This allows you to overlap loads/increments/stores and thereby bury some of the serial dependencies and associated latencies. Pseudo code:
init histogram 1 to all 0s
init histogram 2 to all 0s
loop
get input value 1
get input value 2
load count for value 1 from histogram 1
load count for value 2 from histogram 2
increment count for histogram 1
increment count for histogram 2
store count for value 1 to histogram 1
store count for value 2 to histogram 2
until done
combine histogram 1 and histogram 2