Search code examples
vectorizationsparse-matrixmatrix-multiplicationriscv

Masking individual rows for CSR SpMV in RISC-V RVV 0.7.1


EDIT: I have reformulated my question into something more productive, and will provide an answer below. The old version of this question is below still.

I am implementing an optimized SpMV kernel for the CSR format in RVV 0.7.1. In C, SpMV can be implemented as such:

for (int i = 0; i < numRows; i++)
    for (int j = rowPtr[i]; j < rowPtr[i+1]; j++)
       y[i] += val[j] * x[colIdx[j]];

Vectorizing this is simple initially, by loading coldIdx, using it to perform an indexed load of x, loading val and multiplying the two. After this, a reduction sum (v{f}redsum.vs) must be performed in each row, adding all the products within them and storing them in the respective index of y. The challenge is creating the mask from rowPtr, so that only the elements belonging to a given row are active for each reduction.

My initial thought (which should provide context regarding the original question) was using the indexes in rowPtr to place 1's in the respective elements of a vector register, then using viota and an element shift to create a register containing the row each element belongs to. However, that is apparently not viable.

How do I build the required masks?


Old question

I am trying to use the RISC-V vector extension (0.7.1) to create a mask register based off an array of indices. For example, take the following array of indices, idx array. The goal is to use the indices present in it to put 1's in the respective mask elements, as you see with the mask below:

idx array | 0 2 3 4 4 6 8
elem idx  | 0 1 2 3 4 5 6 7 8
mask      | 1 0 1 1 1 0 1 0 1

At first I was mistakenly looking at the vrgather.vx instruction, but it does not do quite what I want, as it populates all elements. I would like the code to be quite performant as well, so if it's possible for it consist of mostly vector instructions that would be ideal. I would greatly appreciate it if someone could point me in the right direction.


Solution

  • Credit for the solution goes to camel-cdr!

    Take the following rowPtr array:

    0 2 3 5 7

    Lets consider the third iteration ([3,5[), which should be more illustrative. The goal is to arrive at the following mask, enabling elements 3 and 4:

    00011000

    The first step is using vid to write each element's index to a vector. With that, we can use vmseq (set mask if equal) twice, with elements i and i+1 of rowPtr, which will result in two registers like so, with a one in element rowPtr[i] or rowPtr[i+1]:

    idx | 01234567
    i   | 00010000
    i+1 | 00000100
    

    Then we can use vmsbf.m (set-before-first) on both, which sets all bits before the first active mask bit. By XOR'ing the two, we arrive at the final desired mask:

    sbf i   | 11100000
    sbf i+1 | 11111000
    xor     | 00011000
    

    By iterating through rowPtr and repeating this, it is possible to generate a mask for the elements of each individual row. In short, we can write this operation as:

    vmxor(vmsbf(vmseq(vid,rowPtr[i])), vmsbf(vmseq(vid,rowPtr[i+1])))