Tags: python, slice, sparse-matrix, pytorch

Column/row slicing a torch sparse tensor


I have a PyTorch sparse tensor that I need to slice row- and column-wise with the expression [idx][:,idx], where idx is a list of indices. This slice yields my desired result on an ordinary float tensor. Is it possible to apply the same slicing to a sparse tensor? Example here:

#constructing sparse matrix
import numpy as np
import torch

i = np.array([[0,1,2,2],[0,1,2,1]])
v = np.ones(4)
i = torch.from_numpy(i.astype("int64"))
v = torch.from_numpy(v.astype("float32"))
test1 = torch.sparse.FloatTensor(i, v)

#constructing float tensor
from torch import autograd

test2 = np.array([[1,0,0],[0,1,0],[0,1,1]])
test2 = autograd.Variable(torch.cuda.FloatTensor(test2), requires_grad=False)

#slicing
idx = [1,2]
print(test2[idx][:,idx])

output:

Variable containing:
 1  0
 1  1
[torch.cuda.FloatTensor of size 2x2 (GPU 0)]

I am holding a 250,000 x 250,000 adjacency matrix, from which I need to slice n rows and n columns by sampling n random indices. Since the dataset is so large, it is not realistic to convert it to a more convenient datatype.

Can I achieve the same slicing result on test1? Is it even possible? If not, are there any workarounds?

Right now I am running my model with the following "hack" of a solution:

import random

idx = sorted(random.sample(range(0, np.shape(test1)[0]), 9000))
test1 = test1AsCsr[idx][:,idx].todense().astype("int32")
test1 = autograd.Variable(torch.cuda.FloatTensor(test1), requires_grad=False)

Where test1AsCsr is my test1 converted to a numpy CSR matrix. This solution works, but it is very slow and keeps my GPU utilization very low, since it constantly needs to read/write from CPU memory.
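For reference, the conversion to test1AsCsr goes roughly along these lines (a sketch using scipy.sparse):

#rough sketch of the test1 -> test1AsCsr conversion via scipy
from scipy.sparse import coo_matrix

i_np = test1._indices().numpy()
v_np = test1._values().numpy()
test1AsCsr = coo_matrix((v_np, (i_np[0], i_np[1])), shape=tuple(test1.shape)).tocsr()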

Edit: It's fine if the result is a non-sparse tensor.


Solution

  • Well it's been a couple of years since there was activity on this question, but better late than never.

    This is the function I use for slicing sparse tensors. (Helper functions are below)

    def slice_torch_sparse_coo_tensor(t, slices):
        """
        params:
        -------
        t: tensor to slice
        slices: slice for each dimension

        returns:
        --------
        t[slices[0], slices[1], ..., slices[n]]
        """

        t = t.coalesce()
        assert len(slices) == len(t.size())
        for i in range(len(slices)):
            if type(slices[i]) is not torch.Tensor:
                slices[i] = torch.tensor(slices[i], dtype=torch.long, device=t.device)

        indices = t.indices()
        values = t.values()
        for dim, sl in enumerate(slices):
            invert = False
            # if the slice keeps more than ~60% of this dimension, matching
            # against the complement and inverting the mask is cheaper
            if t.size(dim) * 0.6 < len(sl):
                invert = True
                all_nodes = torch.arange(t.size(dim), device=t.device)
                unique, counts = torch.cat([all_nodes, sl]).unique(return_counts=True)
                sl = unique[counts == 1]
            # for large slices, the chunked wrapper avoids the quadratic
            # memory blow-up of a single ainb call
            if sl.size(0) > 400:
                mask = ainb_wrapper(indices[dim], sl)
            else:
                mask = ainb(indices[dim], sl)
            if invert:
                mask = ~mask
            indices = indices[:, mask]
            values = values[mask]

        return torch.sparse_coo_tensor(indices, values, t.size()).coalesce()
    

    Usage (took 2.4s on my machine):

    indices = torch.randint(low= 0, high= 200000, size= (2, 1000000))
    values = torch.rand(size=(1000000,))
    t = torch.sparse_coo_tensor(indices, values, size=(200000, 200000))
    idx = torch.arange(1000)
    slice_torch_sparse_coo_tensor(t, [idx, idx])
    

    out:

    tensor(indices=tensor([[ 13,  62,  66,  78, 134, 226, 233, 266, 299, 344, 349,
                            349, 369, 396, 421, 531, 614, 619, 658, 687, 769, 792,
                            810, 840, 926, 979],
                           [255, 479, 305, 687, 672, 867, 444, 559, 772,  96, 788,
                            980, 423, 699, 911, 156, 267, 721, 381, 781,  97, 271,
                            840, 292, 487, 185]]),
           values=tensor([0.4260, 0.4816, 0.8001, 0.8815, 0.3971, 0.4914, 0.7068,
                          0.2329, 0.4038, 0.1757, 0.7758, 0.3210, 0.2593, 0.8290,
                          0.1320, 0.4322, 0.7529, 0.8341, 0.8128, 0.4457, 0.4100,
                          0.1618, 0.4097, 0.3088, 0.6942, 0.5620]),
           size=(200000, 200000), nnz=26, layout=torch.sparse_coo)
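    Note that the result keeps the full 200000 x 200000 shape and only drops the non-selected entries. Since a dense result is fine for the question, a rough, untested sketch of compacting the sliced tensor to a dense len(idx) x len(idx) block could look like this (it assumes both dimensions were sliced with the same idx, as above):

    def compact_to_dense(t_sliced, idx):
        """Sketch: remap the kept indices to 0..len(idx)-1, then densify"""
        t_sliced = t_sliced.coalesce()
        # lookup[old_id] = position of old_id within idx
        lookup = torch.full((t_sliced.size(0),), -1, dtype=torch.long)
        lookup[idx] = torch.arange(len(idx))
        new_indices = lookup[t_sliced.indices()]
        out = torch.zeros(len(idx), len(idx))
        out[new_indices[0], new_indices[1]] = t_sliced.values()
        return out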
    

    Timings for slice_torch_sparse_coo_tensor:

    %timeit slice_torch_sparse_coo_tensor(t, [torch.randperm(200000)[:500], torch.arange(200000)])
    
    output:
        1.08 s ± 447 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    for the built-in torch.index_select:

    %timeit t.index_select(0, torch.arange(100))
    
    output:
        56.7 s ± 4.87 s per loop (mean ± std. dev. of 7 runs, 1 loop each)
    

    These are the helper functions I use for this purpose. The function ainb finds the elements of a that are in b; I found it on the internet a while ago, but I can't find the original post to link to.

    import math
    import torch

    def ainb(a, b):
        """gets mask for elements of a in b"""

        size = (b.size(0), a.size(0))

        if size[0] == 0: # prevents error in torch.Tensor.max(dim=0) on an empty b
            return torch.tensor([False] * a.size(0), dtype=torch.bool)

        # broadcast a against b so every pair (b_j, a_i) is compared at once
        a = a.expand((size[0], size[1]))
        b = b.expand((size[1], size[0])).T

        # a_i is in b iff it equals at least one element in its column of b
        mask = a.eq(b).max(dim=0).values

        return mask
    
    def ainb_wrapper(a, b, splits = .72):
        # chunk size grows sublinearly with len(a) to cap the memory of each ainb call
        chunk = int(len(a)**splits)

        tmp = [ainb(a[i*chunk:(i+1)*chunk], b) for i in range(math.ceil(len(a) / chunk))]

        return torch.cat(tmp)
    

    Since ainb scales quadratically with the number of elements, I added a wrapper that splits the input into chunks and then concatenates the output. It's more efficient using only the CPU, but I am not sure whether this holds when using a GPU, and would appreciate it if someone could test it :)
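    If anyone wants to try, a rough, untested sketch of such a GPU test could look like this (it assumes a CUDA device is available and reuses the example tensor from the usage section):

    if torch.cuda.is_available():
        # move the example tensor and index sets onto the GPU
        t_gpu = t.cuda().coalesce()
        idx_gpu = torch.randperm(200000, device="cuda")[:500]
        full_gpu = torch.arange(200000, device="cuda")
        # in IPython: %timeit slice_torch_sparse_coo_tensor(t_gpu, [idx_gpu, full_gpu])
        out = slice_torch_sparse_coo_tensor(t_gpu, [idx_gpu, full_gpu])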

    It's my first time posting, so feedback on the quality of the post is also appreciated.