Tags: machine-learning, pytorch, tensor, torch

Generate a unique row index for a 2D tensor as a 1D output tensor with PyTorch


While implementing the target for in-batch multi-class classification in PyTorch (version 1.6), I ran into the following problem.

I have a variable D of type <class 'torch.Tensor'> (related to the label descriptions) with size torch.Size([16, 128]), i.e. [data_size, token_id_size].

The original idea was to generate a target tensor of size torch.Size([16]) in which each value is unique and corresponds to one row of D, i.e. [0,1,2,...,15], for in-batch multi-class classification.

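This can be done using target = torch.LongTensor(torch.arange(16)). Note that torch.arange(16) already returns an int64 tensor, so the torch.LongTensor wrapper is redundant.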

However, D may contain repeated (non-unique) rows, so I would like identical rows in D to share a single index in target. For example, if row 0, row 1, and row 7 of D hold the same token ids (the same vector) and all other rows differ from each other, then target should be either [0,0,2,3,4,5,6,0,8,9,10,11,12,13,14,15] or [0,0,1,2,3,4,5,0,6,7,8,9,10,11,12,13]. The former keeps the original indexes in the range 0-15 (without 1 and 7), while the latter uses every index in 0-13.

How can I implement this?


Solution

  • See the answers to the simplified questions (i) "generate 1D tensor as unique index of rows of an 2D tensor" and (ii) "generate 1D tensor as unique index of rows of an 2D tensor (keeping the order and the original index)", which address the problem in this question.

    However, in my case these targets did not seem to improve the contrastive multi-class classification.
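
For reference, here is a minimal sketch of one way to build both target variants from the question. It is not taken from the linked answers. The function name row_targets and the toy tensor D are made up for illustration, and the pairwise comparison needs O(n² · d) memory, which should be fine for a batch of 16 but may not scale to large batches.

```python
import torch


def row_targets(D):
    """Hypothetical helper: build two target variants for the rows of a 2D tensor D.

    Returns (first_index, compact):
      first_index[i] = smallest row index j such that D[j] equals D[i]
      compact[i]     = the same grouping, relabelled as 0..k-1 in order of first occurrence
    """
    n = D.size(0)
    # same[i, j] is True when row i and row j are element-wise identical.
    same = (D.unsqueeze(1) == D.unsqueeze(0)).all(dim=-1)
    # cand[i, j] = j where the rows match, otherwise the sentinel n (larger than any index).
    idx = torch.arange(n, device=D.device)
    cand = idx.repeat(n, 1).masked_fill(~same, n)
    # The smallest matching column is the first occurrence of each row.
    first_index = cand.min(dim=1).values
    # Ranking the sorted first-occurrence indexes gives gap-free labels.
    _, compact = torch.unique(first_index, return_inverse=True)
    return first_index, compact


# Toy data (assumption): 16 distinct rows, then rows 1 and 7 are made copies of row 0.
D = torch.arange(16 * 128).reshape(16, 128)
D[1] = D[0]
D[7] = D[0]

first_index, compact = row_targets(D)
print(first_index.tolist())  # [0, 0, 2, 3, 4, 5, 6, 0, 8, 9, 10, 11, 12, 13, 14, 15]
print(compact.tolist())      # [0, 0, 1, 2, 3, 4, 5, 0, 6, 7, 8, 9, 10, 11, 12, 13]
```

Calling torch.unique(D, dim=0, return_inverse=True) directly also yields one label per group of identical rows, but the labels follow the sorted order of the unique rows rather than the order of first occurrence, so they would not necessarily match the second target in the question.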