I have a question about retrieving values from a list of tensors using multiple indices.
Although there are similar questions (such as here), I couldn't fully apply their answers to my case.
I have a dataset consisting of 4-dimensional features for about 108,000 nodes and the links between them.
import torch

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
tmp = []
for _ in range(4):
    tmp.append(torch.rand((107940, 4), dtype=torch.float).to(device))
tmp
# [tensor([[0.9249, 0.5367, 0.5161, 0.6898],
# [0.2189, 0.5593, 0.8087, 0.9893],
# [0.4344, 0.1507, 0.4631, 0.7680],
# ...,
# [0.7262, 0.0339, 0.9483, 0.2802],
# [0.8652, 0.3117, 0.8613, 0.6062],
# [0.5434, 0.9583, 0.3032, 0.3919]], device='cuda:0'),
# tensor([...], device='cuda:0'),
# tensor([...], device='cuda:0'),
# tensor([...], device='cuda:0')]
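Because the four tensors share the same shape, one option is to stack them into a single tensor so that one advanced-indexing call can read row `node` of `tmp[cls]` for many (cls, node) pairs at once. A minimal sketch, assuming toy sizes on CPU (the real tensors are (107940, 4), and `cls`/`nodes` here are made-up values):

```python
import torch

# Toy sizes standing in for the real (107940, 4) tensors (assumption for illustration).
num_nodes, num_feats, num_classes = 10, 4, 4
torch.manual_seed(0)
tmp = [torch.rand((num_nodes, num_feats)) for _ in range(num_classes)]

# Stack the per-class tensors into one tensor of shape (num_classes, num_nodes, num_feats).
stacked = torch.stack(tmp)

# Advanced indexing with two index tensors picks stacked[cls[i], nodes[i]] for each i.
cls = torch.tensor([3, 3, 2])
nodes = torch.tensor([1, 5, 7])
picked = stacked[cls, nodes]  # shape (3, num_feats)

print(stacked.shape)                      # torch.Size([4, 10, 4])
print(torch.equal(picked[0], tmp[3][1]))  # True
```

`torch.stack` copies the data, but for four (107940, 4) float tensors that is only a few megabytes.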
# batch.xxx: attributes of the mini-batch sampled from the graph
# Note that batch.edge_index[0] holds the target nodes and batch.edge_index[1] the source nodes.
# For more information, see the PyTorch Geometric data format.
print(batch.n_id[batch.edge_index])
print(batch.edge_index_class)
#tensor([[10231, 3059, 32075, 10184, 1187, 6029, 10134, 10173, 6521, 9400,
# 14942, 31065, 10087, 10156, 10158, 26377, 85009, 918, 4542, 10176,
# 10180, 6334, 10245, 10228, 2339, 7891, 10214, 10240, 10041, 10020,
# 7610, 10324, 4320, 5951, 9078, 9709],
# [ 1624, 1624, 6466, 6466, 6779, 6779, 7691, 7691, 8655, 8655,
# 30347, 30347, 32962, 32962, 34435, 34435, 3059, 3059, 32075, 32075,
# 1187, 1187, 6029, 6029, 10173, 10173, 6521, 6521, 9400, 9400,
# 31065, 31065, 10087, 10087, 10158, 10158]], device='cuda:0')
#tensor([3., 3., 2., 2., 0., 0., 3., 3., 2., 2., 0., 0., 2., 2., 2., 2., 3., 3.,
# 2., 2., 0., 0., 0., 0., 3., 3., 2., 2., 2., 2., 0., 0., 2., 2., 2., 2.],
# device='cuda:0')
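Since both endpoints of an edge should take that edge's class, the edge labels can be turned into a per-node class lookup indexed by global node ID. A minimal sketch on hypothetical toy values (`n_id`, `edge_index`, `edge_class` mirror the layout above, but the numbers are made up). It assumes every edge touching a node has the same class, which holds in the printed batch; if classes conflicted, which value wins under duplicate indices is not defined by PyTorch.

```python
import torch

# Hypothetical toy batch: n_id maps local positions to global node IDs,
# edge_index stores local positions, edge_class stores one (float) label per edge.
n_id = torch.tensor([10, 20, 30, 40, 50])
edge_index = torch.tensor([[0, 1, 2],
                           [4, 4, 3]])
edge_class = torch.tensor([3., 3., 2.])
num_global_nodes = 60

global_ids = n_id[edge_index]   # (2, E): [[10, 20, 30], [50, 50, 40]]
cls_long = edge_class.long()    # class labels as integer indices

# -1 marks nodes not touched by the batch.
node_class = torch.full((num_global_nodes,), -1, dtype=torch.long)
node_class[global_ids[0]] = cls_long  # endpoints in row 0 get their edge's class
node_class[global_ids[1]] = cls_long  # endpoints in row 1 get their edge's class

print(node_class[[10, 20, 30, 40, 50]])  # tensor([3, 3, 2, 2, 3])
```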
I want a new tensor, tmp_filled, in which each node's row is copied from the tensor in tmp selected by the edge_index_class of that node's edges.
For example, rows 1624, 10231, and 3059 of tmp_filled should come from the fourth tensor in tmp (tmp[3]), because the edges between these nodes are labeled 3 in edge_index_class.
Similarly, rows 6466, 32075, and 10184 of the third tensor (tmp[2]) should go into the same rows of tmp_filled.
To do this, I tried the following code:
for k in range(len(batch.edge_index_class)):
    tmp_filled[batch.n_id[torch.unique(batch.edge_index)]] = tmp[int(batch.edge_index_class[k].item())][batch.n_id[torch.unique(batch.edge_index)]]
tmp_filled
# tensor([[0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# ...,
# [0., 0., 0., 0.],
# [0., 0., 0., 0.],
# [0., 0., 0., 0.]], device='cuda:0')
But the values in tmp_filled were wrong:
tmp_filled[1624]
# tensor([0.3438, 0.5555, 0.6229, 0.7983], device='cuda:0')
tmp[3][1624]
# tensor([0.6895, 0.3241, 0.1909, 0.1635], device='cuda:0')
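The mismatch appears to come from the index in the loop: `batch.n_id[torch.unique(batch.edge_index)]` does not depend on `k`, so every iteration overwrites all batch nodes with rows from `tmp[class of edge k]`, and after the loop every node holds rows from the last edge's class (2 in the printed batch). A minimal reproduction with made-up sizes and values on CPU:

```python
import torch

torch.manual_seed(0)
num_nodes = 6
tmp = [torch.rand((num_nodes, 4)) for _ in range(4)]
edge_class = torch.tensor([3., 3., 2., 2.])  # one class per edge (toy values)
nodes = torch.tensor([1, 2, 4])              # all nodes touched by the batch (toy values)

tmp_filled = torch.zeros((num_nodes, 4))
for k in range(len(edge_class)):
    # The index `nodes` is the same on every iteration, so each pass overwrites
    # all of them; only the class of the last edge survives.
    tmp_filled[nodes] = tmp[int(edge_class[k].item())][nodes]

print(torch.equal(tmp_filled[nodes], tmp[2][nodes]))  # True: last edge has class 2
```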
Given that tmp_filled must have shape (107940, 4), how should I correct my code?
Thank you for reading my question!
The code below gives the result I want. If anyone has a more efficient solution, please feel free to answer.
for edge_index_class in torch.unique(batch.edge_index_class):
    # Find indices where edge_index_class matches
    indices = (batch.edge_index_class == edge_index_class).nonzero(as_tuple=True)[0]
    # Global IDs of the nodes on those edges
    n_id = torch.unique(batch.n_id[batch.edge_index[:, indices]])
    tmp_filled[n_id] = tmp[int(edge_index_class.item())][n_id]
tmp_filled[1624]
# tensor([0.6071, 0.9668, 0.9829, 0.1886], device='cuda:0')
tmp[3][1624]
# tensor([0.6071, 0.9668, 0.9829, 0.1886], device='cuda:0')
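A possible loop-free alternative combines stacking with a per-node class lookup: scatter each edge's class onto its endpoints, then gather every touched node's row in one indexing call. This is a sketch, not a benchmarked solution; `fill_by_class` is a hypothetical helper name, the toy data is made up, and it assumes all edges touching a node share one class (as in the printed batch).

```python
import torch


def fill_by_class(tmp, n_id, edge_index, edge_class, num_nodes):
    """Hypothetical helper (not from any library): for every node touched by the
    batch, copy its row from tmp[c], where c is the class of the edges it is on."""
    stacked = torch.stack(tmp)                  # (num_classes, num_nodes, F)
    global_ids = n_id[edge_index]               # (2, E) global node IDs
    cls = edge_class.long()                     # edge classes as integer indices
    # Per-node class lookup; assumes all edges touching a node share one class.
    node_class = torch.full((num_nodes,), -1, dtype=torch.long, device=stacked.device)
    node_class[global_ids[0]] = cls
    node_class[global_ids[1]] = cls
    nodes = torch.unique(global_ids)            # every node touched by the batch
    out = torch.zeros_like(tmp[0])
    # One gather: row i comes from stacked[node_class[nodes[i]], nodes[i]].
    out[nodes] = stacked[node_class[nodes], nodes]
    return out


# Toy data standing in for the real batch (values made up for illustration).
torch.manual_seed(0)
num_nodes = 60
tmp = [torch.rand((num_nodes, 4)) for _ in range(4)]
n_id = torch.tensor([10, 20, 30, 40, 50])
edge_index = torch.tensor([[0, 1, 2],
                           [4, 4, 3]])
edge_class = torch.tensor([3., 3., 2.])

out = fill_by_class(tmp, n_id, edge_index, edge_class, num_nodes)

# Reference: the per-class loop from the question.
ref = torch.zeros_like(tmp[0])
for c in torch.unique(edge_class):
    idx = (edge_class == c).nonzero(as_tuple=True)[0]
    ids = torch.unique(n_id[edge_index[:, idx]])
    ref[ids] = tmp[int(c.item())][ids]

print(torch.equal(out, ref))  # True
```

On the real data the call would presumably be `fill_by_class(tmp, batch.n_id, batch.edge_index, batch.edge_index_class, 107940)`. It replaces the Python loop over classes with two scatters and one gather; whether that is measurably faster depends on the number of classes and batch size.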