Tags: numpy, machine-learning, train-test-split

Facing an IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices


I have been working on a link prediction problem in which the data set, a NumPy array, has to be parsed and stored into another NumPy array. When I try to do this, the 9th line (`train_x[int(i), : int(out_dim)] = W_out[u]`) throws `IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices`. I even tried casting the indices with `int()`, but it doesn't help. What am I missing here?



    train_edges, test_edges = train_test_split(edgeL, test_size=0.3, random_state=16)
    out_dim = int(W_out.shape[1])
    in_dim = int(W_in.shape[1])
    train_x = np.zeros((len(train_edges), (out_dim + in_dim) * 2))
    train_y = np.zeros((len(train_edges), 1))
    for i, edge in enumerate(train_edges):
        u = edge[0]
        v = edge[1]
        train_x[int(i), : int(out_dim)] = W_out[u]
        train_x[int(i), int(out_dim): int(out_dim + in_dim)] = W_in[u]
        train_x[i, out_dim + in_dim: out_dim * 2 + in_dim] = W_out[v]
        train_x[i, out_dim * 2 + in_dim:] = W_in[v]
        if edge[2] > 0:
            train_y[i] = 1
        else:
            train_y[i] = -1

EDIT:

For reference, `W_out` holds one 64-dimensional vector per node, and `W_out.shape[1]` is an `int`:

    print(W_out[0])
    type(W_out.shape[1])

Output:

    [[0.10160154 0.         0.70414263 0.6772633  0.07685234 0.75205046
      0.421092   0.1776721  0.8622188  0.15669271 0.         0.40653425
      0.5768579  0.75861764 0.6745151  0.37883565 0.18074909 0.73928916
      0.6289512  0.         0.33160248 0.7441727  0.         0.8810399
      0.1110919  0.53732747 0.         0.33330196 0.36220717 0.298112
      0.10643011 0.8997948  0.53510064 0.6845873  0.03440218 0.23005858
      0.8097505  0.7108275  0.38826624 0.28532124 0.37821335 0.3566149
      0.42527163 0.71940386 0.8075657  0.5775364  0.01444144 0.21734199
      0.47439903 0.21176265 0.32279345 0.00187511 0.43511534 0.4302601
      0.39407462 0.20941389 0.199842   0.8710182  0.2160332  0.30246672
      0.27159846 0.19009161 0.32349357 0.08938174]]
    int

And each `edge` from the training set is a list holding source, destination, and sign. It looks like this:

    train_edges, test_edges = train_test_split(edgeL, test_size=0.3, random_state=16)

    for i, edge in enumerate(train_edges):
        print(edge)
        print(i)
        type(i)
        type(edge)

Output (truncated to the last lines):

    2936
    ['16936', '17031', '1']
    2937
    ['15307', '14904', '1']
    2938
    ['22852', '13045', '1']
    2939
    ['14291', '96703', '1']
    2940
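A minimal reproduction, using a small hypothetical stand-in for `W_out` (5 nodes, 4 dimensions instead of 64), suggests why edges like these trigger the error: the node ID comes out of the edge as a string, and NumPy refuses string indices on an ordinary array.

```python
import numpy as np

# Hypothetical stand-in for W_out: 5 nodes with 4-dimensional vectors.
W_out = np.arange(20, dtype=float).reshape(5, 4)

# An edge shaped like the printed output: a list of strings.
edge = ['3', '1', '1']

try:
    W_out[edge[0]]  # the string '3' is not a valid NumPy index
    error_message = None
except IndexError as err:
    error_message = str(err)
print("IndexError:", error_message)

# Converting the node ID to int first selects row 3 as intended.
row = W_out[int(edge[0])]
print(row)  # [12. 13. 14. 15.]
```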

Any help/suggestion is highly appreciated.


Solution

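  • The error is not a syntax problem: it is raised at runtime by the value used to index `W_out`.

    Accessing the edge object is the likely issue. Debug using `type()` and `len()` of `edge` (and of `edge[0]`) to see which index NumPy rejects. Your printed output already points to it: each edge is a list of strings such as `['16936', '17031', '1']`, so `u = edge[0]` is the string `'16936'`, and `W_out[u]` tries to index a NumPy array with a string, which raises exactly this IndexError.

    Explicitly wrapping `i` in `int()` is not needed, because `enumerate` already yields integers (and `out_dim`/`in_dim` are already ints). The values that need converting are the node IDs, e.g. `u = int(edge[0])` and `v = int(edge[1])`. The sign needs the same treatment: `edge[2] > 0` compares a string with an int, which raises a TypeError in Python 3, so use `int(edge[2]) > 0`.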
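A minimal sketch of the loop with these conversions applied might look like the following. The `build_features` helper name and the tiny embeddings are hypothetical; it assumes node IDs are row indices into `W_out` and `W_in` and that every field of an edge is a string, as in the printed training edges.

```python
import numpy as np


def build_features(train_edges, W_out, W_in):
    """Return (train_x, train_y) for a list of [source, destination, sign] edges.

    Hypothetical helper. Assumes node IDs are row indices into W_out and W_in,
    and that every field of an edge is a string.
    """
    out_dim = W_out.shape[1]
    in_dim = W_in.shape[1]
    train_x = np.zeros((len(train_edges), (out_dim + in_dim) * 2))
    train_y = np.zeros((len(train_edges), 1))

    for i, edge in enumerate(train_edges):
        # Convert the string fields to int before using them as indices.
        u, v, sign = int(edge[0]), int(edge[1]), int(edge[2])
        # Same layout as the question: [W_out[u], W_in[u], W_out[v], W_in[v]].
        train_x[i, :out_dim] = W_out[u]
        train_x[i, out_dim:out_dim + in_dim] = W_in[u]
        train_x[i, out_dim + in_dim:out_dim * 2 + in_dim] = W_out[v]
        train_x[i, out_dim * 2 + in_dim:] = W_in[v]
        # Positive sign -> +1, anything else -> -1, as in the original code.
        train_y[i] = 1 if sign > 0 else -1

    return train_x, train_y


# Tiny hypothetical embeddings: 3 nodes, 2 dimensions each.
W_out = np.arange(6, dtype=float).reshape(3, 2)
W_in = np.arange(6, 12, dtype=float).reshape(3, 2)
edges = [['0', '2', '1'], ['1', '0', '-1']]

train_x, train_y = build_features(edges, W_out, W_in)
print(train_x)
print(train_y.ravel())
```

Converting inside the loop keeps the structure of the original code; converting the whole edge list to integers once, before the loop, would work equally well.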