Search code examples
pythonpython-3.xdeep-learningkeras

Vectorize Sequences explanation


Studying Deep Learning with Python, I can't comprehend the following simple batch of code which encodes the integer sequences into a binary matrix.

def vectorize_sequences(sequences, dimension=10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
       results[i, sequence] = 1.  # set specific indices of results[i] to 1s
    return results

(train_data, train_labels), (test_data, test_labels) = imdb.load_data(num_words=10000)

x_train = vectorize_sequences(train_data)

And the output of x_train is something like

x_train[0] array([ 0., 1.,1., ...,0.,0.,0.])

Can someone put some light of the 0.'s existance in x_train array while only 1.'s are appending in each next i iteration? I mean shouldn't be all 1's?


Solution

  • The for loop here is not processing all the matrix. As you can see, it enumerates elements of the sequence, so it's looping only on one dimension. Let's take a simple example :

    t = np.array([1,2,3,4,5,6,7,8,9]) r = np.zeros((len(t), 10))

    Output

    array([[0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]])
    

    then we modify elements with the same way you have :

    for i, s in enumerate(t): r[i,s] = 1.

    array([[0., 1., 0., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 1., 0., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 1., 0., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 1., 0., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 1., 0., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 1., 0., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 1., 0., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 1., 0.],
       [0., 0., 0., 0., 0., 0., 0., 0., 0., 1.]])
    

    you can see that the for loop modified only a set of elements (len(t)) which has index [i,s] (in this case ; (0, 1), (1, 2), (2, 3), an so on))