How does this one-hot encoding example index a NumPy array with [i, j] when j is a tuple?


I don't get how the line: results[i, sequence] = 1 works in the following.

I am stepping through some sample code in the debugger from a Manning book, "Deep Learning with Python" (Example 3.5-classifying-movie-reviews.ipynb from the book). While I understand what the code does, I don't get the syntax or how it works. I'm trying to learn Python and deep learning together and want to understand what the code is doing.

import numpy as np

def vectorize_sequences(sequences, dimension=10000):
    # Create an all-zero matrix of shape (len(sequences), dimension)
    results = np.zeros((len(sequences), dimension))
    for i, sequence in enumerate(sequences):
        results[i, sequence] = 1.  # <--- How does this work?
    return results
  • This creates results, a 25,000 x 10,000 array of zeros.
  • sequences is a list of tuples, for example (3, 0, 5). The loop then walks sequences and, for each non-zero value of each sequence, sets the corresponding index in results[i] to 1.0. The book calls this one-hot encoding.
  • I don't understand how the line results[i, sequence] = 1 accomplishes this in NumPy.
  • I get the for i, sequence in enumerate(sequences) part: it just enumerates the sequences list and keeps track of i.
  • I'm guessing there is some NumPy magic that sets values in results[i] by examining sequence[n] element by element and inserting a 1.0 whenever sequence[n] is non-zero(?). I just want to understand the syntax.

Solution

  • Assuming sequence is a list or tuple of integers, NumPy treats it as an array of column indices (integer array indexing), so

    results[i,sequence] = 1
    

    is equivalent to

    for j in sequence:
        results[i][j] = 1
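
To check this equivalence yourself, here is a minimal sketch. The toy `sequences` and the small `dimension` are made up for illustration and are not the book's data. It fills one array with the indexed assignment from the question and another with the explicit loop, then compares them:

```python
import numpy as np

# Hypothetical toy data: each sequence holds column indices (word indices).
# One is a tuple and one is a list, to show that both are accepted.
sequences = [(3, 0, 5), [1, 1, 4]]
dimension = 6

# Version from the question: index row i with a whole sequence at once.
fancy = np.zeros((len(sequences), dimension))
for i, sequence in enumerate(sequences):
    fancy[i, sequence] = 1.0

# Explicit version: visit each index j in the sequence one at a time.
looped = np.zeros((len(sequences), dimension))
for i, sequence in enumerate(sequences):
    for j in sequence:
        looped[i, j] = 1.0

print(fancy)
print(np.array_equal(fancy, looped))  # True
```

Note that the values in each sequence are used as positions, not tested for being non-zero: the 0 in (3, 0, 5) sets column 0 of that row to 1.0, and a repeated index such as the two 1s in the second sequence simply sets the same cell twice.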