When using pretrained word vectors as the embedding layer of a neural network (e.g. a CNN), how do I deal with index 0?
Detail:
We usually start by creating a 2D NumPy array of zeros to hold the padded sentences, and then fill in the vocabulary index of each word. The problem is that 0 is already the index of a word in our vocabulary (say, 'i' has index 0). Hence every position we leave unfilled reads as 'i' rather than as an empty slot. So how do we pad all the sentences to equal length?
One idea that comes to mind is to pad with a different index, such as numberOfWordsInVocab+1. But wouldn't that take more memory?
One idea that comes to mind is to pad with a different index, such as numberOfWordsInVocab+1. But wouldn't that take more memory?
Nope! That's the same size. An array's memory footprint depends on its shape and dtype, not on the values stored in it.
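import numpy as np

# Same shape, 8 bytes per element in both arrays; only the fill value differs.
a = np.full((5000, 5000), 7, dtype=np.int64)
print(a.nbytes)  # 200000000
b = np.zeros((5000, 5000))  # float64 by default
print(b.nbytes)  # 200000000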
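As for the padding itself, a common convention is to reserve index 0 for padding and start the real words at 1, so the only extra cost is one row in the embedding matrix. The following is a minimal sketch of that idea, not a definitive implementation: the toy vocabulary, the random "pretrained" vectors, and the names PAD_IDX, word_to_idx and encode are all made up for illustration.

```python
import numpy as np

# Hypothetical toy vocabulary; random vectors stand in for real pretrained ones.
vocab = ["i", "like", "cats"]
dim = 4
rng = np.random.default_rng(0)
pretrained = {w: rng.normal(size=dim) for w in vocab}

PAD_IDX = 0  # index 0 is reserved for padding
word_to_idx = {w: i + 1 for i, w in enumerate(vocab)}  # real words start at 1

# One extra row for the padding index; row 0 stays all zeros.
embedding_matrix = np.zeros((len(vocab) + 1, dim))
for word, idx in word_to_idx.items():
    embedding_matrix[idx] = pretrained[word]

def encode(sentence, max_len):
    """Map words to indices, truncate to max_len, then right-pad with PAD_IDX."""
    ids = [word_to_idx[w] for w in sentence.split()][:max_len]
    return ids + [PAD_IDX] * (max_len - len(ids))

batch = np.array([encode("i like cats", 5), encode("i like", 5)])
embedded = embedding_matrix[batch]  # shape: (batch, max_len, dim)
print(batch)
```

Padding with the last index instead (the idea from the question) works the same way; either choice adds exactly one row. If you use a framework, it may already support this convention: for example, the Keras Embedding layer has a mask_zero option and PyTorch's nn.Embedding has a padding_idx argument.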