I want to one-hot encode a data set that looks like [[5,7,11,9,13,1,...],[3,7,5,9,16,....],..], where each sequence has length 24, the largest possible integer in a sequence is 33, and there are 200 sequences in total. Each sequence is an integer representation of a sentence. How can I one-hot encode this efficiently? I have tried:
for sentence in sentences:
    n = maxlen
    k = max_vocabullary
    m = np.zeros((n, k))
    m[np.arange(n), sentence] = 1
    print(m)
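Try scikit-learn's OneHotEncoder. It treats each of the 24 positions as a separate categorical feature and returns a sparse matrix by default.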
from sklearn.preprocessing import OneHotEncoder

enc = OneHotEncoder()
# sentences is the list of integer sequences from the question,
# i.e. [[5,7,11,9,13,1,...],[3,7,5,9,16,....],..]
encoded_seqs = enc.fit_transform(sentences)
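To make the shape of the result concrete, here is a minimal sketch on made-up toy data (three sequences of length 4 instead of 200 of length 24). The names `toy`, `seq_len`, `vocab_size`, `flat` and `encoded_sk` are illustrative, not from the question. It assumes tokens run from 0 to 33, so 34 columns per position, and passes `categories` explicitly, because otherwise the encoder only learns the values that actually occur in each column.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Toy stand-in for the real data (assumption): 3 sequences of length 4.
seq_len, vocab_size = 4, 34
toy = np.array([[5, 7, 11, 9], [3, 7, 5, 9], [33, 0, 1, 2]])

# Give every position the same fixed vocabulary 0..33 so that each
# position contributes exactly vocab_size columns.
enc = OneHotEncoder(categories=[np.arange(vocab_size)] * seq_len)

# fit_transform returns a sparse matrix; toarray() makes it dense.
flat = enc.fit_transform(toy).toarray()  # shape (3, seq_len * vocab_size)

# Columns are grouped by position, so a reshape recovers one
# one-hot vector per token.
encoded_sk = flat.reshape(len(toy), seq_len, vocab_size)
print(encoded_sk.shape)
```

For the real data the same calls would be used with `seq_len = 24` and the full list of 200 sequences.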
http://scikit-learn.org/stable/modules/generated/sklearn.preprocessing.OneHotEncoder.html
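If a dense 3-D array of shape (sequences, positions, vocabulary) is what you need, a NumPy-only alternative avoids the Python loop entirely. This is a sketch under assumptions: the helper `one_hot_sequences` and the toy data are hypothetical, all sequences have the same length, and tokens run from 0 to 33, which means 34 columns rather than 33.

```python
import numpy as np

def one_hot_sequences(sequences, vocab_size):
    """Return an array of shape (num_sequences, seq_len, vocab_size)."""
    seqs = np.asarray(sequences, dtype=np.int64)
    if seqs.min() < 0 or seqs.max() >= vocab_size:
        raise ValueError("every token must lie in [0, vocab_size)")
    # Row i of the identity matrix is the one-hot vector for token i;
    # integer-array indexing picks one such row per token.
    return np.eye(vocab_size, dtype=np.float32)[seqs]

# Toy stand-in for the real data (assumption): 3 sequences of length 4.
toy = [[5, 7, 11, 9], [3, 7, 5, 9], [33, 0, 1, 2]]
encoded = one_hot_sequences(toy, vocab_size=34)
print(encoded.shape)  # (3, 4, 34)
```

With the data from the question this would give an array of shape (200, 24, 34). If the tokens are actually 1-based, subtracting 1 before encoding would let you use 33 columns instead.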