Search code examples
pythontensorflowdimensionsdimensionality-reduction

Dimensionality reduction on list of word vectors


I have a set of vectors that represent words and each vector has 300 features meaning that there are 300 floats for each vector. My goal is to reduce to dimensionality i.e. to 50 so that I can gain some space.

How can apply a dimensionality reduction on this vector set using e.g. tensorflow? I couldn't find a method, an implementation etc. that takes a list of vectors as input and reduces it.


Solution

  • You might want to look into convolutional neural networks for text processing. CNNs in general are known for dimensionality reduction of the input vectors. They are usually used for image classification but also work on text and sentence classification. What you are looking for is the embedding of an input vector. Quote:

    Now that our words have been replaced by numbers, we could simply do one-hot encoding but that would result in an extremely wide input — there are thousands of unique words in the titles dataset. A better approach is to reduce the dimensionality of the input — this is done through an embedding layer (see full code here):

    This is from here:

    TowardsDataScience

    Another ressource:

    AnalyticsVidhya