Tags: tensorflow, deep-learning, word-embedding

How do Word Embeddings in Deep Learning work?


I have a very basic question about word embeddings. My understanding is that word embeddings are used to represent text data in a numeric format without losing the context, which is very helpful in training deep models.

Now my question is: does the word embedding algorithm need to see all the data first and then represent each record in numeric format? Or is each record represented individually, without any knowledge of the other records?

TensorFlow code:

[screenshot of TensorFlow code; not recoverable]

This is an experiment I did with sample code, where the embedding layer maps each record into the specified dimension independently; a sketch of it follows below.
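
A minimal sketch of such an experiment (the original screenshot isn't recoverable; this assumes tf.keras, and the vocabulary size and embedding dimension here are illustrative):

    import tensorflow as tf

    # Two "records" as integer-encoded token sequences (illustrative ids)
    records = tf.constant([[1, 5, 3, 0],
                           [2, 5, 4, 0]])

    # Embedding layer: maps every integer id to a dense 8-dim vector
    embedding = tf.keras.layers.Embedding(input_dim=10, output_dim=8)

    # Each record is embedded independently; no other record is consulted
    vectors = embedding(records)
    print(vectors.shape)  # (2, 4, 8): 2 records, 4 tokens each, 8-dim vectors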

Is my understanding correct?


Solution

  • No, it doesn't need to learn all the data at once and then represent each record in numeric format; it is done individually. What you did is correct. There are many other methods in Natural Language Processing, though, and I can recommend a good one: transform each letter to a number, so you can do prediction letter by letter. It's true that this won't be fast, but it can give good accuracy, because the vocabulary of letters is much smaller than the vocabulary of words. It can look something like this:

    vocab = sorted(set(your_text))  # extract each distinct letter (sorted for a reproducible mapping)
    vocab_to_int = {l: i for i, l in enumerate(vocab)}  # letter -> number
    int_to_vocab = {i: l for i, l in enumerate(vocab)}  # number -> letter (the inverse)

    transformed_text = [vocab_to_int[l] for l in your_text]  # the whole text, encoded
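
To tie this back to the embedding question: the integer-encoded characters can be fed straight into an embedding layer, and each sequence is embedded independently (a minimal sketch, assuming tf.keras; the embedding dimension is illustrative):

    import tensorflow as tf

    your_text = "hello world"
    vocab = sorted(set(your_text))
    vocab_to_int = {l: i for i, l in enumerate(vocab)}
    transformed_text = [vocab_to_int[l] for l in your_text]

    # Character-level embedding: one 8-dim vector per character id
    embedding = tf.keras.layers.Embedding(input_dim=len(vocab), output_dim=8)
    char_vectors = embedding(tf.constant([transformed_text]))
    print(char_vectors.shape)  # (1, 11, 8): 1 sequence, 11 characters, 8-dim vectors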