I have a very basic doubt about word embeddings. My understanding is that word embeddings represent text data in a numeric format without losing the context, which is very helpful for training deep models.
Now my question is: does the word embedding algorithm need to learn from all the data first and then represent each record in numeric format? Or is each record represented individually, without any knowledge of the other records?
Tensorflow code:
This is an experiment I did with sample code, where the embedding layer maps each record independently into the specified dimension.
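My original snippet isn't included here, but it was roughly along these lines (a minimal sketch of that kind of experiment, assuming a hypothetical two-sentence corpus and tf.keras's TextVectorization and Embedding layers):

import tensorflow as tf

sentences = ["the cat sat", "the dog ran"]            # hypothetical sample records
vectorizer = tf.keras.layers.TextVectorization(output_mode="int")
vectorizer.adapt(sentences)                           # build the vocabulary from the records
ids = vectorizer(sentences)                           # each record becomes a sequence of integer ids

embedding = tf.keras.layers.Embedding(
    input_dim=vectorizer.vocabulary_size(),
    output_dim=8)                                     # embed each token into 8 dimensions
vectors = embedding(ids)                              # shape: (num_records, seq_len, 8)
print(vectors.shape)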
Is my understanding correct?
No, it doesn't need to learn all the data at once before representing each record in numeric format; each record is represented individually. What you did is correct, but there are many methods for Natural Language Processing. One method I can recommend is to transform each letter into a number, so you can then do prediction letter by letter. It is true that it won't be fast, but it can give good accuracy because the vocabulary of letters is much smaller than the vocabulary of words. It can look something like this:
vocab = sorted(set(your_text))                       # extract each distinct letter (sorted for a stable ordering)
vocab_to_int = {l: i for i, l in enumerate(vocab)}   # map each letter to an integer
int_to_vocab = {i: l for i, l in enumerate(vocab)}   # inverse mapping, integer back to letter
transformed_text = [vocab_to_int[l] for l in your_text]  # the whole text as a list of integers
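For example, with a hypothetical short text you can go to integers and back (just an illustration of the snippet above, not a full model):

your_text = "hello world"                            # hypothetical sample text
vocab = sorted(set(your_text))
vocab_to_int = {l: i for i, l in enumerate(vocab)}
int_to_vocab = {i: l for i, l in enumerate(vocab)}

transformed_text = [vocab_to_int[l] for l in your_text]
print(transformed_text)                              # the text as a list of integers
recovered = "".join(int_to_vocab[i] for i in transformed_text)
assert recovered == your_text                        # round trip back to the original text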