I've been looking at how to prepare a dataset for deep learning models.
Suppose we have data like this:
data = [['this', 'is'], ['not', 'with']]
First, the frequency of each word in the corpus is computed. Each word is then assigned an integer label based on that frequency.
The most frequent word gets 1, the next most frequent gets 2, and so on.
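As far as I understand it, the procedure looks roughly like the sketch below. `build_vocab` is a name made up for illustration, not a function from any library, and starting the indices at 1 follows the description above (index 0 is often kept free for padding, but that is an assumption here).

```python
from collections import Counter

data = [['this', 'is'], ['not', 'with']]

def build_vocab(sentences):
    """Map each word to an integer rank: 1 = most frequent, 2 = next, ..."""
    # Count every word across all sentences in the corpus.
    counts = Counter(word for sentence in sentences for word in sentence)
    # most_common() sorts by descending count; words with equal counts
    # keep the order in which they were first seen.
    return {word: rank
            for rank, (word, _) in enumerate(counts.most_common(), start=1)}

vocab = build_vocab(data)

# Replace each word with its integer label.
encoded = [[vocab[word] for word in sentence] for sentence in data]
```

In this tiny example every word occurs once, so the labels simply follow first-seen order; the frequency ranking only matters once some words repeat.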
My question is: why do we need to do that? Can't we just assign integer values to words at random? Does following that rule increase accuracy?
I doubt it has any effect on accuracy, unless maybe you're doing something unusual later on.
I could see it having effects on: