Let's say I have a corpus of documents that I want to read one by one and store in a data structure. The structure will probably be a list of something, where that something is a class representing a single document. Inside that class I'll need a data structure to store the contents of each document; what should that be? Also, if I want to count word occurrences and retrieve the most frequent words in each document, do I need a data structure that lets me do this in less than the O(n) time it would take to examine all the contents sequentially?
Use an associative array, also called a map or dictionary (different programming languages use different names for the same data structure).
Each key is a word and its value is the number of times that word occurs in the document. For example:
{
'on' -> 15,
'and' -> 43,
'I' -> 157,
'confluence' -> 1,
'dear' -> 2
}
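Here is a minimal sketch in Python of the per-document class described in the question, assuming Python 3.9+ and using the standard library's `collections.Counter` (a dictionary subclass) as the associative array. The class name `Document`, its methods `count` and `most_frequent`, and the `\w+` tokenization are hypothetical choices for illustration. On complexity: building the counts still needs one O(n) pass over each document, because every word has to be read at least once. After that, looking up one word's count is O(1) on average, and finding the top k words only involves the m distinct words, not the full text.

```python
import re
from collections import Counter


class Document:
    """Hypothetical per-document class: one corpus document plus its word counts."""

    def __init__(self, name: str, text: str) -> None:
        self.name = name
        # Assumed tokenization: runs of word characters, case-sensitive.
        # Counter maps each word (key) to its number of occurrences (value);
        # building it is a single O(n) pass over the document's words.
        self.word_counts = Counter(re.findall(r"\w+", text))

    def count(self, word: str) -> int:
        # Average O(1) dictionary lookup; an unseen word returns 0 (no KeyError).
        return self.word_counts[word]

    def most_frequent(self, k: int) -> list[tuple[str, int]]:
        # Top k (word, count) pairs, highest count first; ties keep first-seen order.
        return self.word_counts.most_common(k)


# The corpus as a list of Document objects.
corpus = [
    Document("a.txt", "the cat and the hat and the bat"),
    Document("b.txt", "dear reader, dear friend"),
]

print(corpus[0].most_frequent(2))  # [('the', 3), ('and', 2)]
```

When `most_common` is given k, it selects the top k entries without fully sorting the vocabulary. If you need the ranking repeatedly while the counts keep changing, a separate priority structure would be worth considering, but for read-once documents a single pass plus a dictionary is usually enough.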