Search code examples
javadata-structureshashmapsentence

Choosing a data structure to store the words in a sentence and their starting position


I'm preparing for an interview and one of the questions that tends to come up is this:

Presented with a sentence (e.g, the song is the best song) broken down into words and the indices of the first letter of the word, i.e. "the" - 0, 12; "song" - 4,21; "is" - 9; "best" - 16; choose a data structure to store this information and, using that data structure, reconstruct the sentence.

My initial attempt involves storing the words in a hashmap where the key is the word and the value is an array of positions. This is totally doable but gets quite complicated with nested for loops and annoying problems at border indices, reading in spaces at appropriate locations etc.

I have the code done for it, so if anyone would like to look I'll post (it's long and makes for a riveting read!!)

Anyway, to my question: can anyone suggest a more efficient way of representing and reconstructing the data? I'd love to try another way but this is all I've come up with so far


Solution

  • As one who interviews candidates of varying skill levels, I would want the interviewee to ask more questions before deciding on a final data structure.

    • Will the data be used exclusively to reconstruct the sentence? If so, a list would be preferable.
    • Do you need to be able to look up word positions? If so, your structure is fine.
    • What other questions might you ask about the sentence using this data?

    One option is to create a WordPosition object for each individual word that contains the word, its position, and a reference to the next word. These would form a linked list that makes reconstructing the sentence a trivial in-order traversal. Store these as you have in a map with the words as the keys and a list of WordPositions for each word.