So, I'll be storing millions of sentences in a database each with an author. I need to be able to efficiently search for a sentence and return the author. Now, I'd like to be able to mispell a word or forget a word or two in this sentence, and have the application still be able to match (fuzzy-esque). Can anyone point me in the right direction? How does google do this? Because I can search for lyrics on google for instance and it will return the song with the lyrics? I'm looking to do the same thing?
Thanks all.
If fuzzy makes things too complicated, then I can deal with just an efficient sentence search.
For full text search check inverted index data structure.
This is how search engines do it
UPDATE: also if you're working on a distributed system check Hadoop - open source alternative for Goolge's MapReduce