algorithm, nlp, data-mining, text-mining, latent-semantic-indexing

Latent Semantic Analysis concepts


I've read about using Singular Value Decomposition (SVD) to do Latent Semantic Analysis (LSA) on a corpus of texts. I understand how to do that, and I also understand the mathematical concepts behind SVD.

But I don't understand why it works when applied to corpora of texts (I believe there must be a linguistic explanation). Could anybody explain this to me from a linguistic point of view?

Thanks


Solution

  • There is no linguistic interpretation: no syntax is involved, and there is no handling of equivalence classes, synonyms, homonyms, stemming, etc. No semantics are involved either; it is just words occurring together. Consider a "document" as a shopping cart: it contains a combination of words (purchases). And words tend to occur together with "related" words.

    For instance, the word "drug" can occur together with any of {love, doctor, medicine, sports, crime}; each will point you in a different direction. But combined with the many other words in the document, your query will probably find documents from a similar field.
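To make the co-occurrence idea concrete, here is a minimal sketch using only NumPy. The four-document corpus, the choice of k = 2, and the helper names `bag_of_words` and `similarities` are all made up for illustration; this is not a reference LSA implementation, just raw word counts fed through a truncated SVD.

```python
import numpy as np

# Toy corpus (invented for this sketch): two "medical" and two "crime"
# documents. "drug" is the only word shared between the two groups.
docs = [
    "drug doctor medicine patient",
    "doctor medicine hospital patient",
    "drug crime police arrest",
    "crime police arrest prison",
]

vocab = sorted({w for d in docs for w in d.split()})
index = {w: i for i, w in enumerate(vocab)}

def bag_of_words(text):
    """Count vector over the vocabulary; unknown words are ignored."""
    v = np.zeros(len(vocab))
    for w in text.split():
        if w in index:
            v[index[w]] += 1
    return v

# Term-document matrix: one row per word, one column per document.
A = np.column_stack([bag_of_words(d) for d in docs])

# Truncated SVD: keep only the k strongest co-occurrence patterns.
k = 2  # assumption: two latent dimensions are enough for this toy corpus
U, s, Vt = np.linalg.svd(A, full_matrices=False)
U_k = U[:, :k]

# Project every document into the k-dimensional latent space.
doc_coords = U_k.T @ A  # shape (k, number of documents)

def similarities(query):
    """Cosine similarity between a query and each document, in latent space."""
    q = U_k.T @ bag_of_words(query)
    norms = np.linalg.norm(doc_coords, axis=0) * np.linalg.norm(q)
    return (q @ doc_coords) / norms

for query in ["drug", "drug doctor", "drug police", "hospital"]:
    print(f"{query!r:15} {np.round(similarities(query), 2)}")
```

On this corpus, "drug" alone should score the same against the first medical document and the first crime document, while adding "doctor" or "police" tips the query toward one group. The query "hospital" should also score high against the first document, even though that document never contains the word: the two share neighbours (doctor, medicine, patient), which is all the SVD sees.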