algorithm, computer-science, theory, bigdata

How does the Stack Overflow suggestion feature work?


What is the theory behind the algorithms that, for example, generate the suggestions for similar questions on the Stack Overflow site while you write one? Could you recommend some books on the subject?


Solution

  • The algorithms you're asking about come mainly from three AI-related fields: natural language processing (NLP), machine learning (ML), and information retrieval (IR).

    For example, to find the 10 questions most similar to a new question, one could extract n-grams from the text of each question, compute a TF-IDF weight vector over each question's n-grams, compute the cosine similarity between the new question and every other question, and then choose the 10 questions with the highest similarity.

    Some free books you can read:
    http://nlp.stanford.edu/IR-book/
    http://infolab.stanford.edu/~ullman/mmds.html

    And two free courses starting in late January:
    http://www.nlp-class.org/
    http://jan2012.ml-class.org/

    Also (kind of involved):
    http://see.stanford.edu/see/courseinfo.aspx?coll=63480b48-8819-4efd-8412-263f1a472f5a
    http://see.stanford.edu/see/courseinfo.aspx?coll=348ca38a-3a6d-4052-937d-cb017338d7b1
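
The n-gram, TF-IDF, and cosine-similarity steps described in the answer can be sketched roughly as follows. This is only a minimal illustration in plain Python, not how Stack Overflow actually implements its suggestions: the function names (`ngrams`, `build_index`, `most_similar`), the word-level tokenizer, the choice of unigrams plus bigrams, and the smoothed IDF formula are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def ngrams(text, n_max=2):
    """Return lowercase word n-grams for n = 1..n_max (assumed tokenization)."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return [
        " ".join(words[i:i + n])
        for n in range(1, n_max + 1)
        for i in range(len(words) - n + 1)
    ]


def tfidf(term_counts, idf):
    """Weight raw term counts by IDF; terms unseen in the corpus are dropped."""
    return {t: c * idf[t] for t, c in term_counts.items() if t in idf}


def build_index(questions, n_max=2):
    """Compute IDF weights and one sparse TF-IDF vector per stored question."""
    docs = [Counter(ngrams(q, n_max)) for q in questions]
    # Document frequency: in how many questions each n-gram appears.
    doc_freq = Counter(term for doc in docs for term in doc)
    # Smoothed IDF; this is one common variant, chosen here as an assumption.
    idf = {
        t: math.log((1 + len(docs)) / (1 + df)) + 1.0
        for t, df in doc_freq.items()
    }
    return idf, [tfidf(doc, idf) for doc in docs]


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v[t] for t, w in u.items() if t in v)
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def most_similar(new_question, questions, k=10, n_max=2):
    """Return the k (score, question) pairs with the highest cosine similarity."""
    idf, vectors = build_index(questions, n_max)
    query = tfidf(Counter(ngrams(new_question, n_max)), idf)
    scored = [(cosine(query, vec), q) for vec, q in zip(vectors, questions)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]
```

Calling `most_similar("How to reverse a Python list", questions, k=10)` on a list of stored question titles returns up to 10 `(score, question)` pairs, best match first. Comparing the new question against every stored one is linear in the corpus size, so a real system would likely rely on an inverted index or approximate nearest-neighbour search instead; both topics are covered in the books linked above.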