ranking, text-mining

Ranking documents from my database


Every time I search for papers or documents on document ranking or text classification, I am redirected to pages about ranking web pages, but I want to rank documents in a repository.

Can someone suggest a book or paper that covers ranking documents stored in a document database? (Every search result returns PageRank or some other algorithm pertaining to the internet.)

My aim is to rank the documents in my database by their relevance to a query or to a user's reference document. (No internet or web sites are involved.)


Solution

  • You should probably stick to an existing document-ranking library or database. Most SQL databases have a full-text search mechanism. If you only need text indexing, you could also look into the many text-search and document-ranking solutions available, such as Lucene (there are many others as well).
    If you want to understand how ranking algorithms work, it could be worth taking a look at http://en.wikipedia.org/wiki/Tf-idf and http://en.wikipedia.org/wiki/Cosine_similarity.
    If you want to understand how such information is indexed to make searching efficient, you should look at http://en.wikipedia.org/wiki/Inverted_index.
    Please note, however, that I am no expert on the matter, and many other approaches exist, although they shouldn't be too different in their basic form.
    Using a system that does this dirty job for you will not only save you time, but will also give you more robust and reliable querying capabilities than you would be able to implement on your own in a reasonable amount of time.
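
To make the three ideas linked above (tf-idf, cosine similarity, inverted index) more concrete, here is a minimal sketch in plain Python. It is not how Lucene or any SQL full-text engine works internally. The function names (`tokenize`, `build_index`, `tfidf_vector`, `cosine`, `rank`), the naive tokenizer, and the smoothed idf formula are all assumptions chosen for illustration; a real system would add stemming, stop words, and a tuned scoring function.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer (an assumption): lowercase, split on non-alphanumerics.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def build_index(docs):
    """Build an inverted index and an idf table from {doc_id: text}."""
    index = defaultdict(dict)  # term -> {doc_id: term count in that doc}
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    n = len(docs)
    # Smoothed idf (one common variant): rarer terms get larger weights.
    idf = {
        term: math.log((1 + n) / (1 + len(postings))) + 1.0
        for term, postings in index.items()
    }
    return index, idf


def tfidf_vector(text, idf):
    # Sparse tf-idf vector; terms unseen in the collection are ignored.
    counts = Counter(tokenize(text))
    return {t: c * idf[t] for t, c in counts.items() if t in idf}


def cosine(a, b):
    # Cosine similarity between two sparse vectors stored as dicts.
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def rank(query, docs, index, idf):
    """Return [(doc_id, score)] sorted by decreasing similarity to query."""
    q_vec = tfidf_vector(query, idf)
    # The inverted index limits scoring to documents sharing a query term.
    candidates = set()
    for term in q_vec:
        candidates.update(index[term])
    scored = [(d, cosine(q_vec, tfidf_vector(docs[d], idf))) for d in candidates]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


if __name__ == "__main__":
    # Hypothetical three-document collection.
    docs = {
        "a": "ranking documents with tf-idf and cosine similarity",
        "b": "cooking pasta with tomato sauce",
        "c": "document ranking in a database",
    }
    index, idf = build_index(docs)
    for doc_id, score in rank("ranking documents", docs, index, idf):
        print(doc_id, round(score, 3))
```

Because `rank` accepts any text as the query, the same function can be called with the full text of a user's reference document instead of a short query string. The sketch recomputes document vectors on every call to stay short; precomputing and storing them is the obvious first optimisation.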