Tags: ranking, information-retrieval, cosine-similarity, ranking-functions

Ranking evaluation approach in two-stage document retrieval


I have built a two-stage ranking system based on textual similarity (cosine similarity) between query-document pairs. Now I need to validate whether the retrieved, ranked items are correct with respect to the user. Which approach should I opt for? I have read about the pointwise/pairwise/listwise approaches to ranking, but which would be more helpful for manual evaluation of a ranking system? If somebody can suggest a better strategy for ranking evaluation, that would be very helpful. Thanks.


Solution

  • If I understand the question correctly, you are looking for an evaluation methodology to determine whether your two-stage retrieval system works well. If so, you can use one of the following:

    • Relevance judgements: You can use TREC-style collections, which come with a few hundred queries and explicit relevance judgements, and compute standard IR evaluation metrics (such as MAP, P@10, and NDCG) to evaluate your model; a sketch of these metrics follows this list.
    • A/B testing: You can show users the initial results alongside the results re-ranked by the second stage of your retrieval system, and ask them to judge whether the re-ranked list is better.
    • Click data: If you have access to search engine logs, you can use users' click information to evaluate your model. To do so, you need to account for several biases, e.g., position bias (see the sketch at the end of this answer).
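
    For the first strategy, here is a minimal sketch of how the usual metrics can be computed for a single query. It assumes binary relevance judgements for P@k and average precision, graded judgements for NDCG, and made-up document ids purely for illustration; MAP is just the mean of AP over all queries.

```python
import math

def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k retrieved documents that are relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    """Mean of the precision values at each rank where a relevant doc appears."""
    hits, total = 0, 0.0
    for rank, d in enumerate(ranked_ids, start=1):
        if d in relevant_ids:
            hits += 1
            total += hits / rank
    return total / len(relevant_ids) if relevant_ids else 0.0

def ndcg_at_k(ranked_ids, grades, k):
    """NDCG@k with graded relevance; `grades` maps doc id -> relevance grade."""
    dcg = sum((2 ** grades.get(d, 0) - 1) / math.log2(rank + 1)
              for rank, d in enumerate(ranked_ids[:k], start=1))
    ideal = sorted(grades.values(), reverse=True)[:k]
    idcg = sum((2 ** g - 1) / math.log2(rank + 1)
               for rank, g in enumerate(ideal, start=1))
    return dcg / idcg if idcg > 0 else 0.0

# Toy data: the system's ranking for one query vs. the judgements.
ranking = ["d3", "d1", "d7", "d2", "d5"]
relevant = {"d1", "d2", "d9"}          # binary judgements
grades = {"d1": 2, "d2": 1, "d9": 3}   # graded judgements

print(precision_at_k(ranking, relevant, 5))   # 0.4
print(average_precision(ranking, relevant))   # ~0.333 (d9 was never retrieved)
print(ndcg_at_k(ranking, grades, 5))
```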

    Among the aforementioned strategies, the first is the easiest and cheapest to carry out. You just need access to TREC data, which is not private, although you typically have to pay a few hundred dollars for access to most of the collections.
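
    As for position bias in click data: clicks concentrate on top-ranked results regardless of relevance, so raw click-through rates favour whatever was shown first. One standard correction (not spelled out above, so take this as an illustrative sketch) is inverse propensity scoring, which up-weights each click by the inverse of the probability that its rank was examined. The propensity values below are hypothetical; in practice they are estimated from the log, e.g., via a result-randomization experiment or a click model.

```python
def ips_weighted_click_metric(click_log, examine_prob):
    """Inverse-propensity-scored click metric: each click is weighted by
    1 / P(rank examined) to counteract position bias in the log."""
    total = 0.0
    for rank, clicked in click_log:      # (rank the doc was shown at, clicked?)
        if clicked:
            total += 1.0 / examine_prob[rank]
    return total / len(click_log)

# Hypothetical per-rank examination probabilities (would be estimated).
examine_prob = {1: 1.0, 2: 0.6, 3: 0.4, 4: 0.3, 5: 0.2}

# Impressions from the log: (rank, was_clicked).
log = [(1, True), (2, False), (3, True), (4, False), (5, False)]

# A click at rank 3 counts 2.5x as much as a click at rank 1.
print(ips_weighted_click_metric(log, examine_prob))  # 0.7
```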