I'm currently developing a small web search engine, but I'm not sure how I am going to evaluate it. I understand that a search engine can be evaluated by its precision and recall. In a more "localized" information retrieval system, e.g., an e-library, I can calculate both of them because I know which documents are relevant to my query. But in a web-based information retrieval system, e.g., Google, it would be impossible to calculate recall because I do not know how many web pages are relevant. This means that the F-measure and other measurements that require the number of relevant pages cannot be computed.
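For example, in the closed-collection case I can compute these directly because the full relevant set is known. Here's roughly what I mean (the document IDs are just made up for illustration):

```python
# Set-based precision/recall/F1 for a closed collection where the
# complete set of relevant documents is known (IDs are illustrative).
relevant = {"doc1", "doc3", "doc7", "doc9"}    # ground-truth relevant docs for the query
retrieved = {"doc1", "doc2", "doc3", "doc5"}   # what my engine returned

true_positives = len(relevant & retrieved)
precision = true_positives / len(retrieved)    # 2/4 = 0.5
recall = true_positives / len(relevant)        # 2/4 = 0.5 -- needs the full relevant set!
f1 = 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0

print(precision, recall, f1)
```

On the open web I simply can't build that `relevant` set, which is what breaks recall for me.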
Is everything I wrote correct? Is web search engine evaluation limited to precision only? Are there any other measurements I could use to evaluate a web search engine (other than P@k)?
You're correct that precision and recall, along with the F-score / F-measure, are commonly used metrics for evaluating (unranked) retrieval sets when assessing search engine performance.
And you're also right about how difficult (or impossible) it is to determine recall over a corpus as huge as the entire web: you can't enumerate all the relevant pages, so recall-based measures like the F-measure can't be computed exactly, although precision can still be estimated over the results you actually return. For any search engine, small or large, I'd argue it's important to consider the role of human interaction in information retrieval: are your users more interested in a (ranked) list of relevant results that answers their information need, or would one "top" relevant result be enough to satisfy them? Check out the concept of "satisficing" as it pertains to information seeking for more on how users decide when their information needs have been met.
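If one good hit near the top is what matters most, a rank-based measure like reciprocal rank (or simply "success at 1") captures that directly. Here's a rough sketch in Python, assuming you have per-query relevance judgments for the ranked results your engine returns (document IDs are made up):

```python
def reciprocal_rank(ranked_ids, relevant_ids):
    """Return 1/rank of the first relevant result, or 0.0 if none is relevant."""
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            return 1.0 / rank
    return 0.0

# Example: the first relevant hit is at rank 2, so RR = 0.5
print(reciprocal_rank(["d4", "d1", "d9"], {"d1", "d7"}))
```

Averaging this over a set of test queries gives you mean reciprocal rank (MRR), which rewards engines that put a satisfying answer as high as possible.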
Whether you use precision, recall, mean average precision (MAP), mean reciprocal rank (MRR), or any of the numerous other relevance and retrieval metrics really depends on what you're trying to assess about the quality of your search engine's results. I'd first try to figure out what sort of information needs the users of your small search engine have: will they be looking for a selection of relevant documents, or would one 'best' document serve their queries better? Once you understand how your users will actually use your search engine, you can use that information to decide which relevance model(s), and which evaluation measures, will give them results they deem most useful for their information-seeking needs.
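As a concrete starting point, precision@k and average precision only need relevance labels for the results you actually return (plus whatever judgment pool you collect), so they sidestep the unknown-recall problem. A minimal sketch, with illustrative judgments for two queries:

```python
def precision_at_k(ranked_ids, relevant_ids, k):
    """Fraction of the top-k results that are judged relevant."""
    return sum(1 for d in ranked_ids[:k] if d in relevant_ids) / k

def average_precision(ranked_ids, relevant_ids):
    """Average of precision at each rank holding a relevant document,
    normalised by the number of *known* relevant documents (the judgment pool)."""
    hits, score = 0, 0.0
    for rank, doc_id in enumerate(ranked_ids, start=1):
        if doc_id in relevant_ids:
            hits += 1
            score += hits / rank
    return score / len(relevant_ids) if relevant_ids else 0.0

# Made-up runs: (ranked results, judged-relevant IDs) for two test queries.
runs = [
    (["d1", "d4", "d2"], {"d1", "d2"}),
    (["d9", "d3", "d5"], {"d3"}),
]
print([precision_at_k(r, rel, 3) for r, rel in runs])                 # ~[0.67, 0.33]
print(sum(average_precision(r, rel) for r, rel in runs) / len(runs))  # mean AP ~0.67
```

If a single best answer is what your users want, lean on MRR or success@1; if they browse a list, P@k, MAP, or nDCG (with graded judgments) will tell you more.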