Tags: java, nlp, google-search, stanford-nlp, opennlp

How does Google generate a summary of a page?


Here is a snapshot of the results for the query "what is benzene".

Google generally provides summaries of the documents or websites it returns in response to a query. A user browses these summaries and typically selects the link whose summary best matches the search.

I want to know how Google produces such an accurate summary of any web page. I have tried taking the keywords (snippets) of a query, computing the cosine similarity between the snippets and every sentence in the web page, and selecting the sentence with the highest score, but the results are not satisfying. Is there a better algorithm, or an alternative way of generating summaries of web documents?
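For reference, the approach described in the question can be sketched in Java roughly as follows. This is a minimal sketch under stated assumptions: plain term-frequency vectors (no stemming, stop-word removal or TF-IDF weighting), a naive regex sentence splitter, and hypothetical class and method names chosen for illustration.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

// Sketch (hypothetical names): pick the sentence whose term-frequency vector
// has the highest cosine similarity to the query's term-frequency vector.
public class CosineSentencePicker {

    // Lowercase, split on anything that is not a letter or digit, and count tokens.
    static Map<String, Integer> termFrequencies(String text) {
        Map<String, Integer> tf = new HashMap<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("[^\\p{L}\\p{N}]+")) {
            if (!token.isEmpty()) {
                tf.merge(token, 1, Integer::sum);
            }
        }
        return tf;
    }

    // Euclidean length of a sparse count vector.
    static double norm(Map<String, Integer> v) {
        double sum = 0.0;
        for (int count : v.values()) {
            sum += (double) count * count;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot(a, b) / (|a| * |b|), or 0 if either vector is empty.
    static double cosine(Map<String, Integer> a, Map<String, Integer> b) {
        double dot = 0.0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other != null) {
                dot += e.getValue().doubleValue() * other.intValue();
            }
        }
        double na = norm(a);
        double nb = norm(b);
        return (na == 0.0 || nb == 0.0) ? 0.0 : dot / (na * nb);
    }

    // Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    static String[] sentences(String text) {
        return text.trim().split("(?<=[.!?])\\s+");
    }

    // Return the highest-scoring sentence; on ties the earliest sentence is kept.
    static String bestSentence(String query, String document) {
        Map<String, Integer> q = termFrequencies(query);
        String best = "";
        double bestScore = -1.0;
        for (String s : sentences(document)) {
            double score = cosine(q, termFrequencies(s));
            if (score > bestScore) {
                bestScore = score;
                best = s;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        // Toy text made up for illustration.
        String doc = "Benzene is used to make plastics. "
                + "Benzene is a colorless, flammable liquid with a sweet odor. "
                + "Exposure can be harmful.";
        System.out.println(bestSentence("what is benzene", doc));
    }
}
```

On this toy input the program prints "Benzene is used to make plastics." rather than the definitional second sentence: both share "is" and "benzene" with the query, but the shorter sentence has a smaller vector norm and therefore a higher cosine score. This length bias is one plausible reason why raw cosine scores alone can give unsatisfying summaries.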


Solution

  • What you can use as a summary is the meta description tag. Google uses it as well, unless its bot decides that it can generate a more accurate description, where "more accurate" means a better fit for your search query. For example, one of the summaries in the image you posted comes straight from the page's description:

    <meta name="description" content="Benzene is a colorless, flammable liquid with a sweet odor. Learn what we know about benzene and cancer risk." />
    

    This applies unless you're writing a search for documents other than web pages. In that case, many document browsers simply show the sentence containing a matched keyword (or a couple of words before and after it).
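The two suggestions in this answer can be combined into a small Java sketch: prefer the page's meta description and, if there is none, fall back to the first sentence containing a query keyword. Chaining them as a fallback is a design choice made here for illustration, not something the answer prescribes, and it is not a description of how Google actually selects snippets. The regex assumes `name="description"` appears before `content="..."` with double-quoted values; a real HTML parser would be more robust. Class and method names are hypothetical.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Sketch (hypothetical names): use the page's meta description if present,
// otherwise a keyword-in-context sentence from the body text.
public class PageSummary {

    // Assumption: name="description" comes before content="..." and both are double-quoted.
    private static final Pattern META_DESCRIPTION = Pattern.compile(
            "<meta\\s+name=\"description\"\\s+content=\"([^\"]*)\"",
            Pattern.CASE_INSENSITIVE);

    static Optional<String> metaDescription(String html) {
        Matcher m = META_DESCRIPTION.matcher(html);
        return m.find() ? Optional.of(m.group(1).trim()) : Optional.empty();
    }

    // First sentence containing the keyword, compared case-insensitively.
    // Uses the same naive '.', '!', '?' sentence split (an assumption).
    static Optional<String> sentenceWithKeyword(String text, String keyword) {
        String needle = keyword.toLowerCase(Locale.ROOT);
        for (String sentence : text.trim().split("(?<=[.!?])\\s+")) {
            if (sentence.toLowerCase(Locale.ROOT).contains(needle)) {
                return Optional.of(sentence);
            }
        }
        return Optional.empty();
    }

    // Prefer the author-written description; fall back to the keyword sentence, else "".
    static String summarize(String html, String bodyText, String keyword) {
        return metaDescription(html)
                .orElseGet(() -> sentenceWithKeyword(bodyText, keyword).orElse(""));
    }

    public static void main(String[] args) {
        String html = "<head><meta name=\"description\" content=\"Benzene is a colorless, "
                + "flammable liquid with a sweet odor. Learn what we know about benzene "
                + "and cancer risk.\" /></head>";
        System.out.println(summarize(html, "", "benzene"));
        System.out.println(summarize("<head></head>",
                "Some introduction. Benzene is an aromatic hydrocarbon.", "benzene"));
    }
}
```

The first call returns the description text from the meta tag; the second page has no description, so the sketch falls back to "Benzene is an aromatic hydrocarbon.", the first sentence mentioning the keyword. Returning a few words on either side of the match instead of the whole sentence would be a small change to `sentenceWithKeyword`.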