Tags: php, algorithm, search, search-engine, information-retrieval

Loose searching approach


I want to add a search feature to my site, and for fun I decided to at least try building it myself (if I fail, there's always Google Custom Search).

The problem is, I don't even know how to approach this monster! Here are the requirements:

  • Not all keywords will be required in a match (should one search for "Big happy world", it would also find "Big world", "happy world", etc.).
  • Consideration of common spelling mistakes (from a database, via edit distance, or via a predefined list of common mistakes such as "rather then" => "rather than").
  • Search both the content and the titles of posts, with an emphasis on titles.
  • Don't suck
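The first three requirements can be prototyped in a few dozen lines before reaching for a library. Below is a minimal sketch, written in Python even though the site is in PHP, so treat it as pseudocode to port (PHP also ships a built-in `levenshtein()` function). Every query term is optional, misspellings are tolerated via edit distance plus a small correction list, and title hits outweigh body hits. The names `COMMON_MISTAKES`, `score`, `rank`, the weight of 3.0, and the "one edit, exact match for words of 3 letters or fewer" rule are all assumptions chosen for illustration. The correction list only maps single words, so a phrase-level fix like "rather then" would need extra handling.

```python
import re

# Hypothetical list of known misspellings -> corrections (the "predefined list" option).
COMMON_MISTAKES = {"teh": "the", "recieve": "receive"}


def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions and substitutions turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,               # delete ca
                           cur[j - 1] + 1,            # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def tokenize(text: str) -> list[str]:
    """Lower-case words, with known misspellings corrected."""
    return [COMMON_MISTAKES.get(w, w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def fuzzy_in(term: str, words: set[str], max_edits: int = 1) -> bool:
    """True if some word is within max_edits of term; short terms must match exactly."""
    allowed = 0 if len(term) <= 3 else max_edits  # assumed rule to avoid noisy short matches
    return any(levenshtein(term, w) <= allowed for w in words)


def score(query: str, title: str, body: str, title_weight: float = 3.0) -> float:
    """Each query term is optional: it adds title_weight if found in the title,
    1.0 if found only in the body, and nothing otherwise."""
    title_words, body_words = set(tokenize(title)), set(tokenize(body))
    total = 0.0
    for term in tokenize(query):
        if fuzzy_in(term, title_words):
            total += title_weight
        elif fuzzy_in(term, body_words):
            total += 1.0
    return total


def rank(query: str, posts: list[tuple[str, str]]) -> list[tuple[str, float]]:
    """Return (title, score) for posts matching at least one term, best first."""
    scored = [(title, score(query, title, body)) for title, body in posts]
    return sorted([p for p in scored if p[1] > 0], key=lambda p: -p[1])
```

For example, with the posts `("Big world tour", "Travelling around the globe")`, `("Happy days", "A happy world")` and `("Cooking", "Recipes")`, the query `"Big happy wrld"` ranks "Big world tour" (6.0) above "Happy days" (4.0) and drops "Cooking": no single post has every word, and "wrld" still matches "world". This scans every post on every query, which is fine for a small site; beyond that you want an index.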

I've searched my old pal Google for it, but the only reasonable things I found were academic-level papers on the subject (English isn't my native language; I'm good, but not that good =( ).

So in short: does anyone know of a good place to start, a tutorial, an article, an example?

Thanks in advance.


Solution

  • If you want to create your own search engine, Apache Lucene is a mature open-source library that can handle a large part of the functionality for you.

    Using Lucene, you first index your information [using an IndexWriter]. This is done offline, to build the index.
    At search time, you use an IndexSearcher to find the documents that match your query.

    If you want some theoretical background on how it works, read up on information retrieval. A good place to start is Stanford's Introduction to Information Retrieval.
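To make the two phases above concrete: Lucene's index is an inverted index (a map from each term to the documents containing it), built once and then queried. The sketch below is not Lucene's API; it is a plain-Python illustration under assumed names (`build_index`, `search`) of building such an index offline and then ranking matches with tf-idf, one common weighting scheme covered in information-retrieval texts. Any document containing at least one query term is returned, which also gives the "not all keywords required" behavior from the question.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text: str) -> list[str]:
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs: dict[int, str]) -> tuple[dict[str, dict[int, int]], int]:
    """Offline step: inverted index term -> {doc_id: term frequency in that doc}."""
    index: dict[str, dict[int, int]] = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, tf in Counter(tokenize(text)).items():
            index[term][doc_id] = tf
    return dict(index), len(docs)


def search(index: dict[str, dict[int, int]], n_docs: int, query: str) -> list[tuple[int, float]]:
    """Query step: score each doc containing any query term by the sum of
    tf * log(N / df) over the query terms; return (doc_id, score) best first."""
    scores: dict[int, float] = defaultdict(float)
    for term in set(tokenize(query)):
        postings = index.get(term)
        if not postings:
            continue  # term not in any document: contributes nothing
        idf = math.log(n_docs / len(postings))  # rarer terms weigh more
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

With `{1: "Big happy world", 2: "Big world", 3: "Big pond"}`, the query `"happy world"` returns document 1 ahead of document 2 and skips document 3. A query of just `"big"` matches all three with score 0, because a term that appears everywhere carries no information; that is the intuition behind idf.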