Search code examples
sqldatabasefull-text-searchsql-like

MS Sql Full-text search vs. LIKE expression


I'm currently looking for a way to search a big database (500MB - 10GB or more on 10 tables) with a lot of different fields(nvarchars and bigints). Many of the fields, that should be searched are not in the same table.

An example: A search for '5124 Peter' should return all items, that ...

  • have an ID with 5124 in it,
  • have 'Peter' in the title or description
  • have item type id with 5124 in it
  • created by a user named 'peter' or a user whose id has 5124 in it
  • created by a user with '5124' or 'peter' in his street address.

How should i do the search? I read that the full-text search of MS-Sql is a lot more performant than a query with the LIKE keyword and i think the syntax is more clear, but i think it cant search on bigint(id) values and i read it has performance problems with indexing and therefore slows down inserts to the DB. In my project there will be more inserting than reading, so this could be a matter.

Thanks in advance, Marks


Solution

  • I don't think you're going to get the performance you need out of MS SQL; you're going to need to construct very complex queries to cover all the data/tables that you're going to be searching, and you have the added encumbrance of writing data to the database at the same time as you are querying it.

    I would suggest you look at either Apache Solr (http://lucene.apache.org/solr/) or Lucene (http://lucene.apache.org). Solr is built on top of Lucene, both can be used to create an inverted file index, basically like the index in the back of book (term 1 appears in documents 1, 3, 7, etc.) Solr is a search-engine-in-a-box, and has several mechanisms that will let you tell it how and where to index data. Lucene is more lower-level, and will let you set up your indexing and searching architecture with more flexibility.

    The good thing about Solr is that it's available as a web service, so if you're not familiar with Java, you can find a Solr client in the language of your choice, and write indexing and searching code in whatever language suits you. Here's a link to a list of client libraries for Solr, including some in C# http://wiki.apache.org/solr/IntegratingSolr That's where I'd start.