Tags: lucene, indexing, solr, search-engine

indexing in solr lucene


I have a site on which users can post questions, so I have a table in MySQL like this:

question_id, user_id, tags, views, creation_date

what I want is to be able to

  • perform searches which will return question_ids based on those tags

    and order them by

    1. Views
    2. Date (e.g. newest first, or only this week or this month)
  • or search for a specified user and return question_ids, again ordered by views and date.

In what way should I bring everything into Solr, as far as indexing is concerned? Will I have to index tags, views, date? What should I index so that I have maximal performance?


Solution

  • Think about whether using Lucene/Solr is really a benefit for you. I don't want to be misunderstood, but if you only want to search a user_id column for a specific user ID, you don't need an additional full-text search engine.

    Anyway, maybe you just want a little project to "play with" Solr. So here are the answers to your questions:

    In what way should I bring everything into Solr, as far as indexing is concerned?

    Put everything you need to search for into Solr/Lucene. Use the DIH (DataImportHandler), http://wiki.apache.org/solr/DataImportHandler, to let Solr walk through your table and index the data.

    Will I have to index tags, views, date?

    Yes. You have to index every field you want to search, filter, or sort on. By the way, there is a difference between indexing and storing data. You can index fields (like tags, user_id, views, ...) without additionally storing them in your Lucene index. Storing a field is only necessary if Lucene/Solr has to return its value in the search results. Otherwise, Solr returns only the uniqueKey (primary key) of the matching documents and you fetch the rest of the data from the database (... WHERE pk = <Lucene result>). So fields that are only relevant for sorting, for example, do not need to be stored.

    What should I index so that I have maximal performance?

    Index only those fields (columns) that you need to work with in Solr. Don't index fields you will never search, filter, or sort on.
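
As a rough illustration of the advice above, here is a minimal Python sketch. It builds (without sending anything) a Schema API "add-field" payload in which the searchable and sortable columns are indexed but not stored, plus the request parameters for the two kinds of search from the question. The field type names (string, pint, pdate) are assumptions based on Solr's default configset, and SCHEMA_UPDATE, build_params, and the choice of question_id as the only stored field are hypothetical names for illustration, so adapt them to your own schema and Solr version.

```python
import json
from urllib.parse import urlencode

# Hypothetical payload for Solr's Schema API ("add-field" command).
# Assumption: type names come from the default configset; yours may differ.
SCHEMA_UPDATE = {
    "add-field": [
        # The key is the only field Solr has to hand back, so it is stored.
        {"name": "question_id", "type": "string", "indexed": True, "stored": True},
        # Searched or filtered on, never returned: indexed but not stored.
        {"name": "user_id", "type": "pint", "indexed": True, "stored": False},
        {"name": "tags", "type": "string", "indexed": True, "stored": False,
         "multiValued": True},
        # Sort fields: docValues is set explicitly because point-based
        # types rely on it for sorting.
        {"name": "views", "type": "pint", "indexed": True, "stored": False,
         "docValues": True},
        {"name": "creation_date", "type": "pdate", "indexed": True,
         "stored": False, "docValues": True},
    ]
}


def _quote(term):
    """Wrap a term in double quotes, escaping backslashes and quotes."""
    return '"' + term.replace("\\", "\\\\").replace('"', '\\"') + '"'


def build_params(tags=None, user_id=None, newest_first=False, window=None, rows=20):
    """Build Solr select parameters that return only question_id values.

    window is a Solr date-math span such as "7DAYS" or "1MONTH".
    """
    if user_id is not None:
        q = f"user_id:{int(user_id)}"
    elif tags:
        q = " AND ".join(f"tags:{_quote(t)}" for t in tags)
    else:
        raise ValueError("need tags or user_id")

    if newest_first:
        sort = "creation_date desc, views desc"
    else:
        sort = "views desc, creation_date desc"

    params = {"q": q, "fl": "question_id", "sort": sort, "rows": rows}
    if window:
        # Filter query restricting results to the recent time window.
        params["fq"] = f"creation_date:[NOW/DAY-{window} TO NOW]"
    return params


if __name__ == "__main__":
    print(json.dumps(SCHEMA_UPDATE, indent=2))
    print(urlencode(build_params(tags=["solr", "lucene"], window="7DAYS")))
    print(urlencode(build_params(user_id=42, newest_first=True)))
```

The returned question_id values can then be used to load the full rows from MySQL, which keeps the Solr index small because nothing except the key is stored.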