solr solrj

How to periodically remove data from Apache Solr?


I am using Apache Solr 4.3.1 as a repository for storing and indexing data. One of the fields holds the date on which each entry was posted. I want to keep only recent data in the repository by deleting entries that are older than 30 days.

I have a web application based on SolrJ that queries the Solr server for search results. Should I add a scheduled thread that deletes data older than 30 days from the Solr server, or does Solr provide a way to remove data automatically after a specific time period?


Solution

  • Solr does not do that automatically.

    You can add a timestamp field that defaults to NOW when a record is inserted.

    <field name="timestamp" type="date" indexed="true" stored="true" default="NOW" /> 
    

    You can then run a periodic job that cleans up the data.
    The 30-day cutoff can be expressed in the delete query as timestamp:[* TO NOW/DAY-30DAYS]. The range keyword TO must be uppercase, and the query is sent as a delete-by-query rather than as an fq filter.
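
The periodic cleanup job described above could be sketched roughly as follows. This is a minimal sketch, not a definitive implementation: the class name SolrCleanupJob, the SolrDeleter interface, and the helper methods are hypothetical names introduced for illustration, and the Solr URL in the comment is an assumed default. The SolrJ call is kept behind a small interface so the scheduling logic runs without a Solr server. In a real application the interface would wrap SolrJ's deleteByQuery and commit calls.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

class SolrCleanupJob {

    /** Hypothetical seam around the Solr client (not part of SolrJ). */
    interface SolrDeleter {
        void deleteByQuery(String query) throws Exception;
    }

    /** Builds a range query matching documents older than maxAgeDays. */
    static String buildDeleteQuery(String dateField, int maxAgeDays) {
        if (maxAgeDays <= 0) {
            throw new IllegalArgumentException("maxAgeDays must be positive");
        }
        // NOW/DAY rounds down to the start of today; TO must be uppercase.
        return dateField + ":[* TO NOW/DAY-" + maxAgeDays + "DAYS]";
    }

    /** Runs one cleanup pass; returns false instead of throwing on failure. */
    static boolean runOnce(SolrDeleter deleter, String query) {
        try {
            deleter.deleteByQuery(query);
            return true;
        } catch (Exception e) {
            System.err.println("Solr cleanup failed: " + e.getMessage());
            return false;
        }
    }

    /** Schedules the cleanup to run immediately and then once per period. */
    static ScheduledExecutorService schedule(SolrDeleter deleter, String query,
                                             long period, TimeUnit unit) {
        ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();
        // runOnce swallows exceptions: an exception escaping the task would
        // silently cancel all later runs of scheduleAtFixedRate.
        scheduler.scheduleAtFixedRate(() -> runOnce(deleter, query), 0, period, unit);
        return scheduler;
    }

    public static void main(String[] args) {
        String query = buildDeleteQuery("timestamp", 30);

        // With SolrJ 4.x on the classpath, the deleter could wrap a client
        // (URL below is an assumption):
        //   HttpSolrServer server =
        //       new HttpSolrServer("http://localhost:8983/solr/collection1");
        //   SolrDeleter deleter = q -> { server.deleteByQuery(q); server.commit(); };
        SolrDeleter deleter = q -> System.out.println("Would delete: " + q);

        // One pass for demonstration; a web application would instead call
        // schedule(deleter, query, 1, TimeUnit.DAYS) at startup and shut the
        // returned scheduler down when the application stops.
        runOnce(deleter, query);
    }
}
```

Running the job from the web application keeps everything in one deployment; an external cron job that posts the same delete query to Solr's update handler would be an alternative if the application is not always running.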