
General approach to searching large CSV files in Ruby in reasonable time and at affordable cost


I found some answers on StackOverflow, but none fits my needs exactly.

I am writing a Ruby script to find rows by a specific key in large CSV files (about 500 MB and 1M records each).

The grep command takes 15 to 30 minutes to find a match in a single file.

I have 400+ files, and I have to run dozens of searches daily.

I need a simple, flexible and affordable way to search these files.

  • I don't want to upload the CSVs to a robust database engine.
  • I don't want to pay for services like Elasticsearch.
  • I need to adapt to different column layouts and different keys periodically, with minimal effort.
  • I only need read-only access to the files; modifications and deletions are not required. So indexes can be built once and never need updating.
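These requirements (read-only files, a key column that changes over time, indexes built once) fit a byte-offset index: scan each CSV once, record where every key's row starts, and later jump straight to those rows with a seek instead of re-reading the whole file. The following is a minimal sketch of that idea using only Ruby's standard `csv` library. It is not the code of any particular tool. The method names and the `<file>.<column>.idx` naming scheme are hypothetical, and it assumes each record sits on one physical line (no line breaks inside quoted fields).

```ruby
require "csv"

# Hypothetical helper: scans the CSV once and maps each value of key_column
# to the byte offsets where matching rows start.
# Assumes one record per physical line (no newlines inside quoted fields).
def build_offset_index(csv_path, key_column)
  index = {}
  File.open(csv_path, "rb:UTF-8") do |f|
    header = CSV.parse_line(f.gets)
    key_pos = header.index(key_column)
    raise ArgumentError, "unknown column: #{key_column}" if key_pos.nil?

    until f.eof?
      offset = f.pos                  # byte position where this row begins
      row = CSV.parse_line(f.gets)
      next if row.nil? || row.empty?  # skip blank lines
      (index[row[key_pos]] ||= []) << offset
    end
  end
  index
end

# Seeks directly to each stored offset and parses only that line.
def fetch_rows(csv_path, offsets)
  File.open(csv_path, "rb:UTF-8") do |f|
    offsets.map do |offset|
      f.seek(offset)
      CSV.parse_line(f.gets)
    end
  end
end

# Builds the index on first use and caches it next to the CSV with Marshal.
# Because the CSVs are read-only, the cached index never needs rebuilding.
def load_or_build_index(csv_path, key_column)
  index_path = "#{csv_path}.#{key_column}.idx" # hypothetical naming scheme
  return Marshal.load(File.binread(index_path)) if File.exist?(index_path)

  index = build_offset_index(csv_path, key_column)
  File.binwrite(index_path, Marshal.dump(index))
  index
end
```

A lookup might then read `fetch_rows("data.csv", load_or_build_index("data.csv", "email").fetch("someone@example.com", []))`, where the file name, the `email` column, and the key are placeholders. Switching to another key column only produces another index file; the CSVs are never touched. The whole Hash lives in memory while in use, and `Marshal.load` should only be used on index files you created yourself.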

Solution

  • I ended up spending a day of work developing this solution: CSV-Indexer.

    CSV-Indexer is not as robust as Lucene, but it is simple and cost-effective. It can index files with millions of rows and find specific rows in a matter of seconds.

    Find full documentation and examples here:

    https://github.com/leandrosardi/csv-indexer
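One way to get lookups on million-row files down to seconds without loading a large index into memory is to store each index on disk as a sorted file of fixed-width records and binary-search it. Each lookup then reads only about log2(N) records (roughly 20 for 1M rows) plus the matching CSV lines. This is a different technique from an in-memory Hash, and it is not a description of how CSV-Indexer works internally. It is a sketch that assumes every key fits in a hypothetical `KEY_WIDTH` of 64 bytes and that each record occupies one line. All method names are illustrative.

```ruby
require "csv"

KEY_WIDTH = 64                        # hypothetical maximum key size in bytes
OFF_WIDTH = 12                        # zero-padded decimal byte offset
REC_SIZE  = KEY_WIDTH + OFF_WIDTH + 1 # +1 for the trailing "\n"

# Writes one fixed-width record (padded key + offset) per data row, sorted,
# so the index file can be binary-searched by record number.
# Note: keys are space-padded, so "1" and "1 " would collide.
def write_sorted_index(csv_path, key_column, index_path)
  records = []
  File.open(csv_path, "rb:UTF-8") do |f|
    key_pos = CSV.parse_line(f.gets).index(key_column)
    raise ArgumentError, "unknown column: #{key_column}" if key_pos.nil?

    until f.eof?
      offset = f.pos
      row = CSV.parse_line(f.gets)
      next if row.nil? || row.empty?
      key = row[key_pos].to_s.b       # compare keys as raw bytes
      raise ArgumentError, "key longer than #{KEY_WIDTH} bytes" if key.bytesize > KEY_WIDTH
      record = key.ljust(KEY_WIDTH) + format("%0#{OFF_WIDTH}d", offset) + "\n"
      records << record
    end
  end
  File.binwrite(index_path, records.sort.join)
end

# Lower-bound binary search on the padded key, then collects every
# adjacent record with the same key. Returns the matching byte offsets.
def lookup_offsets(index_path, key)
  target = key.to_s.b.ljust(KEY_WIDTH)
  File.open(index_path, "rb") do |f|
    count = f.size / REC_SIZE
    read_rec = ->(i) { f.seek(i * REC_SIZE); f.read(REC_SIZE) }
    lo, hi = 0, count
    while lo < hi
      mid = (lo + hi) / 2
      if read_rec.(mid)[0, KEY_WIDTH] < target
        lo = mid + 1
      else
        hi = mid
      end
    end
    offsets = []
    while lo < count
      rec = read_rec.(lo)
      break unless rec[0, KEY_WIDTH] == target
      offsets << rec[KEY_WIDTH, OFF_WIDTH].to_i
      lo += 1
    end
    offsets
  end
end

# Reads and parses the single CSV line that starts at +offset+.
def fetch_row(csv_path, offset)
  File.open(csv_path, "rb:UTF-8") do |f|
    f.seek(offset)
    CSV.parse_line(f.gets)
  end
end
```

Building the index holds all records in memory once while sorting; after that, each search touches only a handful of bytes per file, which suits read-only data searched many times.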