perl search flat-file

How do I do a full-text search of flat files with Perl?


We have a Perl-based web application whose data originates from a large repository of flat text files. The files are placed into a directory on our system, we parse them extensively and insert selected pieces of information into a MySQL database, and we then move the files to their permanent archive (/www/website/archive/*.txt). We don't parse every bit of data from these files, so some of the more obscure data items never make it into the database.

The current requirement is for users to be able to run a full-text search of the entire flat-file repository from a Perl-generated web page and get back a list of hits that they can click on to open the matching text files for review.

What is the most elegant, efficient, and least CPU-intensive way to provide this search capability?


Solution

  • I'd recommend, in this order:

    1. Load the full text of every document into a MySQL table and use MySQL's full-text search and indexing features. I've never done it myself, but MySQL has always been able to handle more than I can throw at it.

    2. Swish-E still exists and is designed for building full-text indexes and allowing ranked results. I've been running it for a few years and it works pretty well.

    3. You can use File::Find in your Perl code to chew through the repository like grep -r, but it will suck compared to one of the indexed options above, because every query has to reread every file. However, it will work, and might even surprise you :)
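
For what it's worth, option 3 can be sketched in a few lines. This is only a rough illustration, not a tuned implementation: search_archive is a made-up helper name, the match is a plain case-insensitive substring test, and it assumes the archive holds ordinary .txt files small enough to scan line by line.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper (name and behavior are assumptions for illustration):
# return the sorted paths of .txt files under $dir that contain $term.
sub search_archive {
    my ($dir, $term) = @_;

    # \Q...\E quotes regex metacharacters so user input is matched literally.
    my $pattern = qr/\Q$term\E/i;
    my @hits;

    find(
        {
            no_chdir => 1,    # do not chdir; $_ then holds the full path
            wanted   => sub {
                my $path = $_;
                return unless -f $path && $path =~ /\.txt\z/;

                # Skip files that cannot be opened instead of aborting the search.
                open my $fh, '<', $path or return;
                while (my $line = <$fh>) {
                    if ($line =~ $pattern) {
                        push @hits, $path;
                        last;    # one match is enough to list the file
                    }
                }
                close $fh;
            },
        },
        $dir,
    );

    my @sorted = sort @hits;
    return @sorted;
}
```

Calling search_archive('/www/website/archive', $query) from the page handler would give you the list of paths to turn into links. Each call walks the whole tree, so the cost grows with the size of the archive, which is why the indexed options rank higher.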