
Full-text search for static HTML files on CD-Rom via javascript


I will be delivering a set of static HTML pages on CD-Rom; these pages need to be fully viewable with no Internet access whatsoever.

I'd like to provide a full-text search (Lucene-like) for the content of those pages, which should "just work" from the CD-Rom with no software installation on the client machine.

A search engine implementation in javascript would be the perfect solution, but I have trouble finding any that looks solid / current / popular...?

I did find these:

  • jsFind
  • js-search

but both projects seem rather inactive?

Another solution, besides a specific search engine in javascript, would be the ability to access local Lucene indices from javascript: the indices themselves would be built with Lucene and copied to the CD-Rom along with the HTML files.

Edit: built it myself (see below).


Solution

  • Well, in fact, I built it myself.

    The existing solutions (that I could find) were unconvincing.

    I wanted to be able to search a very long tree (ul/li/ul...) that is displayed as one page; it contains 5000+ items.

    It sounds a little weird to display such a long tree on one page, but with collapse / expand it's much more intuitive than separate pages. Since we're offline, download times are not a problem (parsing times are, but Chrome is amazing ;-)

    The "search" function provided with modern browsers (FF and Chrome anyway) has two big problems: it only searches visible items on the page, and it can't search for non-consecutive words.

    I want to be able to search collapsed items (not visible on the screen); I want to find "one two three" when searching "one three" (just like with Google / Lucene); and I want to open just the branches of the tree containing found items.

    So, what I did was:

    1. create an inverted index of words <-> ids of items from the list (via XSLT) (approx. 4500 unique words in the document)
    2. convert this index to a bunch of JavaScript arrays (one word = one array, containing ids)
    3. when searching, intersect the arrays represented by the search words
    4. step 3 returns an array of ids that I can then open / highlight
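
    The steps above can be sketched roughly as follows. This is only a minimal illustration, not the actual code: the real index was generated via XSLT, whereas here it is built in JavaScript from a small made-up `items` object, and the names `buildIndex` and `search` are hypothetical.

```javascript
// Hypothetical sample data: item id -> item text.
// (In the setup described above, the index was produced by XSLT instead.)
const items = {
  1: "one two three",
  2: "one four",
  3: "three five one",
  4: "two six",
};

// Steps 1-2: build an inverted index, one array of item ids per word.
// Note: \W+ only splits on non-ASCII-word characters; accented text
// would need a different tokenizer.
function buildIndex(items) {
  const index = Object.create(null);
  for (const [id, text] of Object.entries(items)) {
    // A Set ensures each id is recorded once per word.
    const words = new Set(text.toLowerCase().split(/\W+/).filter(Boolean));
    for (const word of words) {
      (index[word] = index[word] || []).push(Number(id));
    }
  }
  return index;
}

// Step 3: intersect the id arrays of all query words.
function search(index, query) {
  const words = query.toLowerCase().split(/\W+/).filter(Boolean);
  if (words.length === 0) return [];
  // Unknown words map to an empty array, so the result is empty.
  const lists = words.map((w) => index[w] || []);
  // Start from the shortest list to keep intermediate results small.
  lists.sort((a, b) => a.length - b.length);
  const [first, ...rest] = lists;
  return rest.reduce((acc, list) => {
    const ids = new Set(list);
    return acc.filter((id) => ids.has(id));
  }, first.slice());
}

// Step 4: the resulting ids can then be opened / highlighted.
const index = buildIndex(items);
const hits = search(index, "one three");
console.log(hits);
```

    With this sample data, searching "one three" matches items 1 and 3 even though the words are not adjacent in item 1, which is the behaviour the browser's built-in search lacks.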

    It does exactly what I needed and it's really fast. Better yet, since it searches from an independent "index" (arrays of ids), it can search even when the list is not loaded in the browser!
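
    For opening just the branches that contain found items, one possible approach is to work out which ancestors need expanding from the ids alone. The sketch below assumes a parent lookup (`parentOf`, a made-up structure that is not described above) is shipped alongside the index; `branchesToOpen` is likewise a hypothetical name.

```javascript
// Hypothetical parent map: item id -> parent id (null for top-level items).
const parentOf = { 1: null, 2: 1, 3: 2, 4: 1, 5: null, 6: 5 };

// Collect every ancestor of the found items; these are the branches to expand.
function branchesToOpen(foundIds, parentOf) {
  const open = new Set();
  for (const id of foundIds) {
    let p = parentOf[id];
    // Stop at the root, or as soon as we reach an ancestor that is
    // already recorded (its own ancestors were added earlier).
    while (p != null && !open.has(p)) {
      open.add(p);
      p = parentOf[p];
    }
  }
  return [...open].sort((a, b) => a - b);
}

const toOpen = branchesToOpen([3, 6], parentOf);
console.log(toOpen);
```

    Because this works purely on ids, it fits the same idea as the index itself: the set of branches to expand can be computed before touching the page, and only then applied to the list.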