php search solr silverstripe cwp

Silverstripe Solr search: files, pages, and dataobjects


How to correctly add files to the search index...

Using a custom index I can successfully search pages and dataobjects. However, as soon as I attempt to include files in this index, pages drop out of the result set and only files and dataobjects are returned.

The following index returns pages and dataobjects as expected:

class EntrySearchIndex extends SolrSearchIndex
{
    public function init()
    {
        $this->addClass('SiteTree');
        $this->addClass('EntryAccordionItem');
        $this->addClass('EntryInformationBoxItem');
        $this->addClass('EntryTabItem');

        $this->addAllFulltextFields();
        $this->addFilterField('ShowInSearch');

        $this->excludeVariantState(array('SearchVariantVersioned' => 'Stage'));
    }
}

And here is a basic working search function:

public static function keywordSearch($keywords)
{
    $keywords = Convert::raw2sql(trim($keywords));

    $classes[] = array('class' => 'EntryPage', 'includeSubclasses' => true);
    $classes[] = array('class' => 'EntryAccordionItem');
    $classes[] = array('class' => 'EntryInformationBoxItem');
    $classes[] = array('class' => 'EntryTabItem');

    $index = singleton('EntrySearchIndex');
    $engine = SearchQuery::create();

    return $engine->search($keywords, $classes, $index, -1, 0)->getResults();
}

I then made the following minor modifications to allow for files (only the alterations are shown, for brevity):

public function init()
{
    $this->addClass('SiteTree');
    $this->addClass('EntryAccordionItem');
    $this->addClass('EntryInformationBoxItem');
    $this->addClass('EntryTabItem');

    // File specific
    $this->addClass('File');
    $this->addFulltextField('FileContent');

    $this->addAllFulltextFields();
    $this->addFilterField('ShowInSearch');
    $this->excludeVariantState(array('SearchVariantVersioned' => 'Stage'));
}


public static function keywordSearch($keywords)
{
    [...]

    // File specific
    $classes[] = array('class' => 'File', 'includeSubclasses' => true);

    [...]

    return $engine->search($keywords, $classes, $index, -1, 0)->getResults();
}

This returns only files and dataobjects. Am I right in thinking that $this->addAllFulltextFields(); is now only being applied to files?


Solution

  • I had a similar (but slightly different) problem with including both pages and files in the Solr index, and the way I figured out what was going on might be of help.

    The issue was that we wanted files to have an Abstract text field in which the user could enter a short description of the file. However, the Common Web Platform (CWP) pages already had an Abstract field, so Solr indexed that rather than the Abstract field on the files.

    For the issue you are facing, have you tried logging in to the Solr server and browsing the schema to see what fields Solr is actually including in the index?

    If you are running Solr locally (using the silverstripe/fulltextsearch-localsolr module), you should be able to access the server at http://localhost:8983/solr

    Once in the Solr server web interface, try doing the following...

    • Choose your index from the dropdown in the left menu
    • Click Schema Browser at the bottom
    • On the right pane, click the 'Please select..' dropdown at the top and check to see if the fields in the index are as expected.

    If you are lucky, you might see that Solr has chosen to index something incorrectly (perhaps compare the index fields with and without files in the index), and that will give you a clue as to how to resolve this.
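
    The with/without comparison could also be scripted rather than done by eye. The following is only a rough sketch: it assumes the Solr core is named after the index class (EntrySearchIndex), that the core exposes Solr's Luke request handler at /admin/luke, and both helper function names are made up for illustration.

```php
<?php
// Hypothetical helper: ask one Solr core which fields it holds, using the
// Luke request handler. $coreUrl is an assumption, for example
// 'http://localhost:8983/solr/EntrySearchIndex'.
function solrFieldNames($coreUrl)
{
    $json = @file_get_contents($coreUrl . '/admin/luke?numTerms=0&wt=json');
    if ($json === false) {
        return array(); // server not reachable, or handler not enabled
    }

    $data = json_decode($json, true);
    $fields = isset($data['fields']) ? array_keys($data['fields']) : array();
    sort($fields);

    return $fields;
}

// Hypothetical helper: field names present in $before but absent from $after.
function missingFields(array $before, array $after)
{
    return array_values(array_diff($before, $after));
}

// Intended usage (run once per index configuration, reindexing in between):
//   $before = solrFieldNames('http://localhost:8983/solr/EntrySearchIndex');
//   ... add File to the index, reconfigure and reindex ...
//   $after = solrFieldNames('http://localhost:8983/solr/EntrySearchIndex');
//   print_r(missingFields($before, $after));
```

    Any page field that shows up in the first list but not in the second would be a good place to start looking.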

    On that note, I think it's probably better NOT to use $this->addAllFulltextFields(); as that puts everything into the index. I would specify which fields are required. For pages, Title, Abstract and Content are usually all that's really required.
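
    As a minimal sketch of what listing the fields explicitly might look like, here is the index from the question with addAllFulltextFields() swapped for individual addFulltextField() calls. The field choice is an assumption: Title and Content are standard page fields, Abstract only exists if your pages define it, and the Entry* dataobjects may need their own fields added. I have not confirmed that this resolves the missing pages.

```php
<?php
// Sketch only: the same classes as in the question, with explicit fields.
class EntrySearchIndex extends SolrSearchIndex
{
    public function init()
    {
        $this->addClass('SiteTree');
        $this->addClass('EntryAccordionItem');
        $this->addClass('EntryInformationBoxItem');
        $this->addClass('EntryTabItem');
        $this->addClass('File');

        // Fields to search on; adjust to what your classes actually define.
        $this->addFulltextField('Title');
        $this->addFulltextField('Content');
        // $this->addFulltextField('Abstract'); // only if the field exists

        // Extracted file text, as in the question.
        $this->addFulltextField('FileContent');

        $this->addFilterField('ShowInSearch');
        $this->excludeVariantState(array('SearchVariantVersioned' => 'Stage'));
    }
}
```

    After changing init() the index has to be reconfigured and rebuilt (the fulltextsearch module provides the Solr_Configure and Solr_Reindex tasks for this) before the Schema Browser will reflect the new field list.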

    Another tip: I found that if includeSubclasses is set to true for files, the search results will include folders in the assets directory and also images. In our case we only wanted documents, so setting includeSubclasses to false for files excluded the images and folders.

    If you did or do solve this, it would be great if you could post the cause and resolution.
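
    For illustration, the class list from the question's keywordSearch() with that change applied might look like this (a sketch; the only differences from the question are the explicit array initialisation and the includeSubclasses value for File):

```php
<?php
$classes = array();
$classes[] = array('class' => 'EntryPage', 'includeSubclasses' => true);
$classes[] = array('class' => 'EntryAccordionItem');
$classes[] = array('class' => 'EntryInformationBoxItem');
$classes[] = array('class' => 'EntryTabItem');

// Match File records only. Folder and Image are subclasses of File,
// so they are left out of the results.
$classes[] = array('class' => 'File', 'includeSubclasses' => false);
```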

    Cheers, DouG.