Tags: ruby-on-rails, ruby, solr, hydra

How to get Solr to tokenize forward slash


Custom identifiers on my Rails objects include forward slashes. For example, an identifier might look like ncsu.edu/123456789. When I query Solr for that identifier, I get back every result that contains ncsu.edu. The metadata for the Rails object is below:

class IntellectualObjectMetadata < ActiveFedora::RdfxmlRDFDatastream
  map_predicates do |map|
    map.intellectual_object_identifier(in: RDF::DC, to: 'identifier') do |index|
      index.as :stored_searchable
    end
  end
end

And I'm querying like so:

IntellectualObject.where(desc_metadata__intellectual_object_identifier_tesim: params[:intellectual_object_identifier]).first

I was wondering if anyone had any tips on how to tokenize the Solr query so it returns only objects that match the whole identifier instead of partial matches. Thanks.


Solution

  • Going by this answer, you can escape the slash with a backslash when you search for it, so in your case:

    IntellectualObject.where(desc_metadata__intellectual_object_identifier_tesim: params[:intellectual_object_identifier].gsub("/", "\\/")).first
    

    Note the gsub, which replaces each / with \/.
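    Be careful with the Ruby string literal used as the replacement. In a double-quoted string, \/ is not a recognised escape, so Ruby drops the backslash and "\/" is just "/", which turns the substitution into a no-op. The check below uses the example identifier from the question; the variable names are only for illustration.

```ruby
# Example identifier taken from the question.
identifier = "ncsu.edu/123456789"

# "\/" is an unrecognised escape in a double-quoted Ruby string, so the
# backslash is dropped and the literal is just "/": this gsub changes nothing.
no_op = identifier.gsub("/", "\/")

# "\\/" is two characters, a backslash and a slash. gsub copies "\/" into the
# result unchanged because it is not a back-reference such as \1 or \&.
escaped = identifier.gsub("/", "\\/")

# The block form sidesteps back-reference handling in the replacement entirely.
escaped_with_block = identifier.gsub("/") { "\\/" }

puts no_op               # => ncsu.edu/123456789
puts escaped             # => ncsu.edu\/123456789
puts escaped_with_block  # => ncsu.edu\/123456789
```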

    EDIT: As the documentation here explains:

    Solr 4.0 added regular expression support, which means that '/' is now a special character and must be escaped if searching for literal forward slash.

    So if you have a token indexed as aaa/bbb, search for it with aaa\/bbb.
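    To see the escaped term as Solr would receive it, here is a minimal sketch that builds a raw field:value clause. The helper name slash_escaped_clause is hypothetical (it is not part of ActiveFedora or RSolr), and it escapes only the forward slash, nothing else.

```ruby
# Hypothetical helper, not an ActiveFedora or RSolr API: builds a raw Lucene
# clause "field:value", escaping each forward slash so that Solr 4+ does not
# read /.../ as a regular expression. No other characters are escaped.
def slash_escaped_clause(field, value)
  # The block returns a backslash followed by a slash; using a block keeps
  # gsub from interpreting the backslash as a back-reference.
  escaped = value.to_s.gsub("/") { "\\/" }
  "#{field}:#{escaped}"
end

puts slash_escaped_clause("desc_metadata__intellectual_object_identifier_tesim", "aaa/bbb")
# => desc_metadata__intellectual_object_identifier_tesim:aaa\/bbb
```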

    EDIT #2: From the Lucene docs linked above:

    Lucene supports escaping special characters that are part of the query syntax. The current list of special characters is

    + - && || ! ( ) { } [ ] ^ " ~ * ? : \ /
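    Rather than escaping only /, a query value can have every character from that list escaped. The sketch below is a hypothetical plain-Ruby helper (escape_lucene is not a library method). It escapes each & and | on its own instead of matching the two-character operators && and ||, on the assumption that a backslash before any character simply makes it literal to the Lucene query parser.

```ruby
# Characters from the Lucene special-character list above. Single & and |
# are listed instead of && and || (assumed harmless to escape individually).
SPECIAL_CHARS = ['+', '-', '&', '|', '!', '(', ')', '{', '}', '[', ']',
                 '^', '"', '~', '*', '?', ':', '\\', '/']

# Regexp.union escapes each string, giving a pattern that matches any one
# of the characters above.
LUCENE_SPECIAL = Regexp.union(SPECIAL_CHARS)

# Hypothetical helper: prefixes every special character with a backslash.
def escape_lucene(value)
  value.to_s.gsub(LUCENE_SPECIAL) { |ch| "\\#{ch}" }
end

puts escape_lucene("ncsu.edu/123456789")  # => ncsu.edu\/123456789
puts escape_lucene("a+b:(c)")             # => a\+b\:\(c\)
```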