solr, sunspot, solr4, sunspot-rails, sunspot-solr

Search part of phone number with Sunspot Solr


I am developing a Rails app with the Sunspot Solr search engine, and I need to index phone numbers in Solr 4.1.

For example, if I have the phone number "+12 (456) 789-0101", my page should be found by these queries:

  • +12 (456) 789-0101 (phone in correct format)
  • +12 (456) 789......... (left part of phone in correct format)
  • .......(456) 789-0101 (right part of phone in correct format)
  • .......(456) 789......... (middle part of phone in correct format)

  • 124567890101 (full phone with numbers only)

  • 1245678.......... (left part of phone with catenated numbers)
  • ............890101 (right part of phone with catenated numbers)
  • ......567890...... (middle part of phone with catenated numbers)

I know that I can use:

  • EdgeNGramFilterFactory to split the phone number into n-grams (front and back)
  • WordDelimiterFilterFactory to catenate the numbers and split the phone number into parts.
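
To make the "front" and "back" terminology concrete, here is a minimal plain-Ruby sketch of what edge n-grams of a digits-only phone number look like. It only approximates the idea and is not Solr's implementation; the `edge_ngrams` helper is a hypothetical name introduced for illustration, and the sizes 3 and 20 mirror the `minGramSize` and `maxGramSize` used in this question.

```ruby
# Hypothetical helper: approximates edge n-grams in plain Ruby.
# This is an illustration only, not how Solr builds its tokens.
def edge_ngrams(token, min_size, max_size, side)
  upper = [max_size, token.length].min
  (min_size..upper).map do |size|
    # :front grams are prefixes, :back grams are suffixes
    side == :front ? token[0, size] : token[-size, size]
  end
end

digits = "+12 (456) 789-0101".gsub(/\D/, '') # keep digits only
front  = edge_ngrams(digits, 3, 20, :front)
back   = edge_ngrams(digits, 3, 20, :back)

puts front.include?("1245678") # a left part is a prefix
puts back.include?("890101")   # a right part is a suffix
puts front.include?("567890")  # a middle part is not a prefix
puts back.include?("567890")   # ...and not a suffix either
```

Taken on its own, each side covers only substrings anchored at one end of the token, so a middle part such as 567890 appears in neither list. How the two filters behave when chained inside one Solr analyzer is not shown by this sketch.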

So, here is what I have done:

  1. Create a new Solr field type in schema.xml:

    <fieldType name="phone_number" class="solr.TextField">
      <analyzer type="index">
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20" side="front"/>
        <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20" side="back"/>
      </analyzer>
    </fieldType>

    <dynamicField name="*_phone" stored="false" type="phone_number" multiValued="true" indexed="true"/>

  2. Define the searchable phone fields as the '*_phone' type:

    string :work_phone, :as => :work_phone, :stored => true do
      work_phone.gsub(/\D/, '') if work_phone
    end

    string :mobile_phone, :as => :mobile_phone, :stored => true do
      mobile_phone.gsub(/\D/, '') if mobile_phone
    end

  3. Run reindexing:

    bundle exec rake sunspot:rebuild

    But it does not work. After reindexing finished, I can find results only with the "full phone" and "left part of phone" queries. Searching with "middle part of phone" or "right part of phone" returns no results.

Did I do something wrong? How can I make searching by part of a phone number work correctly? Please help. Thanks!


Solution

  • Actually, here is my code, which works:

    schema.xml:

        <fieldType class="solr.TextField" name="phone_number" positionIncrementGap="100">
          <analyzer type="index">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
          </analyzer>
          <analyzer type="query">
            <tokenizer class="solr.WhitespaceTokenizerFactory"/>
            <filter class="solr.LowerCaseFilterFactory"/>
            <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1"/>
          </analyzer>
        </fieldType>

        <dynamicField name="*_phone" stored="false" type="phone_number" multiValued="false" indexed="true"/>
        <dynamicField name="*_phones" stored="false" type="phone_number" multiValued="false" indexed="true"/>
    

    And the Ruby code:

      text :work_phone
    
      text :work_phone_parts, :as => :work_phone do
        "00#{work_phone.gsub(/\D/, '')}" if work_phone
      end
    
      text :mobile_phone
    
      text :mobile_phone_parts, :as => :mobile_phone do
        "00#{mobile_phone.gsub(/\D/, '')}" if mobile_phone
      end
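
As a rough illustration of why indexing every n-gram of the digits-only value lets partial queries match, the following plain-Ruby sketch approximates the index side of the field type above. It is not Solr's implementation: `ngrams` and `normalize_phone` are hypothetical helper names, and the sketch assumes each query has already been reduced to a single run of digits, which is what the query analyzer is expected to produce for the digits-only queries.

```ruby
# Hypothetical helper: all substrings with length between
# min_size and max_size, approximating an n-gram filter.
def ngrams(token, min_size, max_size)
  grams = []
  (min_size..[max_size, token.length].min).each do |size|
    (0..token.length - size).each do |start|
      grams << token[start, size]
    end
  end
  grams
end

# Same normalization as the *_phone_parts blocks above:
# strip non-digits and prepend "00".
def normalize_phone(phone)
  "00#{phone.gsub(/\D/, '')}"
end

indexed = ngrams(normalize_phone("+12 (456) 789-0101"), 3, 20)

# Full, left, right and middle digit-only queries from the question
%w[124567890101 1245678 890101 567890].each do |query|
  puts "#{query}: #{indexed.include?(query)}"
end
```

Because the grams are not anchored to either end of the token, the left, right and middle digit runs all appear among the indexed terms, at the cost of a larger index than edge n-grams would produce.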