I am developing a Rails app with the Sunspot Solr search engine, and I need to index phone numbers in Solr 4.1.
For example, if I have the phone number "+12 (456) 789-0101", my page should be found by queries such as:
.......(456) 789......... (middle part of phone in correct format)
124567890101 (full phone with numbers only)
I know that I can use:
EdgeNGramFilterFactory for splitting the phone number into n-grams (front and back)
WordDelimiterFilterFactory for catenating numbers and splitting the phone number into parts
So, here is what I have done:
Create a new Solr field type in schema.xml:
<fieldType name="phone_number" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20" side="front"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="20" side="back"/>
  </analyzer>
</fieldType>
<dynamicField name="*_phone" stored="false" type="phone_number" multiValued="true" indexed="true"/>
Define the searchable phone fields so that they map to the '*_phone' dynamic field:
string :work_phone, :as => :work_phone, :stored => true do
  work_phone.gsub(/\D/, '') if work_phone
end
string :mobile_phone, :as => :mobile_phone, :stored => true do
  mobile_phone.gsub(/\D/, '') if mobile_phone
end
Run reindexing:
bundle exec rake sunspot:rebuild
But it does not work. After reindexing finished, I can find results only with the queries "full phone" and "left part of phone". Searching with "middle part of phone" or "right part of phone" doesn't give me any results.
Did I do something wrong? How can I make searching by part of a phone number work correctly? Please help. Thanks!
Actually, here is my code, which works:
schema.xml:
<fieldType class="solr.TextField" name="phone_number" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="3" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" catenateNumbers="1"/>
  </analyzer>
</fieldType>
<dynamicField name="*_phone" stored="false" type="phone_number" multiValued="false" indexed="true"/>
<dynamicField name="*_phones" stored="false" type="phone_number" multiValued="false" indexed="true"/>
And the Ruby code:
text :work_phone
text :work_phone_parts, :as => :work_phone do
  "00#{work_phone.gsub(/\D/, '')}" if work_phone
end
text :mobile_phone
text :mobile_phone_parts, :as => :mobile_phone do
  "00#{mobile_phone.gsub(/\D/, '')}" if mobile_phone
end
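To illustrate why indexing every n-gram lets a middle or right part of the number match, here is a rough plain-Ruby sketch. It only approximates the substrings that an n-gram filter with minGramSize="3" and maxGramSize="20" would produce for the digits-only value, and contrasts them with front edge n-grams (prefixes only). The helper names normalized_phone, ngrams and front_edge_ngrams are made up for this illustration; this is not Solr's implementation and does not model tokenization or query-time analysis.

```ruby
# Hypothetical helper: mirrors the indexing block above
# (strip every non-digit character and prefix "00").
def normalized_phone(phone)
  "00#{phone.gsub(/\D/, '')}"
end

# Hypothetical helper: every substring whose length is between min and max.
# This approximates the output of a general n-gram filter.
def ngrams(token, min = 3, max = 20)
  upper = [max, token.length].min
  (min..upper).flat_map do |size|
    (0..token.length - size).map { |start| token[start, size] }
  end
end

# Hypothetical helper: prefixes only, as a front edge n-gram filter would keep.
def front_edge_ngrams(token, min = 3, max = 20)
  upper = [max, token.length].min
  (min..upper).map { |size| token[0, size] }
end

indexed = normalized_phone("+12 (456) 789-0101")
grams = ngrams(indexed)
edges = front_edge_ngrams(indexed)

puts indexed                    # the digits-only value that gets indexed
puts grams.include?("456789")   # middle part, digits only
puts grams.include?("0101")     # right part
puts edges.include?("456789")   # front edge n-grams contain prefixes only
```

With full n-grams, any run of three or more consecutive digits from the number is an indexed term, so a query that the query analyzer reduces to digits can match it wherever it occurs. The trade-off is a larger index, since each phone number produces many more terms than with edge n-grams.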