solr, lucene, solrj, luke

Retrieve analysed shingles from a Solr doc (Lucene, Luke)


I have defined a Solr field with the following index analyzer:

<analyzer type="index">
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>              
    <filter class="solr.StopFilterFactory" words="stopwords.txt" ignoreCase="true"/>
    <filter class="solr.ShingleFilterFactory" minShingleSize="3" maxShingleSize="5"/>
    <filter class="solr.PatternReplaceFilterFactory" pattern=".*_.*" replacement=""/>
</analyzer>

It creates shingles for the docs with the expected results. I want to get all the shingles for the docs matching a specific filter query, but I have not found a way to do that. I tried using Luke to inspect the index, but it gives me all the shingles in the index, not just those from docs matching the filter query. Is there a way to get this data?


Solution

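  • Faceting on that field, with your filter query applied as fq, will return the indexed tokens (the shingles) found in the matching docs. Each token comes with a count, which is the number of matching docs that contain it. This might be sufficient.

    If you are only testing individual inputs, you can also run them through the Analysis screen of the Solr Admin UI, which shows the tokens produced at each stage of the analyzer chain.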
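
    As a rough illustration of the faceting approach, here is a minimal Python sketch that only builds the request URL. The host, core name (mycore), field name (shingle_field) and filter query (category:news) are placeholders, not values from the question. The parameters themselves (fq, rows, facet, facet.field, facet.mincount, facet.limit) are standard Solr request parameters.

```python
from urllib.parse import urlencode


def build_shingle_facet_url(base_url, field, filter_query, limit=-1):
    """Build a Solr /select URL that facets on `field` for docs matching `filter_query`."""
    params = [
        ("q", "*:*"),            # match everything, then narrow with fq
        ("fq", filter_query),    # the filter query whose shingles we want
        ("rows", 0),             # no documents needed, only facet counts
        ("facet", "true"),
        ("facet.field", field),  # the shingled field
        ("facet.mincount", 1),   # skip tokens absent from the matching docs
        ("facet.limit", limit),  # -1 means no limit on returned tokens
        ("wt", "json"),
    ]
    return base_url.rstrip("/") + "/select?" + urlencode(params)


# Placeholder values: adjust host, core, field and filter query to your setup.
url = build_shingle_facet_url(
    "http://localhost:8983/solr/mycore", "shingle_field", "category:news"
)
print(url)
```

    Fetching that URL should return the shingles under facet_counts.facet_fields in the response. With facet.limit=-1 the list can get very large on a big index, so a positive limit may be more practical.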