
Banana Dashboard For Solr Not Tokenizing Location Names Correctly


I am using a Banana dashboard to build a non-time-series dashboard for my Solr-indexed data. The "location" field doesn't display correctly in the Banana facet widget: names like "San Francisco" and "New York" are shown as separate entries, "San" and "Francisco", and "New" and "York".

However, when I cross-check my Solr query results, these values are correctly shown as single entities, "San Francisco" and "New York".

In the Solr core, the managed-schema.xml file has the following entries:

<field name="content" type="opennlp-en-tokenization" indexed="true" stored="true" multiValued="true"/>
<field name="person" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="organization" type="text_general" indexed="true" stored="true" multiValued="true"/>
<field name="location" type="text_general" indexed="true" stored="true" multiValued="true"/>

Any idea where I might be going wrong?

Screenshot: Banana dashboard with location names containing a space wrongly tokenized as two different places

Screenshot: Solr results with location names containing a space correctly shown as one single location


Solution

  • Your location field uses the text_general field type, whose analyzer tokenizes the input (with a StandardTokenizer in Solr's default configsets). That splits "San Francisco" into two separate tokens, which is exactly the result you're seeing.

    Change it to a string field, or use a KeywordTokenizer if you need to process the value in some way. If you still want to search the field without requiring an exact match, define a second field of type string, facet on that one, and use copyField so the same content is indexed into both fields.

    The reason is that faceting generates its counts from the indexed tokens, not from the stored text of the field (the stored text is what you see when you query the document). The tokens are not directly visible (except when faceting or retrieving terms), but you can see how your content is processed, and which tokens your input ends up as, on the "Analysis" page of the Solr Admin UI.
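A minimal sketch of the string-field-plus-copyField approach, written in the same managed-schema.xml syntax as the question. The names location_str and location_keyword are hypothetical, chosen only for illustration, and the sketch assumes the schema already defines a "string" fieldType backed by solr.StrField, as Solr's default configsets do. The existing location field stays as it is, so full-text search on it keeps working.

```xml
<!-- Hypothetical untokenized copy of "location", used only for faceting.
     A string (solr.StrField) field indexes the whole value, e.g.
     "San Francisco", as one single term. -->
<field name="location_str" type="string" indexed="true" stored="false"
       docValues="true" multiValued="true"/>

<!-- At index time, copy the raw input of "location" into "location_str".
     The original "location" field (text_general) is left unchanged. -->
<copyField source="location" dest="location_str"/>

<!-- Alternative (hypothetical type name): keep the value as one token but
     still apply some analysis. KeywordTokenizerFactory emits the entire
     input as a single token; LowerCaseFilterFactory then lowercases it,
     so facet labels would appear as "san francisco". -->
<fieldType name="location_keyword" class="solr.TextField" sortMissingLast="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Because copyField is applied when documents are indexed, the existing documents have to be reindexed before location_str contains anything. The Banana facet panel would then be pointed at location_str instead of location.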