So I have a custom analyzer that adds extra terms from an ontology. I also want to stem the terms before they are indexed. Below is the index metadata, fetched with the Elasticsearch head plugin:
{
  "state": "open",
  "settings": {
    "index": {
      "refresh_interval": "1000s",
      "number_of_shards": "5",
      "creation_date": "1471931611750",
      "analysis": {
        "filter": {
          "owlfilter": {
            "type": "owl",
            "indexName": "ontoowl",
            "expansionType": "RDFSLABEL",
            "owlFile": "/home/tannys/elasticsearch-2.3.0/ontologyWorkTrial/myownowl.owl"
          }
        },
        "analyzer": {
          "owlanalyzer": {
            "filter": ["owlfilter", "porter_stem"],
            "type": "custom",
            "tokenizer": "standard"
          }
        }
      },
      "number_of_replicas": "1",
      "uuid": "d8Ub8A0eSm65geMK_bpdvw",
      "version": {"created": "2030099"}
    }
  },
  "mappings": {
    "mytype": {
      "properties": {
        "nameortitle": {
          "search_analyzer": "standard",
          "analyzer": "owlanalyzer",
          "store": true,
          "type": "string"
        },
        "description": {
          "search_analyzer": "standard",
          "analyzer": "owlanalyzer",
          "store": true,
          "type": "string"
        }
      },
      "aliases": [ ]
    }
  }
}
The irony is that the results were better before I added the porter_stem filter, so I am not sure what went wrong. I want to see the terms that are actually being indexed. How can I inspect what the analyzer produces, similar to what Luke does for Lucene?
Any guidance would be appreciated.
You can use the Term Vectors API here. It returns the terms indexed for a field in a given document. You can also use the Multi Term Vectors API in the same way to see the terms from several documents at once.
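A minimal sketch of calling both APIs from Python (standard library only) and reducing each response to a term-to-frequency map. Assumptions: a node at `http://localhost:9200`, a placeholder index name `myindex` (the question does not show the real one), and the Elasticsearch 2.x paths `/{index}/{type}/{id}/_termvectors` and `/{index}/{type}/_mtermvectors`. All helper names are hypothetical. `terms_for_text` sends an artificial document (the `doc` key). As far as I know, Elasticsearch analyzes that text with the field's mapped analyzer without indexing it, which makes it quick to compare output with and without `porter_stem`. If term vectors are not stored in the mapping, Elasticsearch should compute them on the fly.

```python
import json
import urllib.request

# Hypothetical connection details; adjust to your cluster. "myindex" is a
# placeholder. "mytype" and the field names come from the mapping above.
ES_URL = "http://localhost:9200"
INDEX = "myindex"
DOC_TYPE = "mytype"


def _post(path, body):
    """POST a JSON body to Elasticsearch and return the decoded JSON reply."""
    req = urllib.request.Request(
        ES_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode("utf-8"))


def termvectors_body(fields):
    """Body for the Term Vectors API: only return terms for these fields."""
    return {"fields": list(fields)}


def mtermvectors_body(doc_ids, fields):
    """Body for the Multi Term Vectors API: same parameters for every id."""
    return {"ids": [str(i) for i in doc_ids],
            "parameters": termvectors_body(fields)}


def artificial_doc_body(field, text):
    """Body asking for term vectors of text that was never indexed."""
    return {"doc": {field: text}, "fields": [field]}


def indexed_terms(tv_response, field):
    """Map each term of `field` in one term vectors response to its term_freq."""
    terms = (tv_response.get("term_vectors", {})
             .get(field, {})
             .get("terms", {}))
    return {term: info.get("term_freq", 0) for term, info in terms.items()}


def terms_for_doc(doc_id, field):
    """Terms indexed for `field` in one stored document."""
    path = "/%s/%s/%s/_termvectors" % (INDEX, DOC_TYPE, doc_id)
    return indexed_terms(_post(path, termvectors_body([field])), field)


def terms_for_docs(doc_ids, field):
    """Terms for `field` in several documents, keyed by document id."""
    path = "/%s/%s/_mtermvectors" % (INDEX, DOC_TYPE)
    reply = _post(path, mtermvectors_body(doc_ids, [field]))
    return {d["_id"]: indexed_terms(d, field) for d in reply.get("docs", [])}


def terms_for_text(field, text):
    """Terms that `field`'s index-time analyzer produces for arbitrary text."""
    path = "/%s/%s/_termvectors" % (INDEX, DOC_TYPE)
    return indexed_terms(_post(path, artificial_doc_body(field, text)), field)
```

For example, `terms_for_doc("1", "description")` lists what was indexed for document 1 (the id is hypothetical). Running `terms_for_text` on the same sentence before and after adding `porter_stem` to the analyzer shows exactly which tokens the stemmer changes.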