elasticsearch, analyzer, luke

Analyze terms which get indexed in ElasticSearch


I have a custom-built analyzer that adds additional terms from an ontology. I also want to stem the terms before they are indexed. Below is the index metadata, fetched from the Elasticsearch head plugin.

{
    "state": "open",
    "settings": {
        "index": {
            "refresh_interval": "1000s",
            "number_of_shards": "5",
            "creation_date": "1471931611750",
            "analysis": {
                "filter": {
                    "owlfilter": {
                        "type": "owl",
                        "indexName": "ontoowl",
                        "expansionType": "RDFSLABEL",
                        "owlFile": "/home/tannys/elasticsearch-2.3.0/ontologyWorkTrial/myownowl.owl"
                    }
                },
                "analyzer": {
                    "owlanalyzer": {
                        "filter": ["owlfilter","porter_stem"],
                        "type": "custom",
                        "tokenizer": "standard"
                    }
                }
            },
            "number_of_replicas": "1",
            "uuid": "d8Ub8A0eSm65geMK_bpdvw",
            "version": {"created": "2030099"}
        }
    },
    "mappings": {
        "mytype": {
            "properties": {
                "nameortitle": {
                    "search_analyzer": "standard",
                    "analyzer": "owlanalyzer",
                    "store": true,
                    "type": "string"
                },
                "description": {
                    "search_analyzer": "standard",
                    "analyzer": "owlanalyzer",
                    "store": true,
                    "type": "string"
                }
            },
            "aliases": [ ]
        }
    }
}

Ironically, the results were better before I added the porter_stem filter, so I am not sure what went wrong. I want to see the terms that are actually getting indexed. How can I inspect what the analyzer produces, the way Luke does for Lucene? Any guidance would be appreciated.


Solution

  • You can use the Term Vectors API here. It returns the terms stored for a field of a given document. The Multi Term Vectors API works the same way and returns the terms for several documents in one request.
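
As a rough illustration of the Term Vectors approach, here is a minimal Python sketch using only the standard library. The host (`localhost:9200`), the index name `myindex`, and the document id are assumptions, since the question does not state them. The type `mytype` and the field names come from the mapping above, and the URL layout assumes Elasticsearch 2.x, where the endpoint is `_termvectors`.

```python
import json
from urllib.parse import quote
from urllib.request import urlopen

# Assumptions (not given in the question): cluster address and index name.
ES_URL = "http://localhost:9200"
INDEX = "myindex"      # hypothetical index name
DOC_TYPE = "mytype"    # mapping type taken from the metadata above
FIELDS = ["nameortitle", "description"]


def termvectors_url(doc_id, fields=FIELDS):
    """Build the Term Vectors URL for one document (Elasticsearch 2.x path layout)."""
    return "{}/{}/{}/{}/_termvectors?fields={}".format(
        ES_URL, INDEX, DOC_TYPE, quote(str(doc_id), safe=""), ",".join(fields)
    )


def indexed_terms(response):
    """Map each field to the sorted list of terms in a Term Vectors response."""
    vectors = response.get("term_vectors", {})
    return {field: sorted(data.get("terms", {})) for field, data in vectors.items()}


def fetch_indexed_terms(doc_id):
    """Query the cluster and return {field: [terms]}; needs a reachable node."""
    with urlopen(termvectors_url(doc_id)) as resp:
        return indexed_terms(json.loads(resp.read().decode("utf-8")))
```

Calling `fetch_indexed_terms("1")` against a running node should list the terms produced by `owlanalyzer` for that document, which makes it possible to check whether the ontology expansions survive the `porter_stem` step. For several documents at once, the same idea applies to the `_mtermvectors` endpoint, which takes a list of ids in the request body.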