Elasticsearch: Getting analyzer used for indexing a given field from the client side


Is there a way to programmatically get, via a client, the analyzer that the Elasticsearch server instance uses to index a given field (assuming that the analyzer is available on both sides, of course)?

For example, using a mapping such as:

{
    "mappings": {
        "article": {
            "properties": {
                "text": {
                    "type": "string",
                    "index": "analyzed",
                    "analyzer": "spanish"
                }
            }
        }
    }
}

how would it be possible to get org.apache.lucene.analysis.es.SpanishAnalyzer for the field text using the Java client for Elasticsearch, as shown below?

import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class QueryAnalyzerTest {

    public static void main(final String[] args) throws UnknownHostException {
        final String docTextFieldName = "text";
        Iterable<SearchHit> hits = Collections.emptyList();

        try (final Client client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300))) {
            final QueryBuilder queryBuilder = QueryBuilders.matchQuery(docTextFieldName, "anuncio");
            final SearchRequestBuilder searchRequestBuilder = client.prepareSearch("news").setQuery(queryBuilder)
                    .setTypes("article");
            final SearchResponse response = searchRequestBuilder.get();
            hits = response.getHits();
        }

        hits.forEach(hit -> {
            final String docText = (String) hit.getSource().get(docTextFieldName);
            // TODO: Tokenize "docText" with the exact same tokenizer used when
            // indexing the field
        });

    }

}

Solution

  • You can get the mapping of the text field programmatically using client.admin().indices().prepareGetFieldMappings("indexName"), and from it you can retrieve the logical name of the analyzer (i.e. "spanish"). However, the mapping won't give you the class name of the analyzer.

    For that you need to call getAnalyzer("spanish") on an AnalysisRegistry instance, and you'll get the proper analyzer instance.
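    The two steps above can be sketched without any Elasticsearch dependency. This is only an illustration under stated assumptions: it assumes the mapping has already been fetched and converted into a nested Map shaped like the JSON in the question, and it replaces the AnalysisRegistry lookup with a hypothetical, hand-maintained table from logical analyzer names to Lucene class names. The class FieldAnalyzerLookup and its methods are made up for this example and are not part of any Elasticsearch API.

```java
import java.util.Map;
import java.util.Optional;

public class FieldAnalyzerLookup {

    // Hypothetical, hand-maintained table from logical analyzer names to
    // Lucene class names. Only the "spanish" entry comes from the question;
    // the others are added for illustration and the table is not exhaustive.
    private static final Map<String, String> LUCENE_CLASS_BY_NAME = Map.of(
            "spanish", "org.apache.lucene.analysis.es.SpanishAnalyzer",
            "english", "org.apache.lucene.analysis.en.EnglishAnalyzer",
            "french", "org.apache.lucene.analysis.fr.FrenchAnalyzer");

    /**
     * Walks a nested map shaped like the mapping JSON in the question:
     * mappings -> type -> properties -> field -> analyzer.
     * Returns an empty Optional if any level is missing.
     */
    public static Optional<String> analyzerName(Map<String, ?> root, String type, String field) {
        Object node = root;
        for (String key : new String[] {"mappings", type, "properties", field, "analyzer"}) {
            if (!(node instanceof Map)) {
                return Optional.empty();
            }
            node = ((Map<?, ?>) node).get(key);
        }
        return node instanceof String ? Optional.of((String) node) : Optional.empty();
    }

    /** Resolves a logical analyzer name through the assumed lookup table. */
    public static Optional<String> luceneClassName(String analyzerName) {
        return Optional.ofNullable(LUCENE_CLASS_BY_NAME.get(analyzerName));
    }

    public static void main(String[] args) {
        // Same structure as the mapping in the question, built by hand.
        Map<String, Object> textField =
                Map.of("type", "string", "index", "analyzed", "analyzer", "spanish");
        Map<String, Object> mapping = Map.of("mappings",
                Map.of("article", Map.of("properties", Map.of("text", textField))));

        String className = analyzerName(mapping, "article", "text")
                .flatMap(FieldAnalyzerLookup::luceneClassName)
                .orElse("(unknown analyzer)");
        System.out.println(className); // prints org.apache.lucene.analysis.es.SpanishAnalyzer
    }
}
```

    A static table like this only covers analyzers you list yourself, and it cannot see custom analyzers defined in the index settings, so treat it as a fallback for when no AnalysisRegistry instance is available on the client.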