Is there a way to programmatically get, via a client, the analyzer that the Elasticsearch server instance used to index a given field (assuming, of course, that the analyzer is available on both sides)?
For example, using a mapping such as:
{
  "mappings": {
    "article": {
      "properties": {
        "text": {
          "type": "string",
          "index": "analyzed",
          "analyzer": "spanish"
        }
      }
    }
  }
}
how would it be possible to get org.apache.lucene.analysis.es.SpanishAnalyzer for the field text using the Java client for Elasticsearch, as shown below?
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.Collections;

import org.elasticsearch.action.search.SearchRequestBuilder;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.Client;
import org.elasticsearch.client.transport.TransportClient;
import org.elasticsearch.common.transport.InetSocketTransportAddress;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.QueryBuilders;
import org.elasticsearch.search.SearchHit;

public class QueryAnalyzerTest {

    public static void main(final String[] args) throws UnknownHostException {
        final String docTextFieldName = "text";
        Iterable<SearchHit> hits = Collections.emptyList();
        try (final Client client = TransportClient.builder().build()
                .addTransportAddress(new InetSocketTransportAddress(InetAddress.getByName("localhost"), 9300))) {
            final QueryBuilder queryBuilder = QueryBuilders.matchQuery(docTextFieldName, "anuncio");
            final SearchRequestBuilder searchRequestBuilder = client.prepareSearch("news").setQuery(queryBuilder)
                    .setTypes("article");
            final SearchResponse response = searchRequestBuilder.get();
            hits = response.getHits();
        }
        hits.forEach(hit -> {
            final String docText = (String) hit.getSource().get(docTextFieldName);
            // TODO: Tokenize "docText" with the exact same tokenizer used when
            // indexing the field
        });
    }
}
You can definitely get the mapping of the text field programmatically using client.admin().indices().prepareGetFieldMappings("indexName"), and you'll be able to retrieve the logical name of the analyzer (i.e. "spanish"); however, you won't get the class name of the analyzer.
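With the 2.x transport client from the question, the mapping could presumably be fetched with something like client.admin().indices().prepareGetFieldMappings("news").setTypes("article").setFields("text").get(), after which response.fieldMappings("news", "article", "text").sourceAsMap() should return a map keyed by the field's name, e.g. {text={type=string, index=analyzed, analyzer=spanish}}. The helper below (FieldAnalyzerName is a hypothetical name, not part of Elasticsearch) is a minimal sketch of only the extraction step in plain Java; the map is built by hand so it runs without a cluster.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

public class FieldAnalyzerName {

    /**
     * Hypothetical helper: extracts the "analyzer" entry from a single-field
     * mapping source map such as {"text": {"type": "string", "analyzer": "spanish"}}.
     * Returns empty when the field declares no explicit analyzer, in which case
     * the index's default analyzer applies.
     */
    @SuppressWarnings("unchecked")
    public static Optional<String> analyzerName(Map<String, Object> source, String fieldName) {
        Object field = source.get(fieldName);
        if (!(field instanceof Map)) {
            return Optional.empty();
        }
        Object analyzer = ((Map<String, Object>) field).get("analyzer");
        return analyzer instanceof String ? Optional.of((String) analyzer) : Optional.empty();
    }

    public static void main(String[] args) {
        // Hand-built stand-in for what sourceAsMap() is assumed to return.
        Map<String, Object> props = new HashMap<>();
        props.put("type", "string");
        props.put("index", "analyzed");
        props.put("analyzer", "spanish");
        Map<String, Object> source = new HashMap<>();
        source.put("text", props);

        System.out.println(analyzerName(source, "text").orElse("(index default)")); // prints "spanish"
    }
}
```

Note that this still yields only the logical name "spanish", not an analyzer class.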
For that, you need to call getAnalyzer("spanish") on an AnalysisRegistry instance, and you'll get the corresponding analyzer instance.
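AnalysisRegistry is a node-side class, so if you'd rather not set it up in the client process, one option is to map the logical name back to a Lucene class yourself. The sketch below assumes Lucene 5 or later with the common analyzers module (the one containing org.apache.lucene.analysis.es.SpanishAnalyzer) on the client classpath, and assumes that Elasticsearch's built-in "spanish" analyzer behaves like Lucene's SpanishAnalyzer with default settings. The class name and its name-to-analyzer table are hypothetical and hand-maintained. Tokens should only be expected to match the indexed ones if the client's Lucene version matches the server's.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.es.SpanishAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;

public class ClientSideAnalyzers {

    // Hypothetical, hand-maintained table: logical ES analyzer name -> Lucene analyzer factory.
    // Assumes the built-in "spanish" analyzer corresponds to SpanishAnalyzer with defaults.
    private static final Map<String, Supplier<Analyzer>> BY_NAME;
    static {
        Map<String, Supplier<Analyzer>> m = new HashMap<>();
        m.put("spanish", SpanishAnalyzer::new);
        BY_NAME = Collections.unmodifiableMap(m);
    }

    // Returns a fresh analyzer for the given logical name; the caller should close it.
    public static Analyzer forName(String name) {
        Supplier<Analyzer> factory = BY_NAME.get(name);
        if (factory == null) {
            throw new IllegalArgumentException("No client-side analyzer registered for: " + name);
        }
        return factory.get();
    }

    // Runs text through the analyzer following the TokenStream contract:
    // reset(), incrementToken() loop, end(), then close() via try-with-resources.
    public static List<String> tokenize(Analyzer analyzer, String field, String text) {
        List<String> terms = new ArrayList<>();
        try (TokenStream ts = analyzer.tokenStream(field, text)) {
            CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
            ts.reset();
            while (ts.incrementToken()) {
                terms.add(term.toString());
            }
            ts.end();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return terms;
    }

    public static void main(String[] args) {
        try (Analyzer analyzer = forName("spanish")) {
            // "El" is a Spanish stop word, so only the stemmed form of "anuncio" should remain.
            System.out.println(tokenize(analyzer, "text", "El anuncio"));
        }
    }
}
```

In the question's TODO, forName could be fed the analyzer name read from the mapping, and tokenize(analyzer, docTextFieldName, docText) called for each hit.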