Elasticsearch analyzer and autocomplete feature for alphanumeric codes

I have alphanumeric codes like Hcc18, HCC23, and I23 that I want to store in Elasticsearch. On top of this I want to build the following two features:

  1. The user can search for the complete alphanumeric code or just the integer part.
    Example: for hcc15 or 15, hcc15 should be in the output and at the top of the results.
  2. Autocomplete: when the user types, say, I42, the results should contain I420, I421, and so on.

My current Elasticsearch mapping is:

"mappings": {
  "properties": {
    "code": {
      "type": "text",
      "analyzer": "autoanalyer"
    }
  }
},
"settings": {
  "analysis": {
    "analyzer": {
      "autoanalyer": {
        "tokenizer": "standard",
        "filter": []
      }
    },
    "tokenizer": {
      "autotoken": {
        "type": "simple_pattern",
        "pattern": "[0-9]+"
      }
    }
  }
}

Query being made:

    {
        "min_score": 0.1,
        "from": 0,
        "size": 10000,
        "query": {
            "bool": {
                "should": [{ "match": {"code": search_term}}]
            }
        }
    }

I am facing two problems with this approach:

  1. Say I search for I420. Because the mapping is based only on digits, I get all the codes related to the number 420, but the exact match I420 does not come out on top.

  2. With this mapping, how can I achieve the autocomplete feature described above?


  • You have multiple requirements, and all of them can be achieved by:

    1. Creating a custom analyzer that tokenizes the data according to our requirements.
    2. Using a bool query that combines a prefix query (for autocomplete) with a match query (for the number search).
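
Step 1 hinges on what a digits-only tokenizer actually emits. As a rough local sketch, assuming the `simple_pattern` tokenizer with the pattern `[0-9]+` (the one from the question) behaves like collecting every non-overlapping regex match, the tokens can be previewed in Python. `number_tokens` is a hypothetical helper, not an Elasticsearch API:

```python
import re

# Assumption: mirrors the "[0-9]+" pattern of the simple_pattern tokenizer.
DIGITS = re.compile(r"[0-9]+")


def number_tokens(value: str) -> list:
    """Approximate the terms a digits-only tokenizer would emit for one value."""
    return DIGITS.findall(value)


if __name__ == "__main__":
    for code in ["hcc420", "HCC23", "I23", "I420", "I421"]:
        print(code, "->", number_tokens(code))
```

Under this approximation I420 and hcc420 both reduce to the single token 420, which is why a digits-only field cannot tell them apart and the original code has to be kept in a separate field.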

    Below is a step-by-step example, using the OP's data and queries.

    Index definition

        {
            "settings": {
                "analysis": {
                    "analyzer": {
                        "my_analyzer": {
                            "tokenizer": "autotoken" --> reuses your tokenizer to extract the numbers
                        }
                    },
                    "tokenizer": {
                        "autotoken": {
                            "type": "simple_pattern",
                            "pattern": "[0-9]+",
                            "preserve_original": true
                        }
                    }
                }
            },
            "mappings": {
                "properties": {
                    "code": {
                        "type": "keyword",
                        "fields": {
                            "number": {
                                "type": "text",
                                "analyzer": "my_analyzer"
                            }
                        }
                    }
                }
            }
        }

    Index a few docs

      { "code" : "hcc420" }
      { "code" : "HCC23" }
      { "code" : "I23" }
      { "code" : "I420" }
      { "code" : "I421" }
      { "code" : "hcc420" }

    Search query (issue 1: searching for I420 should bring back 2 docs from the sample data, I420 and hcc420, but I420 must score higher because it is the exact match)

        {
            "query": {
                "bool": {
                    "should": [
                        { "prefix": { "code": { "value": "I420" } } },
                        { "match": { "code.number": "I420" } }
                    ]
                }
            }
        }


    "hits": [
        {
            "_index": "so_number",
            "_type": "_doc",
            "_id": "4",
            "_score": 2.0296195, --> note the exact match has the higher score
            "_source": {
                "code": "I420"
            }
        },
        {
            "_index": "so_number",
            "_type": "_doc",
            "_id": "7",
            "_score": 1.0296195,
            "_source": {
                "code": "hcc420"
            }
        }
    ]

    Part 2: The same search query can be used for the autocomplete feature
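
Because only the search term changes between the exact, autocomplete, and number searches, the request body can be generated from one place. A minimal sketch, assuming the `code` keyword field and `code.number` sub-field from the index definition in this answer; `build_code_query` is a hypothetical helper name:

```python
import json


def build_code_query(term: str) -> dict:
    """Return a bool/should body: prefix on the raw code, match on the digit tokens."""
    return {
        "query": {
            "bool": {
                "should": [
                    # Prefix on the keyword field covers exact hits and autocomplete.
                    {"prefix": {"code": {"value": term}}},
                    # Match on the digits-only sub-field covers the number search.
                    {"match": {"code.number": term}},
                ]
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(build_code_query("I42"), indent=2))
```

The returned dict has the same shape as the query bodies in this answer, so it can be serialized and sent as the search request body.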

    So searching for I42 must bring back I420 and I421 from the sample docs

        {
            "query": {
                "bool": {
                    "should": [
                        { "prefix": { "code": { "value": "I42" } } },
                        { "match": { "code.number": "I42" } }
                    ]
                }
            }
        }


    "hits": [
        {
            "_index": "so_number",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0,
            "_source": {
                "code": "I420"
            }
        },
        {
            "_index": "so_number",
            "_type": "_doc",
            "_id": "5",
            "_score": 1.0,
            "_source": {
                "code": "I421"
            }
        }
    ]

    Let's take another example, this time for the number search: searching for 420 must bring back hcc420 and I420

    Search query

        {
            "query": {
                "bool": {
                    "should": [
                        { "prefix": { "code": { "value": "420" } } },
                        { "match": { "code.number": "420" } }
                    ]
                }
            }
        }

    And again, it gives the expected results 😀

    "hits": [
        {
            "_index": "so_number",
            "_type": "_doc",
            "_id": "4",
            "_score": 1.0296195,
            "_source": {
                "code": "I420"
            }
        },
        {
            "_index": "so_number",
            "_type": "_doc",
            "_id": "7",
            "_score": 1.0296195,
            "_source": {
                "code": "hcc420"
            }
        }
    ]