Elasticsearch query on indexes whose names match a certain pattern


I have several indexes in my Elasticsearch DB, as follows:

Index_2019_01

Index_2019_02

Index_2019_03

Index_2019_04

.
.

Index_2019_12

Suppose I want to search only the first three indexes, using something like a regular expression in the index name:

select count(*) from Index_2019_0[1-3] where LanguageId="English"

What is the correct way to do that in Elasticsearch?


Solution

  • How can I query several indexes with certain names?

    This can be done with a multi-index search, which is a built-in capability of Elasticsearch: list the index names in the request path, separated by commas. For example:

    POST /index_2019_01,index_2019_02/_search
    {
      "query": {
        "match": {
          "LanguageID": "English"
        }
      }
    }
    

    Or, using URI search:

    curl 'http://<host>:<port>/index_2019_01,index_2019_02/_search?q=LanguageID:English'
    

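    More details are available in the Elasticsearch documentation on multi-index search. Note that Elasticsearch requires index names to be lowercase, which is why the examples use index_2019_01 rather than Index_2019_01.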
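
    As a rough sketch of how the comma-separated path could be assembled for a range of months, the Python snippet below builds the index names and the request path without contacting a cluster. The helper names (monthly_indexes, multi_index_path) are hypothetical, and the field name follows the LanguageID spelling used in the query above; field names are case-sensitive, so use whatever your mapping defines.

```python
import json


def monthly_indexes(prefix, year, first_month, last_month):
    """Return lowercase index names such as index_2019_01 for a month range."""
    return [
        f"{prefix}_{year}_{month:02d}".lower()
        for month in range(first_month, last_month + 1)
    ]


def multi_index_path(index_names, endpoint="_search"):
    """Join index names with commas to form a multi-index request path."""
    return "/" + ",".join(index_names) + "/" + endpoint


# Hypothetical usage: the first three monthly indexes from the question.
names = monthly_indexes("Index", 2019, 1, 3)
path = multi_index_path(names)

# Same match query as in the request body shown above.
body = {"query": {"match": {"LanguageID": "English"}}}

print("POST", path)
print(json.dumps(body, indent=2))
```

    Passing "_count" as the endpoint yields a path for the count API instead, which is closer to the count(*) in the question.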

  • Can I use a regex to specify an index name pattern?

    In short, no. You can refer to the index name inside a query through the special "virtual" field _index, but its use is limited. In particular, you cannot run a regexp query against the index name:

    The _index is exposed as a virtual field — it is not added to the Lucene index as a real field. This means that you can use the _index field in a term or terms query (or any query that is rewritten to a term query, such as the match, query_string or simple_query_string query), but it does not support prefix, wildcard, regexp, or fuzzy queries.

    For instance, the query from above can be rewritten as:

    POST /_search
    {
      "query": {
        "bool": {
          "must": [
            {
              "terms": {
                "_index": [
                  "index_2019_01",
                  "index_2019_02"
                ]
              }
            },
            {
              "match": {
                "LanguageID": "English"
              }
            }
          ]
        }
      }
    }
    

    This version combines a bool query with a terms query on _index.
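
    Since the regex cannot run on the server side, one possible workaround is to apply it on the client to a list of index names you already have, then pass the matches to the terms query. The snippet below is only a sketch under that assumption: select_indexes and index_filtered_query are hypothetical helpers, and the list of names is hard-coded here rather than fetched from a cluster.

```python
import json
import re


def select_indexes(all_names, pattern):
    """Keep only the names that fully match the regular expression."""
    regex = re.compile(pattern)
    return [name for name in all_names if regex.fullmatch(name)]


def index_filtered_query(index_names, field, value):
    """Build a bool query restricted to the given indexes via the _index field."""
    return {
        "query": {
            "bool": {
                "must": [
                    {"terms": {"_index": list(index_names)}},
                    {"match": {field: value}},
                ]
            }
        }
    }


# Hypothetical list of index names; in practice it would come from the cluster.
all_names = [f"index_2019_{month:02d}" for month in range(1, 13)]

# The pattern from the question, written against the lowercase names.
selected = select_indexes(all_names, r"index_2019_0[1-3]")

body = index_filtered_query(selected, "LanguageID", "English")
print(json.dumps(body, indent=2))
```

    The resulting body has the same shape as the request above and can be sent to POST /_search.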

    Hope that helps!