Tags: search, elasticsearch, full-text-search, search-engine

How to search through data with arbitrary amount of fields?


I have a web-form builder for science events. The event moderator creates a registration form with an arbitrary number of boolean, integer, enum and text fields.

The created form is used to:

  • register a new member for the event;
  • search through registered members.

What is the best search tool for the second task (searching the members of an event)? Is Elasticsearch a good fit for this task?


Solution

  • I wrote a post about how to index arbitrary data into Elasticsearch and then to search it by specific fields and values. All this, without blowing up your index mapping.

    The post is here: http://smnh.me/indexing-and-searching-arbitrary-json-data-using-elasticsearch/

    In short, you will need to do the following steps to get what you want:

    1. Create a special index described in the post.
    2. Flatten the data you want to index using the flattenData function:
      https://gist.github.com/smnh/30f96028511e1440b7b02ea559858af4.
    3. Create a document with the original and flattened data and index it into Elasticsearch:

      {
          "data": { ... },
          "flatData": [ ... ]
      }
      
    4. Optional: use Elasticsearch aggregations to find which fields and types have been indexed.

    5. Execute queries on the flatData object to find what you need.
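
A minimal sketch of what the index from step 1 could look like, inferred only from the field names that appear in this answer (`flatData` as a nested path, `key`, `type`, `key_type`, `value_string` with a `keyword` sub-field, and `value_long`). The index name `members`, the `enabled: false` setting on `data`, and the exact field options are assumptions; the linked post has the authoritative mapping, and other value types would presumably follow the same `value_*` pattern.

```javascript
// Hypothetical mapping, reconstructed from the field names used in this answer.
// Intended as the body of a create-index request; "members" is an assumed index name.
const membersIndexBody = {
  mappings: {
    properties: {
      // Assumption: the original form data is stored for retrieval only.
      // `enabled: false` tells Elasticsearch not to parse or index it,
      // so arbitrary form keys never add fields to the mapping.
      data: { type: "object", enabled: false },
      // One nested entry per form field; the set of mapped fields stays fixed.
      flatData: {
        type: "nested",
        properties: {
          key: { type: "keyword" },
          type: { type: "keyword" },
          key_type: { type: "keyword" },
          // Full-text search on value_string, exact match on value_string.keyword
          value_string: {
            type: "text",
            fields: { keyword: { type: "keyword", ignore_above: 256 } }
          },
          value_long: { type: "long" }
        }
      }
    }
  }
};

console.log(JSON.stringify(membersIndexBody, null, 2));
```

The point of this layout is that a new form field changes the documents, not the mapping: every field becomes another entry in `flatData`.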

    Example

    Based on your original question, let's assume that the first event moderator created a form with the following fields to register members for the science event:

    • name string
    • age long
    • sex long - 0 for male, 1 for female

    In addition to this data, the related event probably has some sort of id, let's call it eventId. So the final document could look like this:

    {
        "eventId": "2T73ZT1R463DJNWE36IA8FEN",
        "name": "Bob",
        "age": 22,
        "sex": 0
    }
    

    Now, before we index this document, we will flatten it using the flattenData function:

    flattenData(document);
    

    This will produce the following array:

    [
        {
            "key": "eventId",
            "type": "string",
            "key_type": "eventId.string",
            "value_string": "2T73ZT1R463DJNWE36IA8FEN"
        },
        {
            "key": "name",
            "type": "string",
            "key_type": "name.string",
            "value_string": "Bob"
        },
        {
            "key": "age",
            "type": "long",
            "key_type": "age.long",
            "value_long": 22
        },
        {
            "key": "sex",
            "type": "long",
            "key_type": "sex.long",
            "value_long": 0
        }
    ]
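
To make the transformation concrete, here is a simplified, hypothetical stand-in for `flattenData`. It is not the gist's implementation: the name `flattenDataSketch` is made up, it handles only top-level string and integer values (the two types shown in this example), and it throws on anything else. The real function in the gist may cover more types and nested objects.

```javascript
// Hypothetical, simplified stand-in for the gist's flattenData.
// Supports only flat objects with string or integer values.
function flattenDataSketch(data) {
  return Object.entries(data).map(([key, value]) => {
    let type;
    if (typeof value === "string") {
      type = "string";
    } else if (Number.isInteger(value)) {
      type = "long";
    } else {
      throw new TypeError(`Unsupported value for "${key}": ${typeof value}`);
    }
    // The value is stored under a type-specific field (value_string, value_long)
    // so that each mapped field only ever receives one kind of value.
    return { key, type, key_type: `${key}.${type}`, [`value_${type}`]: value };
  });
}

const flat = flattenDataSketch({
  eventId: "2T73ZT1R463DJNWE36IA8FEN",
  name: "Bob",
  age: 22,
  sex: 0
});
console.log(JSON.stringify(flat, null, 4));
```

For this input the sketch yields entries of the same shape as the array above, one per form field.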
    

    Then we will wrap this data in a document, as I've shown before, and index it.
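
One possible way to perform this wrapping step for several registrations at once is Elasticsearch's `_bulk` endpoint, which takes newline-delimited JSON: an action line followed by the document source, with a trailing newline at the end. The sketch below only builds that request body and does not send it. The helper name `toBulkBody` and the index name `members` are hypothetical, and the flattened array is assumed to have been computed already.

```javascript
// Builds an NDJSON body for the _bulk endpoint from { data, flatData } pairs.
// Hypothetical helper; sending the request is left to whatever HTTP client is used.
function toBulkBody(indexName, entries) {
  const lines = [];
  for (const { data, flatData } of entries) {
    // Action line: index the following source into indexName
    lines.push(JSON.stringify({ index: { _index: indexName } }));
    // Source line: the original data plus its flattened form
    lines.push(JSON.stringify({ data, flatData }));
  }
  // The bulk format requires the body to end with a newline
  return lines.join("\n") + "\n";
}

const bulkBody = toBulkBody("members", [
  {
    data: { eventId: "2T73ZT1R463DJNWE36IA8FEN", name: "Bob", age: 22, sex: 0 },
    flatData: [
      { key: "eventId", type: "string", key_type: "eventId.string", value_string: "2T73ZT1R463DJNWE36IA8FEN" },
      { key: "name", type: "string", key_type: "name.string", value_string: "Bob" },
      { key: "age", type: "long", key_type: "age.long", value_long: 22 },
      { key: "sex", type: "long", key_type: "sex.long", value_long: 0 }
    ]
  }
]);
console.log(bulkBody);
```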

    Then the second event moderator creates another form. It has a new field (city), a field with the same name and type as before (name), and a field with the same name but a different type (sex):

    • name string
    • city string
    • sex string - "male" or "female"

    This event moderator decided that instead of having 0 and 1 for male and female, his form will allow choosing between two strings - "male" and "female".

    Let's try to flatten the data submitted by this form:

    flattenData({
        "eventId": "F1BU9GGK5IX3ZWOLGCE3I5ML",
        "name": "Alice",
        "city": "New York",
        "sex": "female"
    });
    

    This will produce the following data:

    [
        {
            "key": "eventId",
            "type": "string",
            "key_type": "eventId.string",
            "value_string": "F1BU9GGK5IX3ZWOLGCE3I5ML"
        },
        {
            "key": "name",
            "type": "string",
            "key_type": "name.string",
            "value_string": "Alice"
        },
        {
            "key": "city",
            "type": "string",
            "key_type": "city.string",
            "value_string": "New York"
        },
        {
            "key": "sex",
            "type": "string",
            "key_type": "sex.string",
            "value_string": "female"
        }
    ]
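
With both forms indexed, the optional aggregation step from the list above can report which key/type pairs exist. The following is a sketch of such a search body, assuming `flatData` is mapped as `nested` and `flatData.key_type` as `keyword`; the aggregation names `flat` and `keyTypes` and the helper `listKeyTypes` are made up for illustration. Because the two `sex` fields are flattened to different `key_type` values (`sex.long` and `sex.string`), they would be expected to land in separate buckets instead of conflicting.

```javascript
// Search body that returns no hits, only the distinct key_type values.
const keyTypesSearchBody = {
  size: 0,
  aggs: {
    flat: {
      // Step into the nested flatData entries first
      nested: { path: "flatData" },
      aggs: {
        // Then bucket the entries by their "key.type" string
        keyTypes: { terms: { field: "flatData.key_type", size: 100 } }
      }
    }
  }
};

// Hypothetical helper: extracts the bucket keys from a search response
// produced by the body above.
function listKeyTypes(response) {
  return response.aggregations.flat.keyTypes.buckets.map((bucket) => bucket.key);
}

console.log(JSON.stringify(keyTypesSearchBody, null, 2));
```

The `size: 100` limit on the terms aggregation is an arbitrary choice here; an index with more distinct fields would need a larger value or a paginated aggregation.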
    

    Then, after wrapping the flattened data in a document and indexing it into Elasticsearch, we can execute more complex queries.

    For example, to find members named "Bob" registered for the event with ID 2T73ZT1R463DJNWE36IA8FEN we can execute the following query:

    {
        "query": {
            "bool": {
                "must": [
                    {
                        "nested": {
                            "path": "flatData",
                            "query": {
                                "bool": {
                                    "must": [
                                        {"term": {"flatData.key": "eventId"}},
                                        {"match": {"flatData.value_string.keyword": "2T73ZT1R463DJNWE36IA8FEN"}}
                                    ]
                                }
                            }
                        }
                    },
                    {
                        "nested": {
                            "path": "flatData",
                            "query": {
                                "bool": {
                                    "must": [
                                        {"term": {"flatData.key": "name"}},
                                        {"match": {"flatData.value_string": "bob"}}
                                    ]
                                }
                            }
                        }
                    }
                ]
            }
        }
    }
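
Since every condition repeats the same nested structure, the query above can be generated. Below is a minimal sketch with two hypothetical helpers, `nestedFieldClause` and `buildMemberSearch`; the field paths are taken from the query above, and the extra `range` condition on `age` is an added illustration that assumes `value_long` is mapped as a numeric type.

```javascript
// Wraps one "key + value condition" pair in its own nested clause.
function nestedFieldClause(key, valueClause) {
  return {
    nested: {
      path: "flatData",
      query: {
        bool: {
          must: [{ term: { "flatData.key": key } }, valueClause]
        }
      }
    }
  };
}

// Combines the event filter with any number of per-field conditions.
function buildMemberSearch(eventId, fieldClauses) {
  return {
    query: {
      bool: {
        must: [
          nestedFieldClause("eventId", {
            match: { "flatData.value_string.keyword": eventId }
          }),
          ...fieldClauses
        ]
      }
    }
  };
}

// Members named "Bob", aged 18 or older, registered for the first event
const search = buildMemberSearch("2T73ZT1R463DJNWE36IA8FEN", [
  nestedFieldClause("name", { match: { "flatData.value_string": "bob" } }),
  nestedFieldClause("age", { range: { "flatData.value_long": { gte: 18 } } })
]);
console.log(JSON.stringify(search, null, 4));
```

Each condition gets its own `nested` clause on purpose: inside a single nested clause, the key and the value must match within the same `flatData` entry, so putting two different keys into one clause would never match.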