Apache Druid - preserving order of elements in a multi-value dimension


I am using Apache Druid to store multi-value dimensions for customers.

While loading data from a CSV file, I noticed that the order of the elements in a multi-value dimension changes. For example, Mumbai|Delhi|Chennai gets ingested as ["Chennai","Mumbai","Delhi"].

It is important for us to preserve the order of the elements so that we can apply filters in queries using the MV_OFFSET function. One workaround is to prepend an explicit position to each element (like ["3~Chennai","1~Mumbai","2~Delhi"]), but this hampers plain GROUP BY aggregations.

Is there any way to preserve the order of the elements in a multi-value dimension during load time?


Solution

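  • Thanks to a response from Navis Ryu on the Druid Slack channel: the following dimension spec keeps the order of the elements unchanged. Setting multiValueHandling to ARRAY stores the values in the order they were ingested, instead of the default SORTED_ARRAY, which sorts them: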

    "dimensions": [
        "page",
        "language",
        { 
            "type": "string",
            "name": "userId", 
            "multiValueHandling": "ARRAY" 
        }
    ]
    

    More details on multiValueHandling are in the Apache Druid documentation.
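
    To make the effect of the setting concrete, here is a minimal Python sketch that imitates what the three documented multiValueHandling modes (SORTED_ARRAY, SORTED_SET, ARRAY) do to the values of a single row, and what an MV_OFFSET lookup would then return. This is not Druid code: the helper names apply_multi_value_handling and mv_offset are made up for illustration, and Python's default string ordering is assumed to match Druid's for plain ASCII values like these.

```python
def apply_multi_value_handling(values, mode="SORTED_ARRAY"):
    """Imitate how a multiValueHandling mode arranges one row's values.

    Hypothetical helper for illustration only; Druid does this at ingestion.
    """
    if mode == "ARRAY":
        # Keep the values exactly as they arrived.
        return list(values)
    if mode == "SORTED_ARRAY":
        # Sort the values, keeping duplicates (the default mode).
        return sorted(values)
    if mode == "SORTED_SET":
        # Sort the values and drop duplicates.
        return sorted(set(values))
    raise ValueError(f"unknown multiValueHandling mode: {mode}")


def mv_offset(values, index):
    """Imitate MV_OFFSET: 0-based lookup, None when the index is out of range."""
    return values[index] if 0 <= index < len(values) else None


# The CSV cell from the question, split on its list delimiter.
row = "Mumbai|Delhi|Chennai".split("|")

preserved = apply_multi_value_handling(row, "ARRAY")
default = apply_multi_value_handling(row, "SORTED_ARRAY")

print(preserved, mv_offset(preserved, 0))
print(default, mv_offset(default, 0))
```

    With ARRAY, offset 0 still points at the first city in the source row, so position-based filters behave as intended and no position prefix such as "1~Mumbai" is needed, which keeps plain GROUP BY aggregations usable.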