I am using Apache Druid to store multi-value dimensions for customers.
While loading data from a CSV, I noticed that the order of the elements in a multi-value dimension changes. For example, Mumbai|Delhi|Chennai gets ingested as ["Chennai","Mumbai","Delhi"].
It is important for us to preserve the order of the elements so that we can apply filters in queries using the MV_OFFSET function.
One workaround is to prepend an explicit order index to each element (like ["3~Chennai","1~Mumbai","2~Delhi"]), but this hampers plain GROUP BY aggregations.
Is there any way to preserve the order of the elements in a multi-value dimension during load time?
Thanks to a response from Navis Ryu on the Druid Slack channel, I found that the following dimension spec keeps the order of the elements unchanged:
"dimensions": [
  "page",
  "language",
  {
    "type": "string",
    "name": "userId",
    "multiValueHandling": "ARRAY"
  }
]
More details around the functionality here.
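
To make the effect of the setting concrete, here is a minimal Python sketch of how I understand the three multiValueHandling modes (SORTED_ARRAY, which is the default, SORTED_SET and ARRAY) to treat a single row's values. It is only a simplified model, not Druid code: the helper names apply_multi_value_handling and mv_offset are hypothetical, and Python's string ordering is assumed to stand in for Druid's.

```python
def apply_multi_value_handling(values, mode="SORTED_ARRAY"):
    """Simplified model of a multi-value string dimension at ingestion time."""
    if mode == "ARRAY":
        # Keep the values exactly as they appeared in the input row.
        return list(values)
    if mode == "SORTED_ARRAY":
        # Sort the values; duplicates are kept.
        return sorted(values)
    if mode == "SORTED_SET":
        # Sort the values and drop duplicates.
        return sorted(set(values))
    raise ValueError(f"unknown multiValueHandling mode: {mode}")


def mv_offset(values, index):
    """0-based lookup in the spirit of MV_OFFSET; None when out of range."""
    return values[index] if 0 <= index < len(values) else None


# One CSV cell, split on the '|' list delimiter.
row = "Mumbai|Delhi|Chennai".split("|")

for mode in ("SORTED_ARRAY", "SORTED_SET", "ARRAY"):
    stored = apply_multi_value_handling(row, mode)
    print(mode, stored, "first element:", mv_offset(stored, 0))
```

Only the ARRAY mode leaves "Mumbai" at offset 0 in this model, which is why a positional filter via MV_OFFSET needs that setting.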