Is it possible to filter data by dimension value during ingestion from Kafka to Druid?
E.g., consider a dimension version, which might have the values v1, v2, and v3. I would like to have only v2 loaded.
I realize this can be done with Spark, Flink, or Kafka Streams, but is there an out-of-the-box solution?
You can do this with transformSpec during ingestion.
http://druid.io/docs/latest/ingestion/transform-spec.html
Per the documentation:
Transform specs allow Druid to filter and transform input data during ingestion.
Any of Druid's query filters can be used as the filter in a transformSpec.
Example usage with a NOT filter, which drops every row whose my_dimension is filter_me or filter_me_also:
    "transformSpec": {
      "filter": {
        "type": "and",
        "fields": [
          {
            "type": "not",
            "field": {
              "type": "selector",
              "dimension": "my_dimension",
              "value": "filter_me"
            }
          },
          {
            "type": "not",
            "field": {
              "type": "selector",
              "dimension": "my_dimension",
              "value": "filter_me_also"
            }
          }
        ]
      },
      "transforms": []
    }
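
For the case in the question (keep only rows where version is v2), a single selector filter should be enough, since the filter keeps matching rows and drops the rest. A minimal sketch, assuming the dimension is named version exactly as in the question and that this object goes inside the dataSchema of your Kafka ingestion spec; it is wrapped in braces here only so the snippet is valid JSON on its own:

```json
{
  "transformSpec": {
    "filter": {
      "type": "selector",
      "dimension": "version",
      "value": "v2"
    },
    "transforms": []
  }
}
```

With this in place, rows with version v1 or v3 should be discarded at ingestion time and never reach the datasource. Check the result against your Druid version, since the surrounding spec layout has changed between releases.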