apache-kafka, druid, data-ingestion

Druid with Kafka Ingestion: filtering data


Is it possible to filter data by dimension value during ingestion from Kafka to Druid?

For example, given a dimension version that can have the values v1, v2, and v3, I would like to load only the rows where version is v2.

I realize this can be done with Spark, Flink, or Kafka Streams, but is there an out-of-the-box solution?


Solution

  • You can do this with transformSpec during ingestion.
    http://druid.io/docs/latest/ingestion/transform-spec.html

    Per the documentation:

    Transform specs allow Druid to filter and transform input data during ingestion.

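    Any Druid query filter can be used as the filter here; only rows that match the filter are ingested.

    Example usage with a NOT filter, which drops rows whose my_dimension is filter_me or filter_me_also: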

    "transformSpec": {
      "filter": {
        "type": "and",
        "fields": [
          {
            "type": "not",
            "field": {
              "type": "selector",
              "dimension": "my_dimension",
              "value": "filter_me"
            }
          },
          {
            "type": "not",
            "field": {
              "type": "selector",
              "dimension": "my_dimension",
              "value": "filter_me_also"
            }
          }
        ]
      },
      "transforms": []
    }
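
    For the case in the question (keep only v2), a single selector filter may be enough. This is a minimal sketch, assuming the dimension is literally named version and the wanted value is v2, as in the question. Like the fragment above, it belongs inside the dataSchema of the Kafka supervisor spec:

    ```json
    "transformSpec": {
      "filter": {
        "type": "selector",
        "dimension": "version",
        "value": "v2"
      }
    }
    ```

    Because the filter keeps only matching rows, rows with version v1 or v3 would be dropped at ingestion time and never stored in the datasource.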