amazon-s3 druid

How do I exclude files with a specific extension in a Druid ingestion spec?


I have these files under the S3 URI s3://temp/sample/:

file_1.parquet
file_2.parquet
file_3.parquet
file_4.random

How can I exclude file_4.random in the ingestion spec JSON file?

What I know so far: the spec has options to include each file individually (uris) or an entire folder (prefixes). Is there a way to include all files that share the same extension?

"ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "s3",
      "uris": null,
      "prefixes": [
        "s3://temp/sample/"
      ],
      "objects": null
    },

Solution

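  • Use the filter property on the input source. In this example, I wanted to exclude files with the .log extension; to exclude file_4.random from the question, the pattern would be "\\.random$" instead.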

    "ioConfig": {
      "type": "index_parallel",
      "inputSource": {
        "type": "s3",
        "uris": null,
        "prefixes": [
          "s3://temp/sample/"
        ],
        "objects": null,
        "filter": {
          "type": "not",
          "field": "filePath",
          "pattern": "\\.log$"
        }
      }
    },
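
As a possible alternative, here is a minimal sketch that selects the wanted files by pattern instead of excluding the unwanted one. It assumes a Druid release whose S3 input source supports the objectGlob property; the property name and its availability depend on the version, so check the S3 input source documentation for the release you run. The glob is applied to the whole object key (for example sample/file_1.parquet), not just the file name, which is why the sketch uses **.parquet rather than *.parquet.

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "s3",
      "prefixes": [
        "s3://temp/sample/"
      ],
      "objectGlob": "**.parquet"
    }
  }
}
```

With this fragment, file_1.parquet through file_3.parquet under the prefix would match the glob, while file_4.random would not and should be skipped. The rest of the ioConfig (such as inputFormat) is left out here and stays as in your existing spec.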