I have these files under the S3 prefix s3://temp/sample/:
file_1.parquet
file_2.parquet
file_3.parquet
file_4.random
How can I exclude file_4.random in the ingestion spec JSON file?
What I know so far: the spec file has options to include each file individually (uris) or the entire folder (prefixes). Is there a way to include only the files that share the same extension?
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "s3",
    "uris": null,
    "prefixes": [
      "s3://temp/sample/"
    ],
    "objects": null
  },
Use the filter property on the input source. In this example, I wanted to exclude files with the .log extension; for the files in the question, the pattern would be "\\.random$" instead.
"ioConfig": {
  "type": "index_parallel",
  "inputSource": {
    "type": "s3",
    "uris": null,
    "prefixes": [
      "s3://temp/sample/"
    ],
    "objects": null,
    "filter": {
      "type": "not",
      "field": "filePath",
      "pattern": "\\.log$"
    }
  },
},
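If the filter object above is rejected by your Druid version, a glob on the object key may be worth trying instead. The following is only a sketch, assuming a Druid release whose S3 input source supports the objectGlob property (check the S3 input source documentation for your version). It also assumes the Parquet extension is loaded so that the "parquet" input format is available; the inputFormat block is not part of the original spec and is included only to make the fragment complete. The bucket and prefix are taken from the question.

```json
{
  "ioConfig": {
    "type": "index_parallel",
    "inputSource": {
      "type": "s3",
      "prefixes": [
        "s3://temp/sample/"
      ],
      "objectGlob": "**.parquet"
    },
    "inputFormat": {
      "type": "parquet"
    }
  }
}
```

The glob is matched against the whole object key (here sample/file_1.parquet and so on), not just the file name, which is why the sketch uses **.parquet rather than *.parquet. With this pattern, file_4.random would be skipped because it does not end in .parquet.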