I have a folder structure like:
s3://foo/table1/2021-06-12/03-35-00/
s3://foo/table1/Current/data
s3://foo/table2/2021-06-12/03-35-00/
s3://foo/table2/Current/data
s3://foo/table3/2021-06-12/03-35-00/
s3://foo/table3/Current/data
... and so on
I want to exclude all the date and timestamp folders and crawl only the Current/data folders. How can this be achieved?
Assuming your crawler's include path is set to s3://foo, you can use the
Exclude patterns setting in the crawler configuration. Exclude patterns are
evaluated relative to the include path, so a pattern like this: table*/2021*/**
will skip all files and folders under every table whose folder name starts with 2021.
Similarly, you can add other glob patterns in this section to skip other files and folders.
For a better understanding, refer to the include and exclude patterns section of the AWS Glue crawler documentation.
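To sanity-check a pattern before running the crawler, a rough local preview can help. The sketch below is only an approximation of glob matching (it handles `*` and `**` and nothing else) and is not the crawler's own matcher. The object name `part-0000`, the helper names, and the crawler name in the final comment are hypothetical, chosen for illustration.

```python
import re

# Exclude pattern from the answer, written relative to the include path s3://foo.
EXCLUDE_PATTERNS = ["table*/2021*/**"]


def glob_to_regex(pattern):
    """Translate a simplified glob into a regex.

    '**' may cross folder boundaries, '*' stays inside one path component.
    Other glob syntax (brackets, braces, '?') is not handled in this sketch.
    """
    parts = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            parts.append(".*")
            i += 2
        elif pattern[i] == "*":
            parts.append("[^/]*")
            i += 1
        else:
            parts.append(re.escape(pattern[i]))
            i += 1
    return re.compile("".join(parts))


def is_excluded(relative_key, patterns):
    """Return True if the key (relative to the include path) matches any pattern."""
    return any(glob_to_regex(p).fullmatch(relative_key) for p in patterns)


def build_targets(include_path, patterns):
    """Build a Targets structure in the shape the Glue crawler API accepts."""
    return {"S3Targets": [{"Path": include_path, "Exclusions": list(patterns)}]}


if __name__ == "__main__":
    # Hypothetical object keys under s3://foo, for illustration only.
    keys = [
        "table1/2021-06-12/03-35-00/part-0000",
        "table1/Current/data",
        "table2/Current/data",
    ]
    for key in keys:
        status = "excluded" if is_excluded(key, EXCLUDE_PATTERNS) else "crawled"
        print(key, "->", status)

    targets = build_targets("s3://foo", EXCLUDE_PATTERNS)
    # With boto3 installed and credentials configured, this could be applied as:
    # boto3.client("glue").update_crawler(Name="my-crawler", Targets=targets)
    # where "my-crawler" is a placeholder crawler name.
    print(targets)
```

The same exclusion list can be entered by hand in the console's Exclude patterns field. The `build_targets` helper only matters if you manage the crawler from a script.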