My paths are of the format s3://my_bucket/timestamp=yyyy-mm-dd HH:MM:SS/, e.g. s3://my-bucket/timestamp=2021-12-12 12:19:27/. However, the MM:SS part is not predictable, and I am interested in reading the data for a given hour. I tried the following:
df = spark.read.parquet("s3://my-bucket/timestamp=2021-12-12 12:*:*/")
df = spark.read.parquet("s3://my-bucket/timestamp=2021-12-12 12:[00,01-59]:[00,01-59]/")
but both fail with the error pyspark.sql.utils.IllegalArgumentException: java.net.URISyntaxException.
The problem is that your paths contain colons (:). Unfortunately, colons in object paths are still not supported by Hadoop's path handling, which Spark relies on; there are long-standing tickets and mailing-list threads tracking this limitation. I think the only way is to rename these files so their keys contain no colons.
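As a sketch of that workaround, the snippet below copies every object for a given hour to a colon-free key using boto3 (S3 has no true rename, so this is a copy, optionally followed by a delete). The bucket and prefix names are taken from the question; the helper names are hypothetical, and you would need AWS credentials configured for the boto3 part to run.

```python
def sanitize_key(key: str) -> str:
    """Replace colons with hyphens, e.g.
    'timestamp=2021-12-12 12:19:27/part-0.parquet'
    -> 'timestamp=2021-12-12 12-19-27/part-0.parquet'."""
    return key.replace(":", "-")

def copy_hour_to_colon_free_keys(bucket: str, hour_prefix: str) -> None:
    """Copy all objects under hour_prefix (e.g. 'timestamp=2021-12-12 12:')
    to colon-free keys. Hypothetical helper; requires AWS credentials."""
    import boto3
    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=hour_prefix):
        for obj in page.get("Contents", []):
            old_key = obj["Key"]
            new_key = sanitize_key(old_key)
            if new_key != old_key:
                s3.copy_object(
                    Bucket=bucket,
                    CopySource={"Bucket": bucket, "Key": old_key},
                    Key=new_key,
                )
                # Optionally remove the original after verifying the copy:
                # s3.delete_object(Bucket=bucket, Key=old_key)
```

After copying, the colon-free partitions can be read with an ordinary glob, e.g. spark.read.parquet("s3://my-bucket/timestamp=2021-12-12 12-*/"), since the hyphenated keys no longer trip the URI parser.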