I am using DuckDB-WASM, and I am creating a view on top of a hive-partitioned table in S3 with SQL like:
create or replace view my_view as
select
    Part1 as part_1
    , Part2 as part_2
    , Column1 as column_1
    , Column2 as column_2
from read_parquet(
    [
        's3://my-bucket/path/to/part1=abc/part2=123/000.parquet',
        's3://my-bucket/path/to/part1=def/part2=456/000.parquet',
        's3://my-bucket/path/to/part1=ghi/part2=789/000.parquet'
    ],
    hive_partitioning = 1);
I then execute a query like:
select count(*) from my_view where part_1 = 'abc' and part_2 = '123'
I expected DuckDB to push the predicate down and read only the s3://my-bucket/path/to/part1=abc/part2=123/000.parquet
file. Instead, the Chrome DevTools network tab shows all three files being fetched.
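For reference, one way to check whether the filter reaches the scan is to look at the query plan with EXPLAIN (a sketch against the view above; the exact node labels vary by DuckDB version):

```sql
-- If pruning works, the Parquet scan node should list the partition
-- predicates under its "Filters" section rather than a separate
-- FILTER operator above the scan.
EXPLAIN
select count(*) from my_view where part_1 = 'abc' and part_2 = '123';
```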
I believe I figured it out. S3 key prefixes are case-sensitive, and hive_partitioning derives the partition column names verbatim from the key=value path segments, so they come back as the lowercase part1 and part2, not Part1 and Part2. Changing the view to reference the lowercase names seems to have fixed it for me:
create or replace view my_view as
select
    part1 as part_1
    , part2 as part_2
    , Column1 as column_1
    , Column2 as column_2
from read_parquet(
    [
        's3://my-bucket/path/to/part1=abc/part2=123/000.parquet',
        's3://my-bucket/path/to/part1=def/part2=456/000.parquet',
        's3://my-bucket/path/to/part1=ghi/part2=789/000.parquet'
    ],
    hive_partitioning = 1);
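To double-check which column names hive_partitioning actually produces, you can DESCRIBE the raw scan directly (a sketch using one of the placeholder paths from above):

```sql
-- The partition columns are taken verbatim from the path segments,
-- so DESCRIBE should list them as lowercase part1 and part2.
DESCRIBE
select *
from read_parquet(
    ['s3://my-bucket/path/to/part1=abc/part2=123/000.parquet'],
    hive_partitioning = 1);
```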