I struggled with this for a while and wanted to share my solution. AvroParquetReader is a fine tool for reading Parquet, but with its default configuration, reading from S3 failed for me because no AWS credentials were found:
java.io.InterruptedIOException: doesBucketExist on MY_BUCKET: com.amazonaws.AmazonClientException: No AWS Credentials provided by BasicAWSCredentialsProvider EnvironmentVariableCredentialsProvider SharedInstanceProfileCredentialsProvider : com.amazonaws.AmazonClientException: Unable to load credentials from service endpoint
I want to use a credentials provider such as com.amazonaws.auth.profile.ProfileCredentialsProvider, which works for accessing my S3 bucket, but it is not clear from AvroParquetReader's class definition or documentation how to configure this.
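This code worked for me. It sets the fs.s3a.aws.credentials.provider property on the Hadoop Configuration passed to the reader's builder, which allowed AvroParquetReader to access S3 using ProfileCredentialsProvider.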
import com.amazonaws.auth.AWSCredentialsProvider;
import com.amazonaws.auth.profile.ProfileCredentialsProvider;
import org.apache.parquet.avro.AvroParquetReader;
import org.apache.parquet.hadoop.ParquetReader;
import org.apache.hadoop.fs.Path;
import org.apache.avro.generic.GenericRecord;
import org.apache.hadoop.conf.Configuration;
...
final String path = "s3a://" + bucketName + "/" + pathName;
final Configuration configuration = new Configuration();
// Tell the S3A filesystem to load credentials through ProfileCredentialsProvider
configuration.setClass("fs.s3a.aws.credentials.provider", ProfileCredentialsProvider.class,
        AWSCredentialsProvider.class);
// Pass the configuration to the builder so the reader uses it when opening the file
ParquetReader<GenericRecord> parquetReader =
        AvroParquetReader.<GenericRecord>builder(new Path(path)).withConf(configuration).build();
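One small pitfall in the snippet above is the string concatenation for the path: if pathName starts with a slash, the result contains "bucket//key". Here is a minimal sketch of a helper that guards against that. The class name S3aPaths and the method s3aPath are hypothetical names of my own, not part of Hadoop or Parquet, and it uses only the Java standard library.

```java
// Hypothetical helper (not part of Hadoop or Parquet): builds the s3a:// path string
// used in the answer above from a bucket name and an object key.
final class S3aPaths {
    private S3aPaths() {}

    static String s3aPath(String bucketName, String pathName) {
        if (bucketName == null || bucketName.trim().isEmpty()) {
            throw new IllegalArgumentException("bucketName must not be empty");
        }
        if (pathName == null) {
            throw new IllegalArgumentException("pathName must not be null");
        }
        // Drop leading slashes so the result never contains "bucket//key"
        String key = pathName.replaceFirst("^/+", "");
        return "s3a://" + bucketName.trim() + "/" + key;
    }
}
```

The returned string can be passed to new Path(...) exactly as in the code above. Once the reader is built, records are typically consumed by calling parquetReader.read() until it returns null, and the reader should be closed afterwards.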