How to filter S3 files as input for Amazon EMR?

I'm trying to run an Amazon EMR Hadoop job that processes CloudFront logs stored in an S3 bucket. Since CloudFront writes a lot of log files to the same bucket, how do I filter the log files without generating extra bandwidth for S3 access?

Solution

I found that I can use FileSystem.globStatus() to quickly filter files from the CloudFront logs bucket, so that only the matching files are added as job inputs:

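// globStatus() takes a Path, not a String; it can return null, so guard before iterating.
Path pattern = new Path("s3://logs/prefix-2015-11-01*");
FileSystem fs = pattern.getFileSystem(conf);
FileStatus[] matches = fs.globStatus(pattern);
if (matches != null) {
   for (FileStatus fileStatus : matches) {
      if (fileStatus.isFile()) {
         FileInputFormat.addInputPath(myJob, fileStatus.getPath());
      }
   }
}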
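
To sanity-check a pattern before pointing a job at the bucket, something like the following sketch may help. It uses only the JDK's java.nio glob matcher rather than Hadoop, so it is an approximation: the two glob dialects agree for a simple trailing `*` like the one above, but I have not verified them against each other for more complex patterns. The class name GlobPreview and the sample file names are made up for illustration.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

public class GlobPreview {
    // Returns the names that match the glob, preserving input order.
    // Uses java.nio glob rules, which are assumed (not guaranteed) to match
    // Hadoop's behaviour for simple prefix patterns.
    static List<String> filter(String glob, List<String> names) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        List<String> matched = new ArrayList<>();
        for (String name : names) {
            if (matcher.matches(Paths.get(name))) {
                matched.add(name);
            }
        }
        return matched;
    }

    public static void main(String[] args) {
        // Hypothetical object names, only for illustration.
        List<String> names = List.of(
            "prefix-2015-11-01-00.abc123.gz",
            "prefix-2015-11-01-23.def456.gz",
            "prefix-2015-11-02-00.ghi789.gz");
        // Only the two files from 2015-11-01 should be listed.
        System.out.println(filter("prefix-2015-11-01*", names));
    }
}
```

This only previews which names a pattern would select; the actual filtering against S3 is still done by globStatus() in the job setup.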
