I'm looking for a cleaner way to check whether an S3 path is empty.
My current code looks like this:
if (!s3Path.isEmpty) {
  try {
    val rdd = sc.textFile(s3Path)
    rdd.partitions.size
  } catch {
    case _: org.apache.hadoop.mapred.InvalidInputException =>
      // fall back to an empty RDD when the path doesn't exist
      sc.parallelize(List[String]())
  }
}
I want to do it without creating an RDD.
I check the S3 path to see whether it's valid, and only then pass it to Spark to create the RDD, like below:
public boolean checkIfS3PathsValid(String bucketName, String key)
{
    try
    {
        // List objects under the given prefix; a non-empty listing
        // means the path exists and contains at least one object.
        ObjectListing list = s3.listObjects(bucketName, key);
        List<S3ObjectSummary> objectInfoList = list.getObjectSummaries();
        return objectInfoList.size() > 0;
    }
    catch (Exception e)
    {
        e.printStackTrace();
        return false;
    }
}
Here s3 is a com.amazonaws.services.s3.AmazonS3, and you initialise it with (note that PropertiesCredentials takes a File, not a String):
s3 = new AmazonS3Client(new PropertiesCredentials(new File("path of your s3 credential file")));
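For completeness, here is a minimal Scala sketch of the same initialisation, assuming the AWS SDK for Java v1 is on the classpath; the credentials file path is a placeholder, not a real location:

import java.io.File
import com.amazonaws.auth.PropertiesCredentials
import com.amazonaws.services.s3.{AmazonS3, AmazonS3Client}

// PropertiesCredentials reads accessKey/secretKey from a properties file.
val s3: AmazonS3 = new AmazonS3Client(
  new PropertiesCredentials(new File("path of your s3 credential file")))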
So in your code, call checkIfS3PathsValid and see whether it returns true. Only if it does should you create the RDD with sc.textFile; otherwise, skip that S3 path.
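As an example, here is a minimal sketch of the calling side in Scala, assuming the s3 client above and a checkIfS3PathsValid with the signature shown earlier; the bucket name and key prefix are made up for illustration:

// Hypothetical bucket and key prefix, for illustration only.
val bucketName = "my-bucket"
val keyPrefix = "logs/2015/"

// Only touch Spark once the listing has confirmed the path is non-empty;
// otherwise fall back to an empty RDD instead of catching the exception.
val rdd =
  if (checkIfS3PathsValid(bucketName, keyPrefix))
    sc.textFile(s"s3n://$bucketName/$keyPrefix")
  else
    sc.emptyRDD[String]

This avoids building an RDD just to probe the path, which is what the question asked for, and sc.emptyRDD[String] is a cleaner fallback than sc.parallelize(List()).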