r, amazon-web-services, amazon-s3, r-paws, aws.s3

s3sync() Exclude Directory


I'm trying to pull down all files in a given bucket, except those in a specific directory, using R.

With the AWS CLI, I can use...
aws s3 sync s3://my_bucket/my_prefix ./my_destination --exclude="*bad_directory*"

In aws.s3::s3sync(), I'd like to do something like...
aws.s3::s3sync(path='./my_destination', bucket='my_bucket', prefix='my_prefix', direction='download', exclude='*bad_directory*')
...but exclude is not a supported argument.

Is this possible using aws.s3 (or paws for that matter)?

Please don't recommend using the AWS CLI - there are reasons that approach doesn't make sense for my purpose.

Thank you!!


Solution

  • Here's what I came up with to solve this...

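    library(paws)
    library(aws.s3)
    
    s3 <- paws::s3()
    
    # A single listing call returns at most 1000 keys, so page through the results
    contents <- list()
    token <- NULL
    repeat {
        resp <- s3$list_objects_v2(Bucket='my_bucket',Prefix='my_prefix/',ContinuationToken=token)
        contents <- c(contents,resp$Contents)
        if(!isTRUE(resp$IsTruncated)) break
        token <- resp$NextContinuationToken
    }
          
    # Keep every key that is not under bad_directory
    keys <- vapply(contents,FUN=function(x) x$Key,FUN.VALUE=character(1))
    keys <- keys[!grepl('/bad_directory/',keys,fixed=TRUE)]
    # Skip zero-byte "folder" placeholder keys, which end in a slash
    keys <- keys[!endsWith(keys,'/')]
          
    for(i in keys){
        # Create the local directory before writing the file into it
        dir.create(dirname(i),showWarnings=FALSE,recursive=TRUE)
            
        aws.s3::save_object(
            object = i,
            bucket='my_bucket',
            file = i
        )
    }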
    

    Still open to more efficient implementations - thanks!
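
If you want the filter to look more like the CLI's `--exclude` glob, one option is to convert the glob to a regular expression with `utils::glob2rx()` and drop the matching keys before downloading. This is only a sketch: `exclude_keys` is a hypothetical helper name (it is not part of aws.s3 or paws), the keys are made-up examples, and the pattern is matched against the full key, which may differ from how the CLI resolves patterns relative to the source path.

```r
# Hypothetical helper: drop keys that match any shell-style glob,
# similar in spirit to `aws s3 sync --exclude`.
exclude_keys <- function(keys, exclude) {
    drop <- rep(FALSE, length(keys))
    for (pattern in exclude) {
        # glob2rx() turns a glob such as "*bad_directory*" into a regex
        drop <- drop | grepl(utils::glob2rx(pattern), keys)
    }
    keys[!drop]
}

# Example keys (assumed for illustration only)
keys <- c(
    'my_prefix/a.csv',
    'my_prefix/bad_directory/b.csv',
    'my_prefix/good_directory/c.csv'
)

kept <- exclude_keys(keys, '*bad_directory*')
print(kept)
```

The resulting vector can be passed to the same `save_object()` loop as above, and `exclude` can hold several patterns at once.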