amazon-web-services scala apache-spark amazon-s3 aws-glue

Glue Spark Scala script to check if a file exists in S3?


I am new to writing AWS Glue scripts, and I would like to know whether there is a way to check if a key, file, or path already exists in an S3 bucket using Spark/Scala.

Thanks!


Solution

  • Yes, you can use a library like this to check if a file exists in S3. You would have to upload the jar to S3 so that you can reference it in your Glue job as an external library.

    Another way would be to use the FileSystem.get method like this:

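    import java.net.URI
    import org.apache.hadoop.fs.{FileSystem, Path}
    import org.apache.spark.SparkContext

    val sc = new SparkContext()
    // Get the file system for the bucket; replace the path with the key or prefix to check.
    val fs = FileSystem.get(URI.create("s3://s3bucket/"), sc.hadoopConfiguration)
    if (fs.exists(new Path("s3://s3bucket/"))) {
      println("File exists")
    }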
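
    If you need this check in more than one place, the FileSystem approach above could be wrapped in a small helper. This is only a sketch: the object name S3PathCheck, its two methods, and the bucket and key in the usage note are made up for illustration. It assumes the Hadoop classes are on the classpath, which should be the case in a Glue Spark job.

    ```scala
    import org.apache.hadoop.conf.Configuration
    import org.apache.hadoop.fs.Path

    // Hypothetical helper object; the name and methods are illustrative only.
    object S3PathCheck {

      // True if `uri` (for example "s3://my-bucket/data/file.csv") points to
      // an existing object or prefix on the file system named by its scheme.
      def pathExists(uri: String, conf: Configuration): Boolean = {
        val path = new Path(uri)
        // Resolve the file system from the path itself instead of a separate URI.
        path.getFileSystem(conf).exists(path)
      }

      // True if at least one entry matches a glob such as "s3://my-bucket/data/*.csv".
      def anyMatch(globUri: String, conf: Configuration): Boolean = {
        val pattern = new Path(globUri)
        // globStatus can return null (missing non-glob path) or an empty array
        // (glob with no matches), so both cases are treated as "no match".
        Option(pattern.getFileSystem(conf).globStatus(pattern)).exists(_.nonEmpty)
      }
    }
    ```

    In a Glue job you would probably call it as S3PathCheck.pathExists("s3://my-bucket/data/file.csv", sc.hadoopConfiguration), where the bucket and key are placeholders for your own.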