scala apache-spark google-cloud-platform google-cloud-storage metadata

Get GCS file metadata using Scala


I want to get the creation time of files in GCS. I used the code below:

println(Files
  .getFileAttributeView(Paths.get("gs://datalake-dev/mu/tpu/file.0450138"), classOf[BasicFileAttributeView])
  .readAttributes.creationTime)

The problem is that Paths.get replaces // with /, so I get gs:/datalake-dev/mu/tpu/file.0450138 instead of gs://datalake-dev/mu/tpu/file.0450138.

Can anyone help me with this?

Thanks a lot!


Solution

  • I solved the problem by adding the following Java code and then calling the Java function from Scala.

    import com.google.cloud.storage.Blob;
    import com.google.cloud.storage.BlobId;
    import com.google.cloud.storage.Storage;
    import com.google.cloud.storage.StorageOptions;
    import java.sql.Timestamp;
    
    public class ExtractDate {
        public static String getTime(String fileName) {
            String bucketName = "bucket-data";
            String blobName = "doc/files/" + fileName;
            // Instantiate a client using the default credentials
            Storage storage = StorageOptions.getDefaultInstance().getService();
            // Fetch the object's metadata; get() returns null if the object does not exist
            Blob blob = storage.get(BlobId.of(bucketName, blobName));
            if (blob == null) {
                throw new IllegalArgumentException("No such object: gs://" + bucketName + "/" + blobName);
            }
            // getCreateTime() returns the creation time in milliseconds since the epoch
            Timestamp created = new Timestamp(blob.getCreateTime());
            // Return the year of the file's creation date (first four characters of the timestamp string)
            return created.toString().substring(0, 4);
        }
    }
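
The answer above hard-codes the bucket and prefix, while the question starts from a full gs:// path. As a rough sketch (GcsUri, parse and yearOf are hypothetical names, not part of the google-cloud-storage library), the path could be split with java.net.URI rather than Paths.get, which treats the string as a local filesystem path and collapses the double slash. The year can also be computed with java.time instead of cutting it out of a string. This assumes object names that are valid in a URI; names containing spaces or other special characters would need escaping first.

```java
import java.net.URI;
import java.time.Instant;
import java.time.ZoneOffset;

/** Hypothetical helper, not part of the google-cloud-storage library. */
final class GcsUri {
    final String bucket;
    final String objectName;

    private GcsUri(String bucket, String objectName) {
        this.bucket = bucket;
        this.objectName = objectName;
    }

    /** Splits "gs://bucket/path/to/object" into its bucket and object name. */
    static GcsUri parse(String gsUri) {
        URI uri = URI.create(gsUri);
        // A collapsed path such as "gs:/bucket/object" has no authority part
        if (!"gs".equals(uri.getScheme()) || uri.getAuthority() == null) {
            throw new IllegalArgumentException("Not a gs:// URI: " + gsUri);
        }
        String path = uri.getPath();
        if (path == null || path.length() < 2) {
            throw new IllegalArgumentException("No object name in: " + gsUri);
        }
        // Drop the leading '/' so the result matches GCS object naming
        return new GcsUri(uri.getAuthority(), path.substring(1));
    }

    /** Year (in UTC) of a timestamp given in milliseconds since the epoch. */
    static int yearOf(long epochMillis) {
        return Instant.ofEpochMilli(epochMillis).atZone(ZoneOffset.UTC).getYear();
    }
}
```

With this in place, getTime could accept the full path, build the id with BlobId.of(parsed.bucket, parsed.objectName), and return GcsUri.yearOf(blob.getCreateTime()). Note that yearOf uses UTC, whereas Timestamp.toString() uses the JVM's default time zone, so the two can differ for files created around New Year.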