google-cloud-storage google-cloud-dataflow apache-beam apache-beam-io spotify-scio

How to get/add GCS File user-defined metadata using Apache Beam library [org.apache.beam.sdk.io.*]


I'm setting up a Dataflow pipeline in which one of the steps is to get or add the user-provided metadata of a GCS file.

In a standalone Java app I used the method below, from the StorageObject class, to get the metadata, but I can't find a similar method or API in the Apache Beam library. Any pointers would be really helpful.

// Excerpt from com.google.api.services.storage.model.StorageObject (StorageObject.java)
// ...
public java.util.Map<String, java.lang.String> getMetadata() {
     return metadata;
}

Solution

  • I used the following code to get the metadata from GCS, and it works well in a Dataflow pipeline. It calls the google-cloud-storage client directly rather than going through a Beam API.

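    import com.google.cloud.storage.{BlobId, Storage, StorageOptions}
    // On Scala 2.13+, scala.jdk.CollectionConverters._ is the non-deprecated equivalent
    import scala.collection.JavaConverters._
    
    val storage: Storage = StorageOptions.getDefaultInstance.getService
    val blobId: BlobId = BlobId.of("bucket", "filename")
    // getMetadata returns a java.util.Map (possibly null), so convert it to a Scala Map
    val srcMap: Map[String, String] =
      Option(storage.get(blobId).getMetadata).map(_.asScala.toMap).getOrElse(Map.empty[String, String])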
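
The question also asks about adding metadata, which the snippet above does not cover. A minimal sketch with the same google-cloud-storage client might look like the following. The `GcsMetadata` object and its `merged` and `addMetadata` helpers are hypothetical names introduced here for illustration, not part of any library, and the code assumes the pipeline's credentials are allowed to update the object. The existing metadata is merged explicitly so the result does not depend on how the update treats keys that are left out.

```scala
import com.google.cloud.storage.{Blob, BlobId, Storage}
import scala.collection.JavaConverters._

// Hypothetical helper object, for illustration only.
object GcsMetadata {
  // Merge new entries over the existing metadata, which may be null
  // when the object has no user-defined metadata.
  def merged(existing: java.util.Map[String, String],
             extra: Map[String, String]): Map[String, String] =
    Option(existing).map(_.asScala.toMap).getOrElse(Map.empty[String, String]) ++ extra

  // Read the object's current metadata, merge in `extra`, and write it back.
  // Returns None when the object does not exist (storage.get returns null).
  def addMetadata(storage: Storage, bucket: String, name: String,
                  extra: Map[String, String]): Option[Blob] =
    Option(storage.get(BlobId.of(bucket, name))).map { blob =>
      blob.toBuilder
        .setMetadata(merged(blob.getMetadata, extra).asJava)
        .build()
        .update()
    }
}
```

With the `storage` value from the snippet above, a call could look like `GcsMetadata.addMetadata(storage, "bucket", "filename", Map("processed-by" -> "dataflow"))`, where the key and value are placeholders.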