
How to register file/folder as a project data asset after saving to cloud object storage?


I've saved a Spark DataFrame to Cloud Object Storage (COS), into a Watson Studio project's bucket:

# Parentheses let Python accept the leading-dot line breaks in the chain
(staging
  .write
  .mode("overwrite")
  .option("header", "true")
  .csv(cos.url('all.csv', 'myproject-bucket')))

I would like the resulting folder to show up in the project assets.

Initially, I tried using project-lib, but from the documentation it appears that you need a file-like object. That means bringing all the data back to the driver node, and if I do that, I run out of memory.


Solution

  • You can create a connection from Watson Studio to COS and publish the files.

    Steps

    1. In the Watson Studio interface, go to "Add to project" -> Connection.
    2. Create a connection for "Cloud Object Storage". You will need the credentials for the COS bucket.
    3. You can choose to check "Discover Data Assets". This adds all the files in the bucket to your project assets. You can publish the assets from there.
    4. If you didn't choose "Discover Data Assets", you can pick the files manually. Go to "Add to project" -> Connected Data and use the connection you created earlier; it lists the files in the bucket so you can select the ones you want.
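
When picking files manually, keep in mind that Spark writes `all.csv` as a folder of part files plus a `_SUCCESS` marker, not as a single file. Below is a minimal sketch of how you might narrow a bucket listing down to the data files for that folder. The helper name `spark_part_files` and the example keys are hypothetical, and the exact object names depend on the Spark version and the COS connector, so treat the `part-` prefix check as an assumption to verify against your own bucket.

```python
from posixpath import basename


def spark_part_files(keys, folder):
    """Return the keys under `folder` that look like Spark part files.

    Assumes the usual Spark output layout: <folder>/part-... data files
    alongside a <folder>/_SUCCESS marker. Hypothetical helper, not part
    of project-lib or any Watson Studio API.
    """
    prefix = folder.rstrip("/") + "/"
    return sorted(
        key for key in keys
        if key.startswith(prefix) and basename(key).startswith("part-")
    )


# Made-up listing for illustration; real keys come from the bucket itself.
example_keys = [
    "all.csv/_SUCCESS",
    "all.csv/part-00000-aaaa-c000.csv",
    "all.csv/part-00001-aaaa-c000.csv",
    "other.csv/part-00000-bbbb-c000.csv",
]

for key in spark_part_files(example_keys, "all.csv"):
    print(key)
```

The keys it returns are the ones to select under Connected Data; the `_SUCCESS` marker carries no data and can be ignored.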