scala ibm-cloud object-storage data-science-experience

put_file() function in Scala?


Is there an equivalent to the R/Python put_file() methods for taking an object from a Scala notebook in DSX and saving it as a data asset for the project? If so, is there any documentation? I'm looking for something like what was outlined in this article:
https://datascience.ibm.com/blog/working-with-object-storage-in-data-science-experience-python-edition/
I have already written the CSV file I want within the notebook; I just need to save it to the project.


Solution

  • Try the following steps and code snippets:

    Step 1: Generate the credentials. For any file you have already uploaded from your browser, click 'Insert to Code -> Insert Spark Session Dataframe' on the Files tab of the 'Find and Add Data' pane in DSX. This inserts code like the following:

    // Generated by DSX: configures the Hadoop Swift connector for Object Storage.
    // `sc` (SparkContext) and `spark` (SparkSession) are predefined in the notebook.
    def setHadoopConfig2db1c1ff193345c28eaffb250b92d92b(name: String): Unit = {
        val prefix = "fs.swift.service." + name
        sc.hadoopConfiguration.set(prefix + ".auth.url", "https://identity.open.softlayer.com" + "/v3/auth/tokens")
        sc.hadoopConfiguration.set(prefix + ".auth.endpoint.prefix", "endpoints")
        sc.hadoopConfiguration.set(prefix + ".tenant", "<tenant id>")
        sc.hadoopConfiguration.set(prefix + ".username", "<userid>")
        sc.hadoopConfiguration.set(prefix + ".password", "<password>")
        sc.hadoopConfiguration.setInt(prefix + ".http.port", 8080)
        sc.hadoopConfiguration.set(prefix + ".region", "dallas")
        sc.hadoopConfiguration.setBoolean(prefix + ".public", false)
    }
    
    // "keystone" becomes the service part of the swift:// URL below.
    val name = "keystone"
    setHadoopConfig2db1c1ff193345c28eaffb250b92d92b(name)
    
    // URL format: swift://<container>.<service name>/<object>, where the container is the DSX project name.
    val data_frame1 = spark.read.option("header", "true").csv("swift://<your DSX project name>.keystone/<your file name>.csv")
    

    Step 2: Write the code that creates data_frame2 from data_frame1, for example by applying some transformations.

    Step 3: When saving data_frame2 to a file in Object Storage, use the same container (the DSX project name) and the same service name (keystone) as in Step 1:

    // Spark writes <file name>.csv as a directory containing one or more part files.
    data_frame2.write.option("header", "true").csv("swift://<same DSX project name as before>.keystone/<name of the file you want to write>.csv")
    

    Note that once you have generated the credentials in Step 1, you can use them to save any DataFrame in the current notebook, even one that was never read from a file.
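The credential setup in Step 1 can also be sketched as a plain-Scala helper that collects the same `fs.swift.service.<name>.*` settings as key/value pairs, so the credentials live in one value instead of being pasted inline. `SwiftCredentials` and `SwiftConfig.settings` are hypothetical names used only for illustration, not part of DSX or Hadoop, and the endpoint prefix, port and `public` values are simply copied from the generated function.

```scala
// Hypothetical holder for the values DSX pastes into the generated function.
final case class SwiftCredentials(
  authUrl: String,
  tenant: String,
  username: String,
  password: String,
  region: String
)

// Hypothetical helper: builds the same fs.swift.service.<name>.* keys that
// setHadoopConfig...(name) sets, as plain strings.
object SwiftConfig {
  def settings(name: String, creds: SwiftCredentials): Map[String, String] = {
    val prefix = s"fs.swift.service.$name"
    Map(
      s"$prefix.auth.url"             -> creds.authUrl,
      s"$prefix.auth.endpoint.prefix" -> "endpoints",
      s"$prefix.tenant"               -> creds.tenant,
      s"$prefix.username"             -> creds.username,
      s"$prefix.password"             -> creds.password,
      // Copied from the generated function; stored as strings here.
      s"$prefix.http.port"            -> "8080",
      s"$prefix.region"               -> creds.region,
      s"$prefix.public"               -> "false"
    )
  }
}
```

In the notebook you would then apply the pairs with `SwiftConfig.settings("keystone", creds).foreach { case (k, v) => sc.hadoopConfiguration.set(k, v) }`. Storing the port and the `public` flag as strings should be equivalent, since Hadoop's `Configuration.setInt` and `setBoolean` also store their values as strings.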
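Step 3 only works if the write path uses the same container (project name) and service name as the read path in Step 1. A small hypothetical helper, `SwiftPaths.path`, keeps that format in one place. It assumes the `swift://<container>.<service>/<object>` layout used in the answer's snippets, and it rejects dots in the container name on the assumption that the Swift connector treats everything before the first dot in the host as the container.

```scala
// Hypothetical helper (not part of Spark, Hadoop or DSX) that builds the
// swift://<container>.<service>/<object> paths used for reading and writing.
object SwiftPaths {
  def path(container: String, service: String, objectName: String): String = {
    // Assumption: the connector takes everything before the first '.' as the container.
    require(container.nonEmpty && !container.contains("."),
      s"container name must be non-empty and contain no dots: '$container'")
    require(service.nonEmpty, "service name must be non-empty")
    // Drop a leading slash so "out.csv" and "/out.csv" give the same path.
    s"swift://$container.$service/${objectName.stripPrefix("/")}"
  }
}
```

For example: `data_frame2.coalesce(1).write.mode("overwrite").option("header", "true").csv(SwiftPaths.path("<your DSX project name>", "keystone", "result.csv"))`. Spark still writes `result.csv` as a directory of part files; `coalesce(1)` only reduces it to a single part file, and `mode("overwrite")` replaces an existing output instead of failing.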