I am taking my first steps in the Azure Databricks world and therefore I have to learn how to use SparkR. (I am coming from data.table.)
Although I have read a lot of documentation, I think I am missing something about SparkDataFrame.
To create a new column, I learned that we can do something like:
sdf$new <- sdf$old * 0.5
But when I try to use a basic R function, I get an error and I can't figure out why:
sdf <- sql("select * from database.table")
sdf$new <- strsplit(sdf$old, "-")[1]
Error in strsplit((sdf$old), "-") :
non-character argument
What am I missing?
Thanks.
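strsplit is a base R function that expects a character vector, but sdf$old is a Spark Column expression, not local data, hence the "non-character argument" error. Instead of strsplit you need to use the Spark-specific functions listed in the SparkR API documentation. Specifically, use the split_string function combined with the getItem function (note that you need the L suffix to force the index to be an integer, and that the index is 0-based, so 0L selects the first element):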
new_df <- withColumn(sdf, "new_id", getItem(split_string(sdf$old, "-"), 0L))
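For a self-contained way to try this out, here is a minimal sketch. It assumes a working SparkR installation. The sample data frame is hypothetical and only stands in for database.table, and the "-" delimiter and the column name "new" are taken from the question.

```r
library(SparkR)

# Start (or attach to) a Spark session; on Databricks one already exists.
sparkR.session()

# Hypothetical sample data standing in for database.table
local_df <- data.frame(old = c("a-b-c", "x-y-z"), stringsAsFactors = FALSE)
sdf <- createDataFrame(local_df)

# sdf$old is a Column expression evaluated by Spark, not a character vector,
# which is why base R functions such as strsplit() reject it.
class(sdf$old)

# split_string() produces an array column; getItem() with a 0-based
# integer index picks the first element of that array.
parts  <- split_string(sdf$old, "-")
new_df <- withColumn(sdf, "new", getItem(parts, 0L))

head(new_df)
```

Note that the second argument of split_string is interpreted as a regular expression, so a delimiter such as "." or "|" would need to be escaped.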