
SparkR distinct (on Databricks)


I am new to SparkR, so please forgive me if my question is very basic.

I work on Databricks and am trying to get all unique dates in a column of a SparkDataFrame.

When I run:

uniquedays <- SparkR::distinct(df$datadate)

I get the error message:

unable to find an inherited method for function ‘distinct’ for signature ‘"Column"’

On Stack Overflow, I found the following explanation of this kind of error (for reference, isS4(df) returns TRUE):

That is the type of message you will get when attempting to apply an S4 generic function to an object of a class for which no defined S4 method exists

I also tried to run

uniquedays <- SparkR::unique(df$datadate)

where I get the error message:

unique() applies only to vectors

It feels like I am missing something basic here. Thank you for your help!


Solution

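  • distinct() has a method for SparkDataFrame but not for Column, and df$datadate is a Column. Select the column first, so that you have a one-column SparkDataFrame, and then call distinct() on that:

    library(magrittr)  # provides the %>% pipe
    # select() returns a SparkDataFrame, which distinct() accepts
    uniquedays <- SparkR::select(df, df$datadate) %>% SparkR::distinct()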
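
If you need the unique dates as an ordinary R vector rather than a SparkDataFrame, the result can be pulled back to the driver with collect(). The following is a minimal sketch, not a definitive recipe: the sample data is hypothetical and stands in for the df from the question, and it assumes the distinct result is small enough to fit in local memory.

```r
library(SparkR)

# On Databricks a session normally already exists; this call reuses it
# (or starts one when running elsewhere).
sparkR.session()

# Hypothetical sample data standing in for the question's df
local_df <- data.frame(
  datadate = as.Date(c("2020-01-01", "2020-01-01", "2020-01-02"))
)
df <- createDataFrame(local_df)

# select() yields a one-column SparkDataFrame; distinct() drops duplicate rows
uniquedays_sdf <- distinct(select(df, "datadate"))

# collect() returns a local data.frame; extract the column as a vector.
# Row order after distinct() is not guaranteed, so sort if order matters.
uniquedays <- sort(collect(uniquedays_sdf)$datadate)
```

Keeping the result as a SparkDataFrame (as in the answer above) is preferable when the number of distinct values may be large, since collect() moves everything to the driver.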