
SparkR: How to count distinct values for all columns in a Spark DataFrame?


Is there a way to count the number of distinct items in each column of a Spark DataFrame? For example, given this dataset:

set.seed(123)
df <- data.frame(ColA = rep(c("dog", "cat", "fish", "shark"), 4),
                 ColB = rnorm(16),
                 ColC = rep(1:8, 2))
df

I do this in R to get the counts:

sapply(df, function(x) length(unique(x)))

ColA ColB ColC 
   4   16    8 

How would I go about doing the same thing for this Spark DataFrame?

# requires an active Spark session, e.g. sparkR.session()
sdf <- SparkR::createDataFrame(df)

Any help is greatly appreciated. Thank you in advance. -nate


Solution

  • This works for me in SparkR:

    # build one countDistinct() expression per column, aliased to the column name
    exprs <- lapply(names(sdf), function(x) alias(countDistinct(sdf[[x]]), x))
    # use do.call() to splice the list of aggregation expressions into agg()
    head(do.call(agg, c(x = sdf, exprs)))
    
    #  ColA ColB ColC
    #1    4   16    8
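
  • On very large DataFrames, exact distinct counts can get expensive, since each column needs its own distinct aggregation. A cheaper option is an approximate count. The following is a minimal sketch of the same pattern, assuming SparkR 2.x, where approxCountDistinct() exposes Spark's HyperLogLog-based estimator (deprecated in favor of approx_count_distinct() in Spark 3.0):

    # same pattern as above, but approximate (HyperLogLog) instead of exact;
    # the optional rsd argument bounds the relative standard deviation (default 0.05)
    exprs <- lapply(names(sdf), function(x) alias(approxCountDistinct(sdf[[x]]), x))
    head(do.call(agg, c(x = sdf, exprs)))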