I have two integer columns (x1 and x2) in a SparkR DataFrame named df whose values are very similar. I want to count how many rows have matching values in the two columns and compare that count with the total number of rows. How can I do this? I have tried the following, but both attempts result in errors:
agg(df, sum(df$x1==df$x2))
collect(sum(df$x1==df$x2))
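One approach that works is to add a boolean column x that is TRUE where x1 equals x2, then group by x and count the rows in each group. The count in the TRUE group is the number of matches, and the counts across all groups add up to the total number of rows: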
df <- withColumn(df, 'x', df$x1==df$x2)
head(agg(groupBy(df, 'x'), x="count"))
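If you would rather get the match count and the total directly, without reading them off a grouped result, the following is a minimal sketch. It casts the boolean comparison to integer before summing, since Spark's sum() expects a numeric column; that requirement is likely why the first attempt failed, while the second fails because collect() takes a DataFrame, not a Column. It assumes Spark 2.0 or later (for sparkR.session() and the single-argument createDataFrame()), and the toy data and the names local_df, agg_df, matches, total and ratio are hypothetical stand-ins for your own df.

```r
library(SparkR)

# Assumes Spark 2.0+; older releases used sparkR.init()/sparkRSQL.init() instead
sparkR.session()

# Hypothetical toy data standing in for the real df
local_df <- data.frame(x1 = c(1L, 2L, 3L, 4L), x2 = c(1L, 5L, 3L, 4L))
df <- createDataFrame(local_df)

# Cast the boolean comparison to integer (TRUE -> 1, FALSE -> 0) so sum() accepts it;
# rows where either value is NULL compare as NULL and are skipped by sum()
agg_df <- agg(df, alias(sum(cast(df$x1 == df$x2, "integer")), "matches"))
matches <- collect(agg_df)$matches

# Total number of rows in the DataFrame
total <- count(df)

# Fraction of rows where x1 and x2 match
ratio <- matches / total
print(c(matches = matches, total = total, ratio = ratio))
```

Note that total counts every row, including rows where x1 or x2 is NULL, so those rows lower the ratio without ever counting as matches.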