Tags: r, apache-spark, apache-spark-sql, sparkr

Drop a DataFrame's Column in SparkR


Is there a concise way to drop a DataFrame's column in SparkR, comparable to df.drop("column_name") in PySpark?

This is the closest I can get:

df <- new("DataFrame",
          sdf=SparkR:::callJMethod(df@sdf, "drop", "column_name"),
          isCached=FALSE)

Solution

  • This can be achieved by assigning NULL to the Spark dataframe column:

    df$column_name <- NULL
    

    See the original discussion at the related Spark JIRA ticket.
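    For comparison, here is a minimal sketch of the NULL assignment next to two alternatives, drop() and select(). It assumes SparkR 2.0 or later and a local Spark installation. The local master setting and the built-in faithful dataset (columns "eruptions" and "waiting") are illustrative choices, not part of the original question.

```r
library(SparkR)

# Assumption: a local Spark installation and SparkR 2.0 or later.
sparkR.session(master = "local[1]")

# Illustrative data: R's built-in `faithful` dataset,
# which has the columns "eruptions" and "waiting".
df <- createDataFrame(faithful)

# Option 1: assign NULL to the column, as in the answer above.
df_null <- df
df_null$waiting <- NULL
cols_null <- columns(df_null)

# Option 2: drop(), which returns a new DataFrame without the column.
cols_drop <- columns(drop(df, "waiting"))

# Option 3: select() only the columns to keep.
cols_select <- columns(select(df, "eruptions"))

print(cols_null)
print(cols_drop)
print(cols_select)

sparkR.session.stop()
```

    All three leave the original df untouched except the NULL assignment, which rebinds the variable it is applied to (df_null here). drop() is the closest match to PySpark's df.drop("column_name") when it is available in your SparkR version.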