Tags: r, apache-spark, sparkr

Error in using grep in SparkR


I am having an issue with subsetting my Spark DataFrame.

I have a DataFrame called nfe, which contains a column called ITEM_PRODUTO that is formatted as a string. I would like to subset this DataFrame based on whether the item column contains the word "AREIA". I can easily subset the data based on an exact phrase:

nfe.subset1 <- subset(nfe, nfe$ITEM_PRODUTO == "AREIA LAVADA FINA")

nfe.subset2 <- subset(nfe, nfe$ITEM_PRODUTO %in% "AREIA")

However, what I would like is a subset of all rows that contain the word "AREIA" in the ITEM_PRODUTO column. When I try to use grep, though, I receive an error message:

nfe.subset3 <- subset(nfe, grep("AREIA", nfe$ITEM_PRODUTO))

# Error in as.character.default(x) : 
#  no method for coercing this S4 class to a vector

I've tried multiple iterations of syntax, and tried grepl as well, but nothing seems to work. It's probably a syntax error, but could anyone help me out?

Thanks!


Solution

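  • Standard R functions such as grep and grepl cannot be applied to a SparkDataFrame. nfe$ITEM_PRODUTO is an S4 Column object, not a character vector, which is why the coercion fails. Use either like: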

    where(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))
    

    or rlike:

    where(nfe, rlike(nfe$ITEM_PRODUTO, ".*AREIA.*"))
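Since the question uses subset rather than where, the same Column expressions can also be passed to subset. The following is a minimal sketch, assuming a local Spark installation; the three-row sample data and the local master setting are made up for illustration and are not from the original question.

```r
library(SparkR)

# Assumption: a local Spark session is enough for this illustration.
sparkR.session(master = "local[*]")

# Hypothetical sample data standing in for the real nfe DataFrame.
local_df <- data.frame(
  ITEM_PRODUTO = c("AREIA LAVADA FINA", "CIMENTO", "AREIA GROSSA"),
  stringsAsFactors = FALSE
)
nfe <- createDataFrame(local_df)

# like() builds a Column expression that Spark evaluates,
# so it can be used as the subset condition (SQL LIKE pattern).
nfe.subset3 <- subset(nfe, like(nfe$ITEM_PRODUTO, "%AREIA%"))

# rlike() does the same with a regular expression.
nfe.subset4 <- subset(nfe, rlike(nfe$ITEM_PRODUTO, "AREIA"))

# collect() brings the results back as ordinary R data.frames;
# with the sample data above, both keep the two AREIA rows.
matched_like <- collect(nfe.subset3)
matched_rlike <- collect(nfe.subset4)
print(matched_like)
```

If the data is small enough to fit in memory, grepl does work once the data has been collected into a regular data.frame, for example local <- collect(nfe) followed by local[grepl("AREIA", local$ITEM_PRODUTO), ]. For large data, filtering in Spark with like or rlike avoids pulling every row to the driver.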