I'm trying to load a huge (~5GB) .csv file into R using read.csv.ffdf. The command goes:
npi <- read.csv.ffdf(file="C:/Users/DSA/Dropbox/Team Shared Files/People/Ross/NPI_Parse/Zips/npi_full.csv", VERBOSE=TRUE, first.rows=10000,next.rows=100000,colClasses=NA)
The command runs for a while and then throws the following error: no applicable method for 'recodeLevels' applied to an object of class "c('double', 'numeric')". Some searching tells me I need to use the transFUN option, but I have no idea how to apply it. The data contains both text and numbers, and I think that may be causing the issue. I can upload a screenshot of the csv if it helps, but it takes ages to open in LibreOffice.
Anyone know any tricks?
From the documentation of read.csv.ffdf:
transFUN: NULL or a function that is called on each data.frame chunk after reading with FUN and before further processing (for filtering, transformations etc.)
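If one of your columns changes from a factor to a numeric (or vice versa) between chunks, which is what the recodeLevels error suggests, for example because the column contains text in the first rows but only numbers in a later chunk, use transFUN to make sure it is always a factor: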
npi <- read.csv.ffdf(
  file = "C:/Users/DSA/Dropbox/Team Shared Files/People/Ross/NPI_Parse/Zips/npi_full.csv",
  VERBOSE = TRUE, first.rows = 10000, next.rows = 100000,
  transFUN = function(x) {
    x$yourcolumnwiththeerror <- factor(as.character(x$yourcolumnwiththeerror))
    x
  })
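To see why a per-chunk conversion helps, here is a minimal sketch in base R only (no ff needed) that imitates chunked reading on a tiny made-up file. The data, the column name code and the helper force_factor are all hypothetical; they only illustrate how the same column can be parsed as a factor in one chunk and as a number in the next, and how a transFUN-style function evens that out.

```r
# Hypothetical data: 'code' holds text in the first rows and only digits later.
csv_text <- c("id,code", "1,A12", "2,B34", "3,100", "4,200")
tmp <- tempfile(fileext = ".csv")
writeLines(csv_text, tmp)

# First chunk: header plus the first two data rows.
chunk1 <- read.csv(tmp, nrows = 2, stringsAsFactors = TRUE)

# Second chunk: skip the header and the first two data rows, reuse the names.
chunk2 <- read.csv(tmp, skip = 3, header = FALSE,
                   col.names = names(chunk1), stringsAsFactors = TRUE)

class(chunk1$code)  # "factor"
class(chunk2$code)  # "integer"

# Same idea as the transFUN above: force the column to a factor in every chunk.
force_factor <- function(x) {
  x$code <- factor(as.character(x$code))
  x
}

class(force_factor(chunk1)$code)  # "factor"
class(force_factor(chunk2)$code)  # "factor"
```

In the real call you would swap code for whichever column triggers the error; if several columns behave this way, the same conversion can be repeated for each of them inside the function.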