Categorical fields being converted to continuous after exporting csv using fwrite


I am having trouble exporting a data frame in R to CSV: the round trip seems to convert my factors into numerics. Calling summary() before exporting gives the following:

 JobLevel JobSatisfaction
 1:1880   1:1448         
 2:3134   2:1343         
 3:1307   3:1996         
 4: 545   4:2327         
 5: 248

Then, I exported the file to CSV using the following command:

fwrite(HR, file = "Cleaned Data.csv")

However, when I imported the CSV later, the categorical columns had seemingly been converted to continuous ones:

HR2 <- fread("Cleaned Data.csv", na.strings = "", stringsAsFactors = TRUE)
    JobLevel     JobSatisfaction
 Min.   :1.000   Min.   :1.000
 1st Qu.:1.000   1st Qu.:2.000
 Median :2.000   Median :3.000 
 Mean   :2.177   Mean   :2.731                  
 3rd Qu.:3.000   3rd Qu.:4.000                  
 Max.   :5.000   Max.   :4.000  

I believe gender is fine as it is a string, but is there a way to export my factors with numeric levels so that they are still factors when the CSV is imported later?

Many thanks in advance!


Solution

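  • CSV is a generic file format that is just Comma Separated Values. It doesn't contain any information about the classes of columns - that's up to the function that reads the CSV to decide. Here fread sees only digits in those columns and reads them as integers; stringsAsFactors = TRUE does not help because it only converts character columns.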

    To preserve class information when writing to a file the easiest way is to use an R-specific file format, like RDS (see ?readRDS and ?saveRDS). This works great if you only need R to read the file.

    If you need other programs to be able to read/write the data too, then you'll need to keep track of the class information and, e.g., use the colClasses argument of fread to specify the column classes when you read in the CSV.
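
Both options can be sketched as follows. This is only an illustration, assuming data.table is installed; the small HR table, its values, and the temporary file paths are made up for the example and stand in for the real data.

```r
library(data.table)

# Hypothetical stand-in for the HR data: two factors with numeric-looking levels
HR <- data.table(
  JobLevel        = factor(c(1, 2, 3, 5, 2)),
  JobSatisfaction = factor(c(4, 1, 3, 2, 4))
)

# Option 1: RDS stores the R object itself, so the column classes survive
rds_path <- tempfile(fileext = ".rds")
saveRDS(HR, rds_path)
HR_rds <- readRDS(rds_path)

# Option 2: CSV carries no class information, so declare it again on import
csv_path <- tempfile(fileext = ".csv")
fwrite(HR, csv_path)

# Without colClasses the digit-only columns are parsed as integer,
# even with stringsAsFactors = TRUE
HR_plain <- fread(csv_path, stringsAsFactors = TRUE)

# With colClasses the named columns are read back as factors
HR_csv <- fread(csv_path,
                colClasses = list(factor = c("JobLevel", "JobSatisfaction")))

sapply(HR_rds, class)
sapply(HR_plain, class)
sapply(HR_csv, class)
```

The colClasses approach means the list of factor columns has to be kept somewhere alongside the CSV, which is the "keep track of the class information" part of the answer.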