I bring in a data set using the following command:
rbc <- read.csv("rbc hgb.csv", header = T)
data <- rbc[rbc$Result_Value_After != "NULL",]
For some reason rbc$Result_Value_After gets treated as a factor, so I issue the following command:
data$Result_Value_After <- as.numeric(data$Result_Value_After)
str(data) tells me the column is now of type num, but the original values were decimals such as 7.2. After the conversion, 7.2 becomes 72, which is way off. Any ideas on how to go about fixing this?
Here's a possible workaround for the issue of column classification upon calling read.csv.
Say I don't want to mess around with changing classes after reading data into R. If I want one column to be character and the others to keep their default class, I can use readLines to quickly read the first line of the .csv (i.e. the column header line, if present) and set up a vector to be passed to the colClasses argument of read.csv.
Here's a simple function,
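col.classes <- function(csv, col, class){
    # Read only the first line of the file (the header row)
    g <- readLines(csv, n = 1)
    # Split the header into column names (assumes unquoted, comma-separated names)
    n <- unlist(strsplit(g, ","))
    # Use the requested class for the named column(s) and NA for all others
    col.classes <- ifelse(n %in% col, class, NA)
    return(col.classes)
}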
To show how this works, suppose I have a file named cats.csv (and it just so happens that I do), and I know I want the weight column to be class character and the rest of the columns to keep their default class. Keep in mind that colClasses can be a character vector, and for elements that are NA, the corresponding column is left for read.csv to classify automatically, just as if it had been read without colClasses.
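To make the shape of that vector concrete, here is a minimal sketch that runs the function on a throwaway file. The file contents below are made up for illustration (I am not reproducing the real cats.csv), and the header is assumed to be unquoted and comma-separated.

```r
# Same helper as above, repeated so this snippet runs on its own.
col.classes <- function(csv, col, class){
    g <- readLines(csv, n = 1)
    n <- unlist(strsplit(g, ","))
    ifelse(n %in% col, class, NA)
}

# Hypothetical stand-in for cats.csv; the values are invented.
tmp <- tempfile(fileext = ".csv")
writeLines(c("cats,colour,length,weight,mu",
             "1,black,30,4,2",
             "2,white,32,5,3"), tmp)

# The vector has one element per column: NA everywhere except 'weight'.
cc <- col.classes(tmp, "weight", "character")
print(cc)

# Passing it to read.csv classes only the 'weight' column explicitly.
rr <- read.csv(tmp, colClasses = cc)
print(sapply(rr, class))
```

The NA entries are what let the other columns fall back to whatever read.csv would have chosen anyway.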
View the names of the columns in the file
> names(read.csv('cats.csv'))
## [1] "cats" "colour" "length" "weight" "mu"
View the default classes from read.csv
> sapply(read.csv('cats.csv'), class)
## cats colour length weight mu
## "integer" "factor" "integer" "integer" "integer"
Sample Runs:
(1) Class the length column as numeric upon calling read.csv, while leaving the others as their respective defaults
> cc1 <- col.classes('cats.csv', 'length', 'numeric')
> rr1 <- read.csv('cats.csv', colClasses = cc1)
> sapply(rr1, class)
## cats colour length weight mu
## "integer" "factor" "numeric" "integer" "integer"
(2) Similarly, class the weight column as character
> cc2 <- col.classes('cats.csv', 'weight', 'character')
> rr2 <- read.csv('cats.csv', colClasses = cc2)
> sapply(rr2, class)
## cats colour length weight mu
## "integer" "factor" "integer" "character" "integer"
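As a possible shortcut, colClasses can also be given as a named character vector, in which case the names are matched against the column names and unnamed columns are treated as NA. The sketch below assumes a made-up file with the same header as cats.csv; it is an alternative to the helper function, not what the runs above used.

```r
# Hypothetical stand-in for cats.csv; the values are invented.
tmp <- tempfile(fileext = ".csv")
writeLines(c("cats,colour,length,weight,mu",
             "1,black,30,4,2",
             "2,white,32,5,3"), tmp)

# Name only the columns whose class should be forced;
# every other column keeps the class read.csv picks by default.
rr3 <- read.csv(tmp, colClasses = c(weight = "character", length = "numeric"))
print(sapply(rr3, class))
```

This avoids reading the header line separately, at the cost of having to know the column names up front.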
Not sure if that helps you at all. I find it useful when I want a mixture of column classes that might otherwise be clunky and frustrating to change once the data has already been loaded into R.
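For the column in the question specifically, here is a rough sketch of two ways to end up with real numbers. The file contents are invented, since the actual "rbc hgb.csv" isn't shown; the only assumption carried over is that missing results are written as the text NULL. The underlying issue is that as.numeric() on a factor returns the internal integer level codes rather than the values printed as labels.

```r
# Hypothetical stand-in for "rbc hgb.csv"; the values are invented.
tmp <- tempfile(fileext = ".csv")
writeLines(c("id,Result_Value_After",
             "1,7.2",
             "2,NULL",
             "3,10.5"), tmp)

# Option 1: tell read.csv that "NULL" marks a missing value, so the
# column is parsed as numeric from the start, then drop the NA rows.
rbc <- read.csv(tmp, header = TRUE, na.strings = c("NA", "NULL"))
data <- rbc[!is.na(rbc$Result_Value_After), ]
print(class(data$Result_Value_After))

# Option 2: if the column is already a factor, go through character first.
f <- factor(c("7.2", "10.5"))
codes  <- as.numeric(f)                 # level codes, not the original values
values <- as.numeric(as.character(f))   # the original values
print(codes)
print(values)
```

Option 1 avoids the factor entirely; Option 2 is the usual repair when the data has already been read in.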