I have a dataframe with hundreds of columns. Some of these columns contain only numeric values but are stored as character type. I need to convert every column whose values are all numbers to numeric (the data may also contain NAs).
Example dataframe:
df <- data.frame(id = c("R1","R2","R3","R4","R5"), name=c("A","B","C","D","E"), age=c("24", "NA", "55", "19", "40"), sbp=c(174, 125, 180, NA, 130), dbp=c("106", "67", "109", "NA", "87"))
> print(df, row.names = FALSE)
id name age sbp dbp
R1 A 24 174 106
R2 B NA 125 67
R3 C 55 180 109
R4 D 19 NA NA
R5 E 40 130 87
These columns should be numeric.
> df$age
[1] "24" "NA" "55" "19" "40"
> df$dbp
[1] "106" "67" "109" "NA" "87"
I applied as.numeric(), but it also converted the genuine character variables (id, name, etc.) to numeric, which turned all of their values into NA.
> sapply(df,as.numeric)
id name age sbp dbp
[1,] NA NA 24 174 106
[2,] NA NA NA 125 67
[3,] NA NA 55 180 109
[4,] NA NA 19 NA NA
[5,] NA NA 40 130 87
> lapply(df,as.numeric)
$id
[1] NA NA NA NA NA
$name
[1] NA NA NA NA NA
$age
[1] 24 NA 55 19 40
$sbp
[1] 174 125 180 NA 130
$dbp
[1] 106 67 109 NA 87
What I need is to skip the genuine character columns (id, name, etc.) while looping through the dataframe. Any help is much appreciated!
Try type.convert():
df2 <- type.convert(df, as.is = TRUE)
Result:
#> df2
id name age sbp dbp
1 R1 A 24 174 106
2 R2 B NA 125 67
3 R3 C 55 180 109
4 R4 D 19 NA NA
5 R5 E 40 130 87
## check column classes
#> sapply(df2, class)
id name age sbp dbp
"character" "character" "integer" "integer" "integer"
Note: the as.is argument controls whether character columns are converted to factors, i.e., if as.is = FALSE, the first two columns would have been changed to factors.
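To make the effect of as.is concrete, here is a minimal sketch that rebuilds the question's data frame and compares column classes under both settings. It also shows one possible variant that applies type.convert() to the character columns only, so an already-numeric column such as sbp keeps its double type instead of becoming integer. The object names (classes_as_is, classes_factor, is_chr, df3, classes_chr_only) are made up for illustration, and stringsAsFactors = FALSE is added on the assumption that the code may run on an R version older than 4.0.

```r
# Same data frame as in the question. stringsAsFactors = FALSE is spelled out
# so the sketch behaves the same on R versions before 4.0.
df <- data.frame(
  id   = c("R1", "R2", "R3", "R4", "R5"),
  name = c("A", "B", "C", "D", "E"),
  age  = c("24", "NA", "55", "19", "40"),
  sbp  = c(174, 125, 180, NA, 130),
  dbp  = c("106", "67", "109", "NA", "87"),
  stringsAsFactors = FALSE
)

# as.is = TRUE: columns that cannot be read as numbers stay character
classes_as_is <- sapply(type.convert(df, as.is = TRUE), class)

# as.is = FALSE: those same columns become factors instead
classes_factor <- sapply(type.convert(df, as.is = FALSE), class)

# Variant (illustrative): convert only the columns that are character,
# leaving columns that are already numeric untouched
is_chr <- vapply(df, is.character, logical(1))
df3 <- df
df3[is_chr] <- lapply(df3[is_chr], type.convert, as.is = TRUE)
classes_chr_only <- sapply(df3, class)

# Side-by-side view of the three results, one row per approach
print(rbind(as.is.TRUE  = classes_as_is,
            as.is.FALSE = classes_factor,
            chr.only    = classes_chr_only))
```

Whether the character-only variant is worth the extra lines depends on whether the integer/double distinction matters downstream; for most modelling and summary functions either type works.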