How to convert all numeric columns stored as character to numeric in a dataframe?


I have a dataframe with hundreds of columns, some of which contain only numeric values but are stored as character. I need to convert every column whose values are all numbers to numeric (the data may also contain NAs).

Example dataframe:

df <- data.frame(id = c("R1","R2","R3","R4","R5"), name=c("A","B","C","D","E"), age=c("24", "NA", "55", "19", "40"), sbp=c(174, 125, 180, NA, 130), dbp=c("106", "67", "109", "NA", "87"))

> print(df, row.names = F)
 id name age sbp dbp
 R1    A  24 174 106
 R2    B  NA 125  67
 R3    C  55 180 109
 R4    D  19  NA  NA
 R5    E  40 130  87

These columns should be numeric.
> df$age
[1] "24" "NA" "55" "19" "40"
> df$dbp
[1] "106" "67"  "109" "NA"  "87" 

I applied as.numeric() to every column, but it also coerced the genuine character variables (id, name, etc.) to numeric, which turned all of their values into NA.

> sapply(df,as.numeric)
     id name age sbp dbp
[1,] NA   NA  24 174 106
[2,] NA   NA  NA 125  67
[3,] NA   NA  55 180 109
[4,] NA   NA  19  NA  NA
[5,] NA   NA  40 130  87

> lapply(df,as.numeric)
$id
[1] NA NA NA NA NA

$name
[1] NA NA NA NA NA

$age
[1] 24 NA 55 19 40

$sbp
[1] 174 125 180  NA 130

$dbp
[1] 106  67 109  NA  87

What I need is to skip the genuine character columns (id, name, etc.) while looping through the dataframe. Any help is much appreciated!


Solution

  • Try type.convert():

    df2 <- type.convert(df, as.is = TRUE)
    

    Result:

    #> df2
      id name age sbp dbp
    1 R1    A  24 174 106
    2 R2    B  NA 125  67
    3 R3    C  55 180 109
    4 R4    D  19  NA  NA
    5 R5    E  40 130  87
    
    ## check column classes
    #> sapply(df2, class)
             id        name         age         sbp         dbp 
    "character" "character"   "integer"   "integer"   "integer" 
    

    Note that the as.is argument controls whether character columns are converted to factors: with as.is = FALSE, the first two columns would have been converted to factors.
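
To make the effect of as.is concrete, here is a small sketch that runs type.convert() both ways on the example data from the question. The names keep_chr and as_fct are only illustrative, and stringsAsFactors = FALSE is passed explicitly so the starting point is the same on older R versions.

```r
# Example data from the question; stringsAsFactors = FALSE keeps the
# text columns as character regardless of the R version's default.
df <- data.frame(
  id   = c("R1", "R2", "R3", "R4", "R5"),
  name = c("A", "B", "C", "D", "E"),
  age  = c("24", "NA", "55", "19", "40"),
  sbp  = c(174, 125, 180, NA, 130),
  dbp  = c("106", "67", "109", "NA", "87"),
  stringsAsFactors = FALSE
)

# as.is = TRUE: columns that cannot be read as numbers stay character.
keep_chr <- type.convert(df, as.is = TRUE)

# as.is = FALSE: columns that cannot be read as numbers become factors.
as_fct <- type.convert(df, as.is = FALSE)

# Compare the resulting column classes side by side.
print(sapply(keep_chr, class))
print(sapply(as_fct, class))

# id and name differ between the two results; age and dbp are
# numeric in both.
is.character(keep_chr$id)  # TRUE
is.factor(as_fct$id)       # TRUE
```

The "NA" strings in age and dbp end up as real missing values because the na.strings argument of type.convert() defaults to "NA"; if the data marks missing values differently, that argument would need to be set accordingly.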