Search code examples
rexcelcharacternumericdataformat

cannot convert character into numeric in R


I copy and pasted the weather information from the following website "weather underground" for some data analysis and the data looks like below:

https://www.wunderground.com/dashboard/pws/KCACHINO13/table/2018-04-10/2018-04-10/daily

enter image description here

As you can see, the temperature and other information all have the text with it so I cannot conduct any calculation. In the excel, I used substitute(xx,"F","") to remove the F from the "Temperature" column, but then I wanted to convert Farenheit to Celcius using convert(xx,"F","C"), I could not get the outcome. I think there is something wrong with the data itself. I formatted the cell into number or copy and paste the value to a new column, but neither of them worked.

Then I import the data.frame into R and try to do some data formating using R. I checked the class of the Temperature column, which shows "character":

class(a$Temperature)
#"character"

a$Temperature <- gsub("F","",a$Temperature)
# this command remmoved "F"

as.numeric(a$Temperature)
#Warning message: NAs introduced by coercion 

as.numeric(unlist(a$Temperature))
#still the same warning message

From the excel, I created the new column removing "F" from temperature, and used this in R to convert "character" to "numeric", I still got the warning message. I don't know how to deal with this problem. Could someone help me with this? Thank you!

As recommended below, I am pasting the output from

dput(head(a))

#structure(list(Time = structure(c(-2209075140, -2209074840, -2209074540, 
-2209074240, -2209073940, -2209073640), tzone = "UTC", class = c("POSIXct", 
"POSIXt")), Temperature = c("60.0 ", "59.9 ", "59.8 ", "59.7 ", 
"59.6 ", "59.5 "), `T(F)` = c("60.0 ", "59.9 ", "59.8 ", "59.7 ", 
"59.6 ", "59.5 "), `Dew Point` = c("48.2 F", "48.1 F", "48.4 F", 
"48.3 F", "48.2 F", "48.1 F"), Humidity = c("65 %", "65 %", "66 %", 
"66 %", "66 %", "66 %"), Wind = c("WSW", "WSW", "WSW", "WSW", 
"WSW", "WSW"), Speed = c("0.0 mph", "0.0 mph", "0.0 mph", "0.0 mph", 
"0.0 mph", "0.0 mph"), Gust = c("0.0 mph", "0.0 mph", "0.0 mph", 
"0.0 mph", "0.0 mph", "0.0 mph"), Pressure = c("29.88 in", "29.88 in", 
"29.88 in", "29.88 in", "29.88 in", "29.88 in"), `Precip. Rate.` = c("0.00 in", 
"0.00 in", "0.00 in", "0.00 in", "0.00 in", "0.00 in"), `Precip. Accum.` = c("0.00 in", 
"0.00 in", "0.00 in", "0.00 in", "0.00 in", "0.00 in"), UV = c(0, 
0, 0, 0, 0, 0), Solar = c("0 w/m²", "0 w/m²", "0 w/m²", "0 w/m²", 
"0 w/m²", "0 w/m²")), row.names = c(NA, -6L), class = c("tbl_df", 
"tbl", "data.frame"))

Solution

  • If you want to convert only Temperature column, here is an option you may consider.

    Data

    df <- structure(list(Time = c("12:04 AM", "12:09 AM", "12:14 AM", "12:19 AM", 
    "12:24 AM", "12:29 AM"), Temperature = c("69.4 F", "69.2 F", 
    "68.8 F", "68.5 F", "68.3 F", "68.0 F"), Dew.Point = c("45.9 F", 
    "46.0 F", "45.8 F", "45.7 F", "45.7 F", "45.7 F"), Humidity = c("43 %", 
    "43 %", "44 %", "44 %", "44 %", "45 %"), Wind = c("NE", "NE", 
    "NE", "NE", "NE", "NE"), Speed = c("0.0 mph", "0.0 mph", "0.0 mph", 
    "0.0 mph", "0.0 mph", "0.0 mph"), Gust = c("0.0 mph", "0.0 mph", 
    "0.0 mph", "0.0 mph", "0.0 mph", "0.0 mph"), Pressure = c("29.93 in", 
    "29.94 in", "29.94 in", "29.95 in", "29.95 in", "29.95 in"), 
        Precip..Rate. = c("0.00 in", "0.00 in", "0.00 in", "0.00 in", 
        "0.00 in", "0.00 in"), Precip..Accum. = c("0.00 in", "0.00 in", 
        "0.00 in", "0.00 in", "0.00 in", "0.00 in"), UV = c(0L, 0L, 
        0L, 0L, 0L, 0L), Solar = c("0 w/m²", "0 w/m²", "0 w/m²", 
        "0 w/m²", "0 w/m²", "0 w/m²")), class = "data.frame", row.names = c(NA, 
    -6L))
    

    Code

    library(dplyr)
    library(stringr)
    df2 <- df %>% 
      mutate(Temperature2 = as.numeric(str_extract(Temperature, "[\\d\\.]+"))) %>% 
      relocate(Temperature2, .after = Temperature)
    
    df2[, 2:3]
    #   Temperature Temperature2
    # 1      69.4 F         69.4
    # 2      69.2 F         69.2
    # 3      68.8 F         68.8
    # 4      68.5 F         68.5
    # 5      68.3 F         68.3
    # 6      68.0 F         68.0
    str(df2$Temperature2)
    # num [1:6] 69.4 69.2 68.8 68.5 68.3 68