
Clean bad data automatically


I am building an app using shiny and openair to analyze wind data.
Right now the user has to “clean” the data before uploading it. I would like to do this automatically. Some of the data is empty and some of it is not numeric, so it is not possible to build a wind rose. I want to:

    1. Estimate how much of the data is not numeric
    2. Cut it out and leave only numeric data

Here is an example of the data:
The "NO2.mg" column is read as a factor rather than an integer because it does not consist only of numbers.
Here is a reproducible example:

> no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
> no2
[1] 5  4  c1 54 c5 1  2  3  4  5  6  7  8  9  10 11 12 13 14
[20] 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33
[39] 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49
52 Levels: 1 10 11 12 13 14 15 16 17 18 19 2 20 21 22 ... c5
> as.numeric(no2)
[1] 45 34 51 46 52  1 12 23 34 45 47 48 49 50  2  3  4  5  6
[20]  7  8  9 10 11 13 14 15 16 17 18 19 20 21 22 24 25 26 27
[39] 28 29 30 31 32 33 35 36 37 38 39 40 41 42 43 44

Solution

  • To convert a factor to numeric, convert it to character first. Calling as.numeric() directly on a factor returns the internal level codes, not the original values:

    no2<-factor(c(5,4,"c1",54,"c5",seq(2:50)))
    no2_num <- as.numeric(as.character(no2)) 
    #Warning message:
    #  NAs introduced by coercion 
    no2_clean <- na.omit(no2_num) #remove NAs resulting from the bad data
    
    # [1]  5  4 54  1  2  3  4  5  6  7  8  9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36
    # [40] 37 38 39 40 41 42 43 44 45 46 47 48 49
    # attr(,"na.action")
    # [1] 3 5
    # attr(,"class")
    # [1] "omit"
    
    # percentage of the original values that were dropped as non-numeric
    length(attr(no2_clean,"na.action"))/length(no2)*100
    #[1] 3.703704
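
The two steps above (estimate the share of bad values, then keep only the numeric ones) could be wrapped in one function so the app can run them on each uploaded column. This is only a minimal sketch: `clean_numeric` and the names of its list elements are hypothetical and not part of shiny or openair. It also assumes that empty or already-missing entries should count as bad data.

```r
# Hypothetical helper (not from shiny or openair): coerce a vector to numeric
# and report what share of its values could not be parsed.
clean_numeric <- function(x) {
  # as.character() first, so a factor yields its labels, not its level codes
  num <- suppressWarnings(as.numeric(as.character(x)))
  bad <- is.na(num)  # TRUE for non-numeric, empty, or already-missing entries
  list(
    values  = num[!bad],        # numeric entries only
    pct_bad = 100 * mean(bad),  # percentage of entries that were dropped
    bad_idx = which(bad)        # positions of the dropped entries
  )
}

no2 <- factor(c(5, 4, "c1", 54, "c5", seq(2:50)))
res <- clean_numeric(no2)
res$pct_bad  # 3.703704
res$bad_idx  # 3 5
```

For a full data frame, `bad_idx` can be used to drop whole rows instead of single values, so that the wind speed and wind direction columns stay aligned.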