Label levels of a character variable in R


I want to create a new variable that labels the levels of the character variable in the preceding column, such that 6 = "6) Hazardous", 5 = "5) Very unhealthy", 4 = "4) Unhealthy", etc.

I have tried other options, but I keep getting NAs.

data <- structure(list(Year = c(2021, 2021, 2021, 2021, 2021, 2021, 2021, 
2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 2021, 
2021, 2021, 2021, 2021), Month = c(1, 1, 2, 2, 3, 3, 4, 5, 5, 
6, 6, 7, 7, 8, 8, 9, 9, 10, 10, 11, 12, 12), Day = c(1, 17, 3, 
23, 12, 29, 14, 1, 18, 3, 20, 7, 23, 9, 26, 11, 28, 15, 31, 17, 
4, 21), Hour = c(2, 18, 10, 18, 10, 2, 18, 10, 2, 18, 10, 2, 
18, 10, 2, 18, 10, 2, 18, 16, 8, 0), AQI = c(407, 356, 278, 107, 
154, 215, 100, 171, 69, 167, 68, 135, 76, 128, 76, 57, 151, 168, 
157, 179, 264, 256), AQI_desc = c("6) Hazardous", "6) Hazardous", 
"5) Very unhealthy", "3) Unhealth for sens. groups", "4) Unhealthy", 
"5) Very unhealthy", "2) Moderate", "4) Unhealthy", "2) Moderate", 
"4) Unhealthy", "2) Moderate", "3) Unhealth for sens. groups", 
"2) Moderate", "3) Unhealth for sens. groups", "2) Moderate", 
"2) Moderate", "4) Unhealthy", "4) Unhealthy", "4) Unhealthy", 
"4) Unhealthy", "5) Very unhealthy", "5) Very unhealthy")), row.names = c(NA, 
-22L), class = c("tbl_df", "tbl", "data.frame"))

char_to_num <- list("Good " = 1, "Moderate" = 2, "Unhealth for sens. groups" = 3,"Unhealthy  " = 4, "Very unhealthy" = 5, "Hazardous" = 6)

Solution

  • Since the numbering is in the data itself, you can extract it with parse_number() from readr:

    library(dplyr)   # mutate() and %>%
    library(readr)   # parse_number()
    data %>% 
      mutate(AQI_desc_new = parse_number(AQI_desc))
    
    # A tibble: 22 × 7
        Year Month   Day  Hour   AQI AQI_desc                     AQI_desc_new
       <dbl> <dbl> <dbl> <dbl> <dbl> <chr>                               <dbl>
     1  2021     1     1     2   407 6) Hazardous                            6
     2  2021     1    17    18   356 6) Hazardous                            6
     3  2021     2     3    10   278 5) Very unhealthy                       5
     4  2021     2    23    18   107 3) Unhealth for sens. groups            3
     5  2021     3    12    10   154 4) Unhealthy                            4
     6  2021     3    29     2   215 5) Very unhealthy                       5
     7  2021     4    14    18   100 2) Moderate                             2
     8  2021     5     1    10   171 4) Unhealthy                            4
     9  2021     5    18     2    69 2) Moderate                             2
    10  2021     6     3    18   167 4) Unhealthy                            4
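
If you would rather keep the lookup-table approach from the question, here is a minimal base R sketch. It assumes the NAs came from indexing the lookup directly with AQI_desc: the values carry a "6) "-style prefix, and two of the lookup names ("Good " and "Unhealthy  ") have trailing spaces, so no names match. The vector aqi_desc and the helper names label_only, aqi_num, and aqi_raw are made up for illustration, and the lookup is written as a named vector rather than a list.

```r
# Small sample of the AQI_desc values from the question
aqi_desc <- c("6) Hazardous", "5) Very unhealthy",
              "3) Unhealth for sens. groups", "4) Unhealthy", "2) Moderate")

# Lookup from the question as a named vector, with the trailing spaces removed
char_to_num <- c("Good" = 1, "Moderate" = 2, "Unhealth for sens. groups" = 3,
                 "Unhealthy" = 4, "Very unhealthy" = 5, "Hazardous" = 6)

# Strip the leading "6) " style prefix so the text matches the lookup names
label_only <- sub("^[0-9]+\\) *", "", aqi_desc)

# Index the named vector by label; unname() drops the names from the result
aqi_num <- unname(char_to_num[label_only])

# Indexing with the raw, prefixed values matches no names and returns NA
aqi_raw <- unname(char_to_num[aqi_desc])
```

On the full data the same idea would be data$AQI_desc_new <- unname(char_to_num[sub("^[0-9]+\\) *", "", data$AQI_desc)]). parse_number() is shorter when the number is already embedded in the text; the lookup is mainly useful when it is not.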