r, dplyr, factors, mutate

Using R dplyr mutate to create a factor column using pre-declared levels


I've written R code to produce a periodic report that requires reordering week numbers so that I can filter and order by the most recent 10 weeks. To prevent errors and minimize hard-coded values, I prefer to declare this week order at the top of the script that sources the several other scripts used. Thus, I would like to define an ordered factor once and then use it to order the week number column later. A reproducible example is below, but generally I am reordering all 52 weeks such that the most recent 10-week period is last/largest, e.g. new_levels <- factor(1:52, levels = c(29:52, 1:28), ordered=TRUE).

Side note: any advice on how better to handle grabbing the most recent (not necessarily greatest) 10-week period is welcome. My struggle in the past has been with the roll-over near the end of the year (51, 52, 1, 2, 3, ...).

Example:

library(dplyr)  # provides tibble(), mutate(), arrange() and %>%

new_levels <- factor(1:10, levels = c(8:10, 1:7), ordered=TRUE)

data <- tibble(Week = 1:10, ID = c("A","A","B","B","C","A","D","B","D","A"))

data <- data %>% mutate(Week2 = factor(Week, levels = new_levels, ordered = TRUE)) %>% arrange(Week2)

The ordered factor (new_levels) appears to be correct, but the output of arrange() and str() shows that the ordering I want is not happening:

> new_levels
 [1] 1  2  3  4  5  6  7  8  9  10
Levels: 8 < 9 < 10 < 1 < 2 < 3 < 4 < 5 < 6 < 7
> data
# A tibble: 10 × 3
    Week ID    Week2
   <int> <chr> <ord>
 1     1 A     1    
 2     2 A     2    
 3     3 B     3    
 4     4 B     4    
 5     5 C     5    
 6     6 A     6    
 7     7 D     7    
 8     8 B     8    
 9     9 D     9    
10    10 A     10   
> str(data)
tibble [10 × 3] (S3: tbl_df/tbl/data.frame)
 $ Week : int [1:10] 1 2 3 4 5 6 7 8 9 10
 $ ID   : chr [1:10] "A" "A" "B" "B" ...
 $ Week2: Ord.factor w/ 10 levels "1"<"2"<"3"<"4"<..: 1 2 3 4 5 6 7 8 9 10

Thank you!


Solution

  • If you look closely at your output, you will see that Week2 does not have the levels you expect:

    data %>% 
      mutate(Week2 = factor(Week, levels = new_levels, ordered = TRUE)) %>% 
      pull(Week2)
    #  [1] 1  2  3  4  5  6  7  8  9  10
    # Levels: 1 < 2 < 3 < 4 < 5 < 6 < 7 < 8 < 9 < 10
    

    Week2's levels run from 1 to 10, so arrange() is working as expected: it sorts by the levels it was given. The issue is that you are passing levels = new_levels. What is the value of new_levels?

    new_levels
    #  [1] 1  2  3  4  5  6  7  8  9  10
    # Levels: 8 < 9 < 10 < 1 < 2 < 3 < 4 < 5 < 6 < 7
    

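    Its values are the sequence 1:10; the order you want is stored only in its levels attribute. Because factor() was handed the factor itself, it used those values, in element order, as the levels. What you want is to pass the levels of new_levels as the levels of your new variable: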

    data %>% 
      mutate(Week2 = factor(Week, levels = levels(new_levels), ordered = TRUE)) %>% 
      arrange(Week2)
    #     Week ID    Week2
    #    <int> <chr> <ord>
    #  1     8 B     8    
    #  2     9 D     9    
    #  3    10 A     10   
    #  4     1 A     1    
    #  5     2 A     2    
    #  6     3 B     3    
    #  7     4 B     4    
    #  8     5 C     5    
    #  9     6 A     6    
    # 10     7 D     7
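
To make the difference concrete, here is a small base-R sketch (no dplyr needed) that contrasts the values of new_levels with its levels attribute. It also shows one possible way to declare the order at the top of the script as a plain character vector rather than a factor, which sidesteps the confusion. The helper week_order() and its arguments latest_week and n_weeks are hypothetical names introduced only for illustration, and the wrap-around arithmetic assumes weeks are numbered 1 to n_weeks with no week 53.

```r
# The ordered factor from the question
new_levels <- factor(1:10, levels = c(8:10, 1:7), ordered = TRUE)

# Values (element order) vs. the levels attribute (the intended ordering)
as.character(new_levels)  # "1" "2" "3" ... "10"
levels(new_levels)        # "8" "9" "10" "1" ... "7"

# Hypothetical helper (not from the original post): week numbers as a
# character vector, ordered so that `latest_week` comes last.
# Assumes weeks run 1..n_weeks; the modulo handles the year-end roll-over.
week_order <- function(latest_week, n_weeks = 52) {
  as.character((latest_week + seq_len(n_weeks) - 1) %% n_weeks + 1)
}

lv <- week_order(latest_week = 7, n_weeks = 10)

# Passing a plain character vector as `levels` gives the wanted ordering
weeks  <- 1:10
week2  <- factor(weeks, levels = lv, ordered = TRUE)
sorted <- weeks[order(week2)]  # 8 9 10 1 2 3 4 5 6 7

# The most recent k weeks are simply the last k levels
recent <- tail(lv, 3)  # "5" "6" "7"
```

With 52 weeks and week 28 as the latest, week_order(28) starts at "29" and ends at "28", matching the ordering in the question, and tail(week_order(28), 10) would give the ten most recent week numbers for filtering.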