
How to replace cur_data() with pick()?


I understand that cur_data() should be replaced by pick(). How would I rewrite this?

mutate(normalized_mean_speed = mean_speed / cur_data()$mean_speed[1:3])

I wish to divide mean_speed for each year by the average of the first three years, in order to plot progress over the years.

The entire code for this part (which I suppose could be simplified to make it easier to read) is:

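library(dplyr)     # group_by(), filter(), top_n(), summarise(), mutate()
library(lubridate) # year()
library(zoo)       # rollmean()
library(forcats)   # fct_reorder()
library(ggplot2)   # ggplot() and the geoms/scales below

data_raw |>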
  group_by(event, 
           year= year(event_date)) |> 
  filter(year_rank > 0) |> 
  top_n(10, speed) |> 
  arrange(year, -speed) |> 
  select(year, year_rank, distance, speed, event) |> 
  group_by(event, 
           year) |> 
  filter(year >= 1990, event != "marathon" & event != "72h") |> 
  summarise(mean_speed = mean(speed)) |> 
  mutate(normalized_mean_speed = mean_speed / cur_data()$mean_speed[1:3]) |> 
  mutate(moving_average = rollmean(normalized_mean_speed, 
                                   k = 5, 
                                   na.pad = TRUE)) |> 
  mutate(event = fct_reorder(event, mean_speed)) |> 
  ggplot(aes(year, 
             normalized_mean_speed, 
             colour = fct_reorder(event, -mean_speed))) +
  geom_smooth(span = 0.15, se = FALSE) +
  geom_line(aes(year, moving_average), 
            linetype = "dotted", 
            linewidth = 1) +
  geom_hline(yintercept = 1, 
             linetype = "dashed", 
             color = "grey", linewidth = 0.5) +
  labs(title = paste0("Average Top 10 Per Year - ", sex),
       subtitle = "normalized to 1990-1992 average",
       color = "Events",
       x = "Year",
       y = "Normalized Mean Speed") +
  scale_x_continuous(limits = c(1990, 2023),
                     breaks = seq(1990, 2023, 5),
                     expand = expansion(mult = c(0, 0.05))
                     ) +
  theme_ultra +
  theme(strip.text = element_text(size = 14)) +
  facet_wrap(~fct_reorder(event, -mean_speed), 
             shrink = TRUE
             )
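One detail in the pipeline above is independent of pick(): inside the grouped mutate(), `mean_speed[1:3]` is a vector of three values, so `mean_speed / cur_data()$mean_speed[1:3]` divides element-wise and recycles those three values down the group, rather than dividing every year by their average. A minimal base R sketch with made-up numbers (the values are purely illustrative, not from the question's data):

```r
# Hypothetical mean speeds for one event, one value per year (illustrative only)
mean_speed <- c(10, 11, 12, 13, 14, 15)

# Dividing by a length-3 vector recycles it: element 4 is divided by element 1,
# element 5 by element 2, and so on
recycled <- mean_speed / mean_speed[1:3]

# Dividing by the average of the first three values uses one common baseline
normalized <- mean_speed / mean(mean_speed[1:3])

print(recycled)
print(normalized)
```

With six values the recycling happens silently; when the group length is not a multiple of three, R additionally warns "longer object length is not a multiple of shorter object length". Wrapping the subset in mean() turns it into a single baseline that applies to every row.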

Solution

  • As the dplyr documentation for pick() states:

    pick() provides a way to select a subset of your columns using tidyselect. It returns a data frame... This is useful for functions that take data frames as inputs.

    In your example, the baseline comes from a subset of rows within each group: mean_speed[1:3], i.e. the first three years. Rather than indexing, you could select those rows with a function such as dplyr::slice_min() (smallest values, e.g. the earliest years) or dplyr::slice_max() (largest values). These take a data frame as input, so they're good candidates for pick().

    Here's an example with the iris dataset using the deprecated cur_data():

    iris |>
        group_by(Species) |>
        reframe(
            mean_petal_length = cur_data() |>
                slice_max(Petal.Length, n = 3) |>
                pull(Petal.Length) |>
                mean(),
            normalized_petal_length = Petal.Length / mean_petal_length
        )
    

    As slice_max() takes a data frame, we provide one with cur_data(). But cur_data() hands over every non-grouping column, and we then throw almost all of it away because we're only interested in the Petal.Length column, which we pull(). With pick(), we still need pull(), but we can pass slice_max() a data frame containing just the column we're interested in. Grouping columns are not included: pick() cannot select them, because the verb (here reframe()) already handles the grouping.

    library(dplyr)  # pick() and reframe() need dplyr >= 1.1.0

    iris |>
        group_by(Species) |>
        reframe(
            mean_petal_length = pick(Petal.Length) |>
                slice_max(Petal.Length, n = 3) |>
                pull(Petal.Length) |>
                mean(),
            normalized_petal_length = Petal.Length / mean_petal_length
        )
    
    
    # # A tibble: 150 × 3
    #    Species mean_petal_length normalized_petal_length
    #    <fct>               <dbl>                   <dbl>
    #  1 setosa               1.77                   0.792
    #  2 setosa               1.77                   0.792
    #  3 setosa               1.77                   0.736
    #  4 setosa               1.77                   0.849
    #  5 setosa               1.77                   0.792
    #  6 setosa               1.77                   0.962
    #  7 setosa               1.77                   0.792
    #  8 setosa               1.77                   0.849
    #  9 setosa               1.77                   0.792
    # 10 setosa               1.77                   0.849
    

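    The output is the same in both cases. Note that slice_max() keeps ties by default (with_ties = TRUE), so for setosa the mean is taken over six rows (two petal lengths of 1.9 and four of 1.7), which gives 1.77. Use slice_max(Petal.Length, n = 3, with_ties = FALSE) if you want exactly three rows.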
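Applied to the question's "first three years" baseline, a sketch might look like the following. It assumes the data are grouped by event at this point (summarise() drops only the last grouping variable, year) and sorts by year explicitly so that the first three rows are the earliest years. The tibble is made-up toy data standing in for the real summary, and the names `speeds`, `baseline_index` and `baseline_pick` are illustrative. Because the reference rows are the earliest years rather than the fastest ones, slice_min() on year is used here in place of slice_max(). The plain mean(mean_speed[1:3]) version is shown alongside: it needs neither cur_data() nor pick(), since inside a grouped mutate() each column already refers to the current group.

```r
library(dplyr)  # pick() requires dplyr >= 1.1.0; tibble() is re-exported by dplyr

# Toy stand-in for the summarised data: one row per event and year (made-up values)
speeds <- tibble(
  event = rep(c("10k", "100k"), each = 5),
  year = rep(1990:1994, times = 2),
  mean_speed = c(20, 21, 22, 23, 24, 10, 11, 12, 12, 13)
)

normalized <- speeds |>
  group_by(event) |>
  # Sort by year within each event so that [1:3] means "earliest three years"
  arrange(year, .by_group = TRUE) |>
  mutate(
    # Option 1: plain indexing, averaged into a single baseline per event
    baseline_index = mean(mean_speed[1:3]),
    # Option 2: pick() hands year and mean_speed to slice_min() as a data frame
    baseline_pick = pick(year, mean_speed) |>
      slice_min(year, n = 3) |>
      pull(mean_speed) |>
      mean(),
    normalized_mean_speed = mean_speed / baseline_pick
  ) |>
  ungroup()

print(normalized)
```

In the original pipeline, the cur_data() line could then become `mutate(normalized_mean_speed = mean_speed / mean(mean_speed[1:3]))`, assuming every event has data for 1990-1992; if some early years can be missing, the slice_min() variant at least makes explicit which rows form the baseline.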