r, dplyr, data-manipulation

Dividing second row by first per group


I have data in the following format:

name = c("john", "john", "jack", "jack", "jason", "jason")
time_to_run_100_meters_last_year_this_year = c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)

my_data = data.frame(name, time_to_run_100_meters_last_year_this_year)


   name time_to_run_100_meters_last_year_this_year
1  john                                       22.3
2  john                                       22.1
3  jack                                       12.4
4  jack                                       12.3
5 jason                                       15.1
6 jason                                       15.6

I want to find the relative change in time for each student, i.e. this year's time divided by last year's: (22.1/22.3, 12.3/12.4, 15.6/15.1).

I thought of the following way to solve this problem:

library(dplyr)

my_data = my_data %>% 
  arrange(name) %>%
  group_by(name) %>% 
  mutate(id = row_number()) %>%
  ungroup()


id_1 =  my_data[which(my_data$id == 1), ]

id_2 =  my_data[which(my_data$id == 2), ]

division =  id_2$time_to_run_100_meters_last_year_this_year/id_1$time_to_run_100_meters_last_year_this_year

unique = unique(my_data$name)

final_data = data.frame(unique, division)

In the end, I think my idea worked:

> final_data
  unique  division
1   jack 0.9919355
2  jason 1.0331126
3   john 0.9910314

My question: is there a better way to solve this problem?


Solution

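  • You can use group_by and summarize from the dplyr package.

    Within each group, lead returns the value from the next row (and NA for the last row), so dividing it by the current value gives this year's time over last year's. na.omit then drops the NA, leaving one value per group.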

    library(dplyr)
    
    final_data <- 
      my_data %>% 
      group_by(name) %>% 
      summarize(division = na.omit(lead(time_to_run_100_meters_last_year_this_year)/time_to_run_100_meters_last_year_this_year))
    
    final_data
    # A tibble: 3 × 2
      name  division
      <chr>    <dbl>
    1 jack     0.992
    2 jason    1.03 
    3 john     0.991
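
    As a possible alternative, here is a minimal sketch that skips lead and na.omit by indexing the two rows of each group directly. It assumes every name has exactly two rows, ordered last year then this year, as in the question's data. The names shifted and ratio_data are only illustrative.

```r
library(dplyr)

# Data from the question
name <- c("john", "john", "jack", "jack", "jason", "jason")
time_to_run_100_meters_last_year_this_year <- c(22.3, 22.1, 12.4, 12.3, 15.1, 15.6)
my_data <- data.frame(name, time_to_run_100_meters_last_year_this_year)

# What lead() does on one group: shift values up by one, pad the end with NA
shifted <- lead(c(22.3, 22.1))

# Assumption: exactly two rows per name, last year first and this year second
ratio_data <- my_data %>%
  group_by(name) %>%
  summarize(
    division = time_to_run_100_meters_last_year_this_year[2] /
      time_to_run_100_meters_last_year_this_year[1]
  )

ratio_data
```

    If a group could have more or fewer than two rows, this indexing would silently return NA or ignore the extra rows, so it is worth checking the group sizes first (for example with count(my_data, name)).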