Search code examples
rdplyrtidyverse

how to create a new column where the rows are determined by the previous row (calculation)?


I have this table and I want to calculate inflation for every month in every city

I tried the lag and lead functions but the CPI was overlapping onto the next city for ex : Malmo's shifted CPI was Stockholm's 05-01-2005 CPI

CPI Date City
112 2005-01-01 Stockholm
113.5 2005-02-01 Stockholm
115 2005-03-01 Stockholm
115.6 2005-04-01 Stockholm
115.8 2005-05-01 Stockholm
106 2005-01-01 Malmo
107.5 2005--02-01 Malmo
110 2005-03-01 Malmo
113 2005-04-01 Malmo
117 2005-05-01 Malmo

Solution

  • You need to group_by to avoid this issue.

    # Read in data
    inflation  <- read.table(text = "CPI    Date    City
    112 2005-01-01  Stockholm
    113.5   2005-02-01  Stockholm
    115 2005-03-01  Stockholm
    115.6   2005-04-01  Stockholm
    115.8   2005-05-01  Stockholm
    106 2005-01-01  Malmo
    107.5   2005--02-01 Malmo
    110 2005-03-01  Malmo
    113 2005-04-01  Malmo
    117 2005-05-01  Malmo", h = T)
    
    # Perform calculation
    library(dplyr)
    
    inflation  |>
        group_by(City)  |>
        mutate(
            cpi_change = lead(CPI) - CPI,
            cpi_change_percent = cpi_change / CPI * 100
        )
    

    Output:

    # A tibble: 10 x 5
    # # Groups:   City [2]
    #      CPI Date        City      cpi_change cpi_change_percent
    #    <dbl> <chr>       <chr>          <dbl>              <dbl>
    #  1  112  2005-01-01  Stockholm      1.5                1.34
    #  2  114. 2005-02-01  Stockholm      1.5                1.32
    #  3  115  2005-03-01  Stockholm      0.600              0.522
    #  4  116. 2005-04-01  Stockholm      0.200              0.173
    #  5  116. 2005-05-01  Stockholm     NA                 NA
    #  6  106  2005-01-01  Malmo          1.5                1.42
    #  7  108. 2005--02-01 Malmo          2.5                2.33
    #  8  110  2005-03-01  Malmo          3                  2.73
    #  9  113  2005-04-01  Malmo          4                  3.54
    # 10  117  2005-05-01  Malmo         NA                 NA
    

    You will get NAs for the last month as we do not know the rate in the following year. Alternatively you can do it with lag instead of lead if you want to work out change from previous, but then you'll get NAs for the first month.