Search code examples
rdatasetdifference

how to difference values within several panels


Suppose the data is:

data <- structure(list(country = c("Poland", "Poland", "Poland", "Poland", 
                             "Poland", "Poland", "Portugal", "Portugal", "Portugal", "Portugal", 
                             "Portugal", "Portugal", "Spain", "Spain", "Spain", "Spain", "Spain", 
                             "Spain"), Code = c("POL", "POL", "POL", "POL", "POL", "POL", 
                                                "PRT", "PRT", "PRT", "PRT", "PRT", "PRT", "ESP", "ESP", "ESP", 
                                                "ESP", "ESP", "ESP"), year = c(1950, 1951, 1952, 1953, 1954, 
                                                                               1955, 1950, 1951, 1952, 1953, 1954, 1955, 1950, 1951, 1952, 1953, 
                                                                               1954, 1955), IV = c(3, 3, 3, 3, 3, 3, 1, 1, 1, 1, 1, 1, 
                                                                                                          1, 1, 1, 1, 1, 1)), row.names = c(1L, 2L, 3L, 4L, 5L, 6L, 7L, 
                                                                                                                                            8L, 9L, 10L, 11L, 12L, 13L, 14L, 15L, 16L, 17L, 18L), class = "data.frame")

How to difference IV values within each panel (country)? In other words, I want to substract IV values for the year of 1951 from 1950; 1952-1951; 1953-1952; 1954-1953; 1955-1954. And so for every country. In the outcome dataset (let's name it 'newdata'), IV value for each year has to show its difference with an IV value from a previous year; 1950 year should be just empty.

Any suggestions, guys? Hope my question is not confusing.


Solution

  • Using diff in ave perhaps.

    > transform(data, dif=ave(IV, Code, FUN=\(x) c(NA, diff(x))))
        country Code year IV dif
    1    Poland  POL 1950  3  NA
    2    Poland  POL 1951  3   0
    3    Poland  POL 1952  1  -2
    4    Poland  POL 1953  3   2
    5    Poland  POL 1954  3   0
    6    Poland  POL 1955  1  -2
    7  Portugal  PRT 1950  1  NA
    8  Portugal  PRT 1951  1   0
    9  Portugal  PRT 1952  1   0
    10 Portugal  PRT 1953  1   0
    11 Portugal  PRT 1954  1   0
    12 Portugal  PRT 1955  1   0
    13    Spain  ESP 1950  1  NA
    14    Spain  ESP 1951  1   0
    15    Spain  ESP 1952  3   2
    16    Spain  ESP 1953  1  -2
    17    Spain  ESP 1954  3   2
    18    Spain  ESP 1955  1  -2
    

    To eliminate 1950 thereafter, pipe in a subset.

    > transform(data, dif=ave(IV, Code, FUN=\(x) c(NA, diff(x)))) |> 
    +   subset(year != 1950)
        country Code year IV dif
    2    Poland  POL 1951  3   0
    3    Poland  POL 1952  1  -2
    4    Poland  POL 1953  3   2
    5    Poland  POL 1954  3   0
    6    Poland  POL 1955  1  -2
    8  Portugal  PRT 1951  1   0
    9  Portugal  PRT 1952  1   0
    10 Portugal  PRT 1953  1   0
    11 Portugal  PRT 1954  1   0
    12 Portugal  PRT 1955  1   0
    14    Spain  ESP 1951  1   0
    15    Spain  ESP 1952  3   2
    16    Spain  ESP 1953  1  -2
    17    Spain  ESP 1954  3   2
    18    Spain  ESP 1955  1  -2
    

    Data:

    set.seed(42)
    data$IV <- sample(data$IV)  ## to better demonstrate the result