Search code examples
rdataframedplyr

Calculating years between occurence of 1 in a column by group


I have applied both code suggestions to my original data set to compare peace1 and peace2.

peace1:

d_muslim <- d_muslim %>% 
   mutate(last_conflict = lag(if_else(conflict == 1, year,NA),default = min(year) - 1),
   .by = country) %>%     
   tidyr::fill(last_conflict, .direction = "down") %>% 
   mutate(peace1 = year - last_conflict - 1)

peace2:

d_muslim <- d_muslim %>%
  mutate(helper = cumsum(lag(conflict, default = 1) == 1),.by = country) %>% 
  mutate(peace2 = year - first(year), .by = c(country, helper)) %>%
  select(-helper)

result:

country year conflict peace1 peace2
Iran 1946 0 -73 0
Iran 1947 0 -72 1
Iran 1948 0 -71 2
Iran 1949 0 -70 3
Iran 1950 0 -69 4
Iran 1951 0 -68 5
Iran 1952 0 -67 6
Iran 1953 0 -66 7
Iran 1954 0 -65 8
Iran 1955 0 -64 9
Iran 1956 0 -63 10
Iran 1957 0 -62 11
Iran 1958 0 -61 12
Iran 1959 0 -60 13
Iran 1960 0 -59 14
Iran 1961 0 -58 15
Iran 1962 0 -57 16
Iran 1963 0 -56 17
Iran 1964 0 -55 18
Iran 1965 0 -54 19
Iran 1966 1 -1 0
Iran 1967 0 0 1
Iran 1968 0 1 2
Iran 1969 0 2 3
Iran 1970 0 3 4
Iran 1971 0 4 5
Iran 1972 0 5 6
Iran 1973 0 6 7
Iran 1974 0 7 8
Iran 1975 0 8 9
Iran 1976 0 9 10
Iran 1977 0 10 11
Iran 1978 0 11 12
Iran 1979 1 -1 0
Iran 1980 0 0 1
Iran 1981 0 1 2
Iran 1982 0 2 3
Iran 1983 0 3 4
Iran 1984 0 4 5
Iran 1985 0 5 6
Iran 1986 1 -1 0
Iran 1987 0 0 1
Iran 1988 0 1 2
Iran 1989 0 2 3
Iran 1990 1 -1 0
Iran 1991 1 -1 0
Iran 1992 0 0 1
Iran 1993 1 -1 0
Iran 1994 0 0 1
Iran 1995 0 1 2
Iran 1996 1 -1 0
Iran 1997 1 -1 0
Iran 1998 0 0 1
Iran 1999 1 -1 0
Iran 2000 0 0 1
Iran 2001 0 1 2
Iran 2002 0 2 3
Iran 2003 0 3 4
Iran 2004 0 4 5
Iran 2005 1 -1 0
Iran 2006 0 0 1
Iran 2007 0 1 2
Iran 2008 0 2 3
Iran 2009 0 3 4
Iran 2010 0 4 5
Iran 2011 0 5 6
Iran 2012 0 6 7
Iran 2013 0 7 8
Iran 2014 0 8 9
Iran 2015 0 9 10
Iran 2016 1 -1 0
Iran 2017 1 -1 0
Iran 2018 1 -1 0

Both codes do not fulfill my conditions.

  1. the cumulation of the years of peace must extend into the year of conflict. And not be set to 0.
  2. the year after a year of conflict must start with peace == 0. As the function of peace1 does correctly.

Solution

  • I think your 15 should be 18, but otherwise ...

    Using your "expected" as a starting point (for side-by-side comparison):

    quux |>
      mutate(
        last_conflict = lag(if_else(conflict == 1, year, NA), default = min(year) - 1),
        .by = country) |>
      tidyr::fill(last_conflict, .direction = "down") |>
      mutate(peace2 = year - last_conflict - 1)
    #     country year conflict peace last_conflict peace2
    # 1  country1 1990        1     0          1989      0
    # 2  country1 1991        0     0          1990      0
    # 3  country1 1992        0     1          1990      1
    # 4  country1 1993        0     2          1990      2
    # 5  country1 1994        1     3          1990      3
    # 6  country1 1995        1     0          1994      0
    # 7  country1 1996        0     0          1995      0
    # 8  country1 1997        0     1          1995      1
    # 9  country1 1998        0     2          1995      2
    # 10 country1 1999        0     3          1995      3
    # 11 country1 2014        0    15          1995     18
    # 12 country2 1990        0     0          1989      0
    # 13 country2 1991        1     1          1989      1
    # 14 country2 1992        0     0          1991      0
    # 15 country2 1995        1     3          1991      3
    # 16 country2 1996        0     0          1995      0
    # 17 country2 2000        1     4          1995      4
    

    Data

    quux <- structure(list(country = c("country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country1", "country2", "country2", "country2", "country2", "country2", "country2"), year = c(1990L, 1991L, 1992L, 1993L, 1994L, 1995L, 1996L, 1997L, 1998L, 1999L, 2014L, 1990L, 1991L, 1992L, 1995L, 1996L, 2000L), conflict = c(1L, 0L, 0L, 0L, 1L, 1L, 0L, 0L, 0L, 0L, 0L, 0L, 1L, 0L, 1L, 0L, 1L), peace = c(0L, 0L, 1L, 2L, 3L, 0L, 0L, 1L, 2L, 3L, 15L, 0L,  1L, 0L, 3L, 0L, 4L)), class = "data.frame", row.names = c(NA, -17L))