Search code examples
rdplyrtidyversesurvival-analysispanel-data

Split observation (row) into month-wise pieces (rows)


I'm currently working on the civil conflict, and my dataset is the UCDP Armed Conflict Dataset. My focus is on the monthly duration of the civil war. However, I'm having trouble converting the original conflict-year data into conflict-month data.

I'll provide an example of my data below:

conflict_id start_date end_date year termination
100 1946-05-18 NA 1946 0
100 1946-05-18 1947-03-01 1947 1
101 1950-05-01 1950-07-01 1947 1

I am expecting following result :

conflict_id year month duration termination
100 1946 5 1 0
100 1946 6 2 0
100 1946 7 3 0
... ... ... ...
100 1947 2 9 0
100 1947 3 10 1

Any suggestions, examples would be greatly appreciated. Thank you in advance for your time and expertise!


Solution

  • one approach (rather long 'tidy-style' pipeline so you might want to break it up to inspect which does which):

    library(tidyr)
    library(lubridate)
    library(zoo)
    
    df |> ## df is a dataframe of the example data you provided
      mutate(across(ends_with('_date'),
                    ~ as.Date(.x) |> as.yearmon()
                    )
             ) |>
      group_by(conflict_id) |>
      summarize(start = min(start_date, na.rm = TRUE),
                end = max(end_date, na.rm = TRUE)
                ) |>
      rowwise() |>
      mutate(ym = seq(start, end, 1/12) |> list()) |>
      unnest_longer(ym) |>
      select(conflict_id, ym) |>
      group_by(conflict_id) |>
      mutate(year = year(ym),
             month = month(ym),
             duration = row_number(),
             termination = ifelse(duration < max(duration), 0, 1)
             )
    
    + # A tibble: 14 x 6
    # Groups:   conflict_id [2]
       conflict_id ym         year month duration termination
             <int> <yearmon> <dbl> <dbl>    <int>       <dbl>
     1         100 Mai 1946   1946     5        1           0
     2         100 Jun 1946   1946     6        2           0
    ## ... lines removed
    11         100 Mär 1947   1947     3       11           1
    12         101 Mai 1950   1950     5        1           0
    13         101 Jun 1950   1950     6        2           0
    14         101 Jul 1950   1950     7        3           1
    >