Tags: r, dataframe, calculated-columns, survival

How do I identify the first zero in a group of ordered columns?


I'm trying to format a dataset for use in some survival analysis models. Each row is a school, and the time-varying columns give the total number of students enrolled in the school that year. Say the data frame looks like this (there are time-invariant columns as well).

Name   total.89   total.90   total.91   total.92 
a         8          6         4           0
b         1          2         4           9
c         7          9         0           0
d         2          0         0           0

I'd like to create a new column indicating when the school "died," i.e., the first column in which a zero appears. Ultimately I'd like this column to be "years since 1989," and I can rename columns accordingly.

A more general version of the question: for a series of time-ordered columns, how do I identify the first column in which a given value occurs?


Solution

  • Here's a base R approach that adds a column with the position of the first zero (x = 0) among the year columns, or NA if there isn't one:

    data$died <- apply(data[, -1], 1, match, x = 0)
    data
    #   Name total.89 total.90 total.91 total.92 died
    # 1    a        8        6        4        0    4
    # 2    b        1        2        4        9   NA
    # 3    c        7        9        0        0    3
    # 4    d        2        0        0        0    2
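
To get "years since 1989" and to handle the more general "first column with a given value" case, something along these lines might work. This is only a sketch: `first_match` is a hypothetical helper name, and it assumes the year columns all start with `total.` followed by a two-digit year, as in the example data.

```r
# Example data from the question.
data <- data.frame(
  Name     = c("a", "b", "c", "d"),
  total.89 = c(8, 1, 7, 2),
  total.90 = c(6, 2, 9, 0),
  total.91 = c(4, 4, 0, 0),
  total.92 = c(0, 9, 0, 0)
)

# Hypothetical helper: for each row, the position (within `cols`) of the
# first column equal to `value`, or NA if the value never occurs.
first_match <- function(df, cols, value) {
  apply(df[, cols, drop = FALSE], 1, function(row) match(value, row))
}

# Assumption: year columns are named "total.<two-digit year>".
year_cols <- grep("^total\\.", names(data), value = TRUE)
pos <- first_match(data, year_cols, 0)

# Turn the position into a year offset using the column names,
# e.g. "total.91" -> 91 -> 2 years since 1989.
years <- as.integer(sub("^total\\.", "", year_cols))
data$years.since.1989 <- years[pos] - 89L
data$years.since.1989
# 3 NA 2 1
```

Selecting the columns by name rather than with `data[, -1]` keeps the time-invariant columns out of the search, and reading the year from the column name means the result stays right even if a year is missing from the sequence.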