Search code examples
rdataframedata-manipulationfind-occurrencesrowwise

R - Return column name for row where first given value is found


I am trying to find the first occurrence of a FALSE in a dataframe for each row value. My rows are specific occurrences and the columns are dates. I would like to be able to find the date of first FALSE so that I can use that value to find a return date.

An example structure of my dataframe:

df <- data.frame(ID = c(1,2,3), '2001' = c(TRUE, TRUE, TRUE), 
                 '2002' = c(FALSE, TRUE, FALSE), '2003' = c(TRUE, FALSE, TRUE))

I want to end up with a second dataframe or list that contains the ID and the column name that identifies the first instance of a FALSE.

For example :

ID | Date
1  | 2002
2  | 2003
3  | 2002

I do not know the mechanism to find such a result.

The actual dataframe contains a couple thousand rows so I unfortunately can't do it by hand.

I am a new R user so please don't refrain from suggesting things you might expect a more experienced R user to have already thought about.

Thanks in advance


Solution

  • Try this using tidyverse functions. You can reshape data to long and then filter for F values. If there are some duplicated rows the second filter can avoid them. Here the code:

    library(dplyr)
    library(tidyr)
    #Code
    newdf <- df %>% pivot_longer(-ID) %>%
      group_by(ID) %>%
      filter(value==F) %>%
      filter(!duplicated(value)) %>% select(-value) %>%
      rename(Myname=name)
    

    Output:

    # A tibble: 3 x 2
    # Groups:   ID [3]
         ID Myname 
      <dbl> <chr>
    1     1 2002 
    2     2 2003 
    3     3 2002 
    

    Another option without duplicated values can be using the row_number() to extract the first value (row_number()==1):

    library(dplyr)
    library(tidyr)
    #Code 2
    newdf <- df %>% pivot_longer(-ID) %>%
      group_by(ID) %>%
      filter(value==F) %>%
      mutate(V=ifelse(row_number()==1,1,0)) %>%
      filter(V==1) %>%
      select(-c(value,V)) %>% rename(Myname=name)
    

    Output:

    # A tibble: 3 x 2
    # Groups:   ID [3]
         ID Myname 
      <dbl> <chr>
    1     1 2002 
    2     2 2003 
    3     3 2002 
    

    Or using base R with apply() and a generic function:

    #Code 3
    out <- data.frame(df[,1,drop=F],Res=apply(df[,-1],1,function(x) names(x)[min(which(x==F))]))
    

    Output:

      ID  Res
    1  1 2002
    2  2 2003
    3  3 2002