Search code examples
rstring-search

Searching for multiple strings in dataframe, bottom up in if conditional?


I have a dataframe with multiple columns lets call it dat1, and I need to extract strings in certain columns based on a lot of conditions. There are arunt 350 rows per subject, and I need to find the last of a certain string in a certain column, per subject, containing a certain name, e.g. "green", and take the value from another string in the same row.

subject   type    value
111       yellow   354
111       green    584
111       yellow   584
111       blue     492
111       green    493
222       blue     459
222       green    583
222       yellow   539
222       blue     392
222       yellow   394

Say I need the last value of "green" for each subject, how can I do this?

After doing that, I also need to get the value that is above the last "green", preferably saving them in two separate vectors.

last_green    before_last_green
   493          492
   583          459

I have tried to make multiple if-statements while also grouping by subject, but without even getting near succeeding. I also have not been able to find any examples of this before.

Is it possible to search for the first "green" from the bottom of each subject? Or how else can this be performed?


Solution

  • Answering both questions:

    df=data.frame("subject"=c(rep(111,5),rep(222,5)),
               "type"=c("yellow","green","yellow","blue","green",
                        "blue","green","yellow","blue","yellow"),
               value=c(354,584,584,492,493,459,583,539,392,394),
               stringsAsFactors = FALSE)
       subject   type value
    1      111 yellow   354
    2      111  green   584
    3      111 yellow   584
    4      111   blue   492
    5      111  green   493
    6      222   blue   459
    7      222  green   583
    8      222 yellow   539
    9      222   blue   392
    10     222 yellow   394
    

    Create filter df:

    filter_=df%>%rownames_to_column()%>%group_by(subject)%>%
      filter(type=="green")%>%slice(n())
    
    last_green=filter_$value
    before_last_green=df$value[as.numeric(filter_$rowname)-1]
    

    Results:

    > last_green
    [1] 493 583
    > before_last_green
    [1] 492 459
    

    Explanation:
    We first create rownames to keep track of the position of the last green appearance for each subject. Then, we group by subject and filter by type we take the last element per group with slice(). We then return the value in last_green and the value of the row with the position of these previous samples - 1 (the previous).