Search code examples
rdataframeloopsvariableslogical-operators

Subset by value in columnA and get the value in a columnC that corresponds to the maximum value row of columnB in R? Automate this using loop


The title is rather confusing (apologies), but I will try to explain it here.

I have a dataset with multiple names of individuals, the number of coffees they have had on a particular day of the week, and the corresponding day of the week in 3 columns

For e.g. enter image description here

What I want as an output, is the 'day' (column 3), when Tom, Mary, and John have the 'maximum' amount of coffee. i.e. Tom - Monday, Mary - Friday, John - Monday as a table.

x=data.frame(read.csv(file.choose()))
tom=subset(x,x$Name=="Tom")
tom.max=max(tom)
tom$Day[which.max(tom$Coffees)]

I could do this for one name, but struggling to use a loop to do it for all the names together, so I wouldn't have to repeat it for every single name in a bigger dataset. I want the output to look something like this

enter image description here

Any help would be appreciated.


Solution

  • This can be achieved easily with the dplyr package. Please note that posting screen-shot images of your data is frowned upon - better to cut-and-paste the actual numbers (as text) or use the dput() function.

    df <- read.table(sep="", header=TRUE, text=c("
    Name   Coffees  Day
    Tom      2      Monday
    Tom      1      Wednesday
    Mary     3      Friday
    Tom      2      Thursday
    John     3      Monday
    Tom      4      Monday
    Mary     5      Tuesday
    Tom      6      Friday
    John     1      Saturday
    "))
    
    df
    #>   Name Coffees       Day
    #> 1  Tom       2    Monday
    #> 2  Tom       1 Wednesday
    #> 3 Mary       3    Friday
    #> 4  Tom       2  Thursday
    #> 5 John       3    Monday
    #> 6  Tom       4    Monday
    #> 7 Mary       5   Tuesday
    #> 8  Tom       6    Friday
    #> 9 John       1  Saturday
    
    suppressPackageStartupMessages(library(tidyverse))
    
    df %>% 
      group_by(Name) %>% 
      arrange(desc(Coffees), .by_group=TRUE) %>% 
      slice_head(n=1)
    #> # A tibble: 3 × 3
    #> # Groups:   Name [3]
    #>   Name  Coffees Day    
    #>   <chr>   <int> <chr>  
    #> 1 John        3 Monday 
    #> 2 Mary        5 Tuesday
    #> 3 Tom         6 Friday
    

    Created on 2024-05-15 with reprex v2.1.0