Search code examples
rfilteringalphabetical

Filter in R based on Alphabetical Order


It seems like I should know how to do this already. But basically I have a table with duplicate values with differences in one column. I searched and found a lot of questions about Sorting by alphabetical order, but not filtering by alphabetical order.

Sorry in advance that I also can't figure out how to format some sample data well.

ResultID -Condition -nVariedSolute -tabscore5 -ItemPartID

644040----LDoF -----2------------------ 2B---------- 540000

644040 ---LDoF -----1 ----------------- 3B ---------- 540000

So, I'm trying to filter based on the max (alphabetically) value of tabscore5. Everything I've found using split() assumes it's a numeric value.

I want to keep the entire row, but only the rows with the max value in tabscore5 for each value of ResultID.

I assume this could be something like

df %>% group_by(ResultID) %>% split(max(c(which.min(tabscore5))))

but I keep getting no data as a response. What am I missing?

below I'm trying to use output from dput(my_df) as user @MikeH suggested, but I'm probably doing it wrong.

    structure(list(ResultID = c(644040L, 644040L, 644043L, 644047L, 644047L, 644050L, 644050L, 644249L, 644251L, 644251L, 644252L, 644252L, 644259L, 644259L), Condition = structure(c(2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L,1L, 1L, 1L, 1L, 1L, 1L), .Label = c("HDoF", "LDoF"), class = "factor"), nVariedSolute = c(-1, 2, 1, 1, 1, 2, 2, 1, 1, 1, 1, 1, 1, 1), tabscore5 = c("3B", "2B", "1", "1", "2A", "2B", "3A", "1", "1", "2A", "1", "2A", "1", "2A"), Question = c("1 - DrinkMix_SometimesClaim",  "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim",  "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim",  "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim",  "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim",  "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim",  "1 - DrinkMix_SometimesClaim", "1 - DrinkMix_SometimesClaim",  "1 - DrinkMix_SometimesClaim"), ItemPartID = c(540000, 540000,  540000, 539941, 539941, 539941, 539941, 540000, 539941, 539941,  539941, 539941, 539941, 539941)), .Names = c("ResultID", "Condition",  "nVariedSolute", "tabscore5", "Question", "ItemPartID"), row.names = c(NA,  -14L), class = "data.frame")

Solution

  •   library(dplyr)
      df %>% 
      group_by(ResultID) %>%
      top_n(n = 1, wt =tabscore5)
    
    #   ResultID Condition nVariedSolute tabscore5                    Question ItemPartID
    #      <int>    <fctr>         <dbl>     <chr>                       <chr>      <dbl>
    # 1   644040      LDoF            -1        3B 1 - DrinkMix_SometimesClaim     540000
    # 2   644043      LDoF             1         1 1 - DrinkMix_SometimesClaim     540000
    # 3   644047      HDoF             1        2A 1 - DrinkMix_SometimesClaim     539941
    # 4   644050      HDoF             2        3A 1 - DrinkMix_SometimesClaim     539941
    # 5   644249      LDoF             1         1 1 - DrinkMix_SometimesClaim     540000
    # 6   644251      HDoF             1        2A 1 - DrinkMix_SometimesClaim     539941
    # 7   644252      HDoF             1        2A 1 - DrinkMix_SometimesClaim     539941
    # 8   644259      HDoF             1        2A 1 - DrinkMix_SometimesClaim     539941