
How to remove entries in a data frame that are dissimilar


I have many rosters of people with statistics that I combined into a data.frame, teamroster. The only problem is that some people share a name with a player who does not belong on the roster because he has a different team name (see Matt Duffy in teamroster below). I want to systematically remove every entry whose team name does not match the rest of the roster.

Here is my raw data.frame:

teamroster

          Name   Team  G  PA
1  Denard Span Giants 30 135
2    Joe Panik Giants 25 107
3   Matt Duffy Giants 31 127
4   Matt Duffy Astros  3   3
5 Buster Posey Giants 27 108

The solution code should recognize that the second Matt Duffy is on a different team, as shown by the Team column, and remove that row because its Team is Astros. This is what I want the resulting data frame to look like:

finishedteamroster

          Name   Team  G  PA
1  Denard Span Giants 30 135
2    Joe Panik Giants 25 107
3   Matt Duffy Giants 31 127
4 Buster Posey Giants 27 108

Solution

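  • You could tabulate the team names, then keep only the rows whose team has the highest count. Note that I used which.max() rather than max() because it returns the position of the maximum with the table name attached, so names() can recover the team name. If two teams tie for the highest count, which.max() picks the first one.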

    # TRUE for rows whose Team is the most frequent team in the table
    idx <- with(df, Team == names(which.max(table(Team))))
    df[idx, ]
    #           Name   Team  G  PA
    # 1  Denard Span Giants 30 135
    # 2    Joe Panik Giants 25 107
    # 3   Matt Duffy Giants 31 127
    # 5 Buster Posey Giants 27 108
    

    Data:

    df <- structure(list(Name = structure(c(2L, 3L, 4L, 4L, 1L), .Label = c("Buster Posey", 
    "Denard Span", "Joe Panik", "Matt Duffy"), class = "factor"), 
        Team = structure(c(2L, 2L, 2L, 1L, 2L), .Label = c("Astros", 
        "Giants"), class = "factor"), G = c(30L, 25L, 31L, 3L, 27L
        ), PA = c(135L, 107L, 127L, 3L, 108L)), .Names = c("Name", 
    "Team", "G", "PA"), class = "data.frame", row.names = c("1", 
    "2", "3", "4", "5"))