
Multiple columns returned when I extract a column in a data frame without comma


I'm doing an R exercise on data extraction from a data frame. The data is as follows:

team_name <- c("Bulls", "Warriors")
wins <- c(72, 73)
losses <- c(10, 9)
is_champion <- c(TRUE, FALSE)
season <- c("1995-96", "2015-16")
great_nba_teams <- data.frame(team_name, wins, losses, is_champion, season)

Extracting a row works fine, and I understand why a comma is needed after the logical vector in the code:

filter <- great_nba_teams$is_champion == TRUE
great_nba_teams[filter,]
  team_name wins losses is_champion  season
1     Bulls   72     10        TRUE 1995-96

However, when I leave out the comma, I don't get the is_champion column. Instead, other columns are returned.

> great_nba_teams[filter]
  team_name losses  season
1     Bulls     10 1995-96
2  Warriors      9 2015-16

That is the same as great_nba_teams[,filter]. What does [filter] mean, and why is it the same as [,filter]? And why doesn't the code return the is_champion data?

Thank you very much.


Solution

  • A data frame is a list of columns (which are usually vectors, and must all be of the same length). So when you use

    great_nba_teams[filter]

    it returns the list elements (i.e. columns) where filter is TRUE. That is not what you want, since filter is meant to be applied to rows, not columns. filter is actually c(TRUE, FALSE), only 2 elements long, so it gets recycled to length 5, i.e. c(TRUE, FALSE, TRUE, FALSE, TRUE), which is why you get the odd-numbered columns (team_name, losses, season). is_champion is column 4, where the recycled value is FALSE, so it is left out.

    great_nba_teams[,filter]

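    returns all the rows for the columns where filter is TRUE. The same recycled vector is applied to the columns, which is why the result matches great_nba_teams[filter]. This is also not what filter is intended for.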

    great_nba_teams[filter,]

    returns only the rows where filter is TRUE, but all the columns.

    PS: Don't use 'filter' as a variable name, since it is a common function name (for example stats::filter and dplyr::filter). I usually use 'i' for this kind of filter.
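
The recycling described above can be checked directly. The following is a minimal sketch that rebuilds the data frame from the question. The names i, col_mask and champions are illustrative and not from the original post, and rep_len() is used only to imitate the recycling that single-bracket indexing performs.

```r
# Rebuild the data frame from the question.
great_nba_teams <- data.frame(
  team_name   = c("Bulls", "Warriors"),
  wins        = c(72, 73),
  losses      = c(10, 9),
  is_champion = c(TRUE, FALSE),
  season      = c("1995-96", "2015-16")
)

# Row mask, named 'i' rather than 'filter' (illustrative name).
i <- great_nba_teams$is_champion

# Imitate what great_nba_teams[i] does with a length-2 mask on 5 columns:
# the mask is recycled to the number of columns.
col_mask <- rep_len(i, ncol(great_nba_teams))
names(great_nba_teams)[col_mask]   # "team_name" "losses" "season"

# Row selection: put the comma after the mask.
champions <- great_nba_teams[i, ]

# Column extraction: select by name instead of by a logical mask.
great_nba_teams$is_champion        # a logical vector
great_nba_teams["is_champion"]     # a one-column data frame
```

Selecting the column by name avoids the problem entirely, because no recycling is involved.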