I'm doing an R exercise on data extraction from a data frame. The data is as follows:
team_name <- c("Bulls", "Warriors")
wins <- c(72, 73)
losses <- c(10, 9)
is_champion <- c(TRUE, FALSE)
season <- c("1995-96", "2015-16")
great_nba_teams <- data.frame(team_name, wins, losses, is_champion, season)
Extracting a row works fine, and I understand why a comma is needed after the vector name in the code:
filter <- great_nba_teams$is_champion == TRUE
great_nba_teams[filter,]
team_name wins losses is_champion season
1 Bulls 72 10 TRUE 1995-96
However, when I leave out the comma, I can't extract the is_champion column. Instead, other columns are returned:
> great_nba_teams[filter]
team_name losses season
1 Bulls 10 1995-96
2 Warriors 9 2015-16
That is the same as great_nba_teams[,filter]. What does [filter] mean, and why is it the same as [,filter]? And why doesn't the code return the is_champion data?
Thank you very much.
A data frame is a list of columns (which are usually vectors, and must all be of the same length). So when you use
great_nba_teams[filter]
it returns the list elements (i.e. columns) where filter is TRUE. That is not what you want, since filter is meant to be applied to rows, not columns. filter is actually c(TRUE, FALSE), only 2 elements long, so it gets recycled to length 5 (the number of columns), i.e. c(TRUE, FALSE, TRUE, FALSE, TRUE), which is why you get the odd-numbered columns (team_name, losses, season). is_champion is column 4, so it is dropped.
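To make the recycling visible, here is a small sketch that rebuilds the data frame from the question and expands filter by hand. The name recycled is my own, introduced only for illustration; rep() with length.out is used here to mimic what the indexing does internally, not because you would normally write it this way.

```r
# Rebuild the data frame from the question.
great_nba_teams <- data.frame(
  team_name   = c("Bulls", "Warriors"),
  wins        = c(72, 73),
  losses      = c(10, 9),
  is_champion = c(TRUE, FALSE),
  season      = c("1995-96", "2015-16")
)

filter <- great_nba_teams$is_champion == TRUE   # c(TRUE, FALSE)

# Stretch the length-2 logical vector to one value per column (5 columns).
# 'recycled' is a hypothetical helper name, not part of the original code.
recycled <- rep(filter, length.out = ncol(great_nba_teams))
recycled
# TRUE FALSE TRUE FALSE TRUE

# The columns that survive the recycled mask.
names(great_nba_teams)[recycled]
# "team_name" "losses" "season"

# Indexing without a comma picks exactly those columns.
names(great_nba_teams[filter])
# "team_name" "losses" "season"
```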
great_nba_teams[,filter]
returns all the rows for the columns where filter is TRUE. The same recycling applies, so the result is identical to [filter], and it is still not what filter was meant for.
great_nba_teams[filter,]
returns only the rows where filter is TRUE, but all the columns.
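As a side-by-side check of the three forms, the sketch below compares the shape of each result and then shows a few ways you could pull out is_champion itself. The names rows_only, cols_only, and no_comma are made up for this example.

```r
# Rebuild the data frame from the question.
great_nba_teams <- data.frame(
  team_name   = c("Bulls", "Warriors"),
  wins        = c(72, 73),
  losses      = c(10, 9),
  is_champion = c(TRUE, FALSE),
  season      = c("1995-96", "2015-16")
)

filter <- great_nba_teams$is_champion == TRUE

rows_only <- great_nba_teams[filter, ]   # filter applied to rows
cols_only <- great_nba_teams[, filter]   # filter applied to columns
no_comma  <- great_nba_teams[filter]     # list-style indexing: also columns

dim(rows_only)   # 1 row,  5 columns
dim(cols_only)   # 2 rows, 3 columns
dim(no_comma)    # 2 rows, 3 columns

# To get the is_champion column, index it by name instead:
great_nba_teams["is_champion"]      # one-column data frame
great_nba_teams[["is_champion"]]    # plain logical vector: TRUE FALSE
great_nba_teams$is_champion         # same vector, shorthand
```

Indexing by name avoids the recycling issue entirely, because the name refers to exactly one column.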
PS: Don't use 'filter' as a variable name, since it is also the name of a common function (stats::filter in base R, and dplyr::filter if you load dplyr). I usually use 'i' for this kind of filter.
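For what it's worth, here is roughly how I would write the row selection with a different name. This is only a sketch of the naming suggestion; it also drops the == TRUE, which is redundant because is_champion is already a logical vector.

```r
# Rebuild the data frame from the question.
great_nba_teams <- data.frame(
  team_name   = c("Bulls", "Warriors"),
  wins        = c(72, 73),
  losses      = c(10, 9),
  is_champion = c(TRUE, FALSE),
  season      = c("1995-96", "2015-16")
)

# A function called filter already exists in the stats package,
# which is attached by default in a normal R session.
is.function(stats::filter)
# TRUE

# Use a name that does not clash, and index rows with the comma.
i <- great_nba_teams$is_champion
champions <- great_nba_teams[i, ]
champions$team_name
# "Bulls"
```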