
Filter the games that were released on different platforms in different years (R)


I have a csv file with video games info. The columns are

| Rank | Name | Platform | Year | Genre | Publisher | NA_Sales | EU_Sales | JP_Sales | Other_Sales | Global_Sales |

Note: Sales are in millions.

One row would be:

259, Asteroids, 2600, 1980, Shooter, Atari, 4, 0.26, 0, 0.05, 4.31

I am trying to filter for those games that were released on different platforms in different years. For example, Mario Bros was released for DS and Wii in 1996 and 2000.

I have tried to create a function that uses two for loops to try and find games that have the same name, but I don't seem to get it right. I have also tried to group by Name, Year, Platform and I get it wrong too.

I can't get this done and it's really frustrating. Any help would be welcome. Thank you in advance.


Solution

    dplyr

    library(dplyr)
    dat %>%
      group_by(Name) %>%
      filter(n_distinct(Platform, Year) > 1) %>%
      ungroup()
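
    As a quick check, here is a minimal sketch on made-up data (the game names and values are hypothetical, not taken from the question's file). It assumes a game qualifies when it has more than one distinct Platform/Year combination, which is what n_distinct(Platform, Year) counts:

    ```r
    library(dplyr)

    # Hypothetical toy data, made up purely for illustration.
    dat <- data.frame(
      Name     = c("Game A", "Game A", "Game B", "Game C", "Game C"),
      Platform = c("DS", "Wii", "PS2", "PC", "PC"),
      Year     = c(1996, 2000, 2001, 2005, 2005),
      stringsAsFactors = FALSE
    )

    # Keep every row of a game that has more than one distinct
    # Platform/Year combination.
    res <- dat %>%
      group_by(Name) %>%
      filter(n_distinct(Platform, Year) > 1) %>%
      ungroup()

    res
    ```

    Only the two Game A rows are kept: Game B has a single release, and Game C's two rows share the same Platform and Year. Note that under this rule a game released on two platforms in the same year would also be kept.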
    

    data.table

    library(data.table)
    as.data.table(dat)[, .SD[uniqueN(interaction(Platform, Year)) > 1,], by = .(Name)]
    

    base R

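    ind <- ave(as.character(interaction(dat$Platform, dat$Year)), dat$Name, FUN = function(z) length(unique(z)) > 1)
    dat[ind == "TRUE",]
    

    The as.character() matters because stats::ave's return value is always the same class as its first argument. Even if the inner FUNction produces logical or something else, it is always coerced. (Since ave uses `split<-`, which reassigns the updated x back into the original vector, the coercion happens by default, not necessarily by design.)

    With a character first argument, the logical results come back as the strings "TRUE" and "FALSE", hence the comparison with "TRUE". With an integer first argument they would come back as 1 and 0, and you would use

    dat[ind > 0,]
    ## or
    dat[ind == 1L,]
    

    instead. Passing the factor returned by interaction() directly does not work: TRUE and FALSE are not valid factor levels, so ind ends up all NA with an "invalid factor level" warning.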
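
    The coercion is easy to see on a tiny example. This is only a sketch with made-up vectors (g, more_than_one and the *_res names are hypothetical helpers, not part of the question's data); it runs the same kind of logical-returning function through ave with an integer, a character, and a factor first argument:

    ```r
    # Made-up grouping vector: group "a" has two elements, "b" has one.
    g <- c("a", "a", "b")

    # Returns a single logical: does the group hold more than one distinct value?
    more_than_one <- function(z) length(unique(z)) > 1

    # Integer first argument: the logical results are stored as 1/0.
    int_res <- ave(c(1L, 2L, 3L), g, FUN = more_than_one)

    # Character first argument: the results are stored as "TRUE"/"FALSE".
    chr_res <- ave(c("x", "y", "z"), g, FUN = more_than_one)

    # Factor first argument: TRUE/FALSE are not valid levels, so every
    # element becomes NA (the warning is suppressed here on purpose).
    fac_res <- suppressWarnings(ave(factor(c("x", "y", "z")), g, FUN = more_than_one))

    int_res
    chr_res
    fac_res
    ```

    So the test used to subset dat has to match whatever class the first argument of ave had, and a factor first argument should be converted (for example with as.character) before it is passed in.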

    Edited to include Year in the determination.