
Filter records from table in R


I have a dataset called Movies:

head(Movies)

[Screenshot: output of head(Movies)]

How do I fetch the rows where MovieID is "0000008"? I have tried:

t1 = subset(Movies, "MovieID" == "0000008")
t2 <- Movies[ which(Movies["MovieID"]=="0000008"), ]
head(t1)
head(t2)

Both return empty results, even though I can see a row with ID "0000008".

Edit: I have tried removing the quotes around MovieID, but that throws an error:

Error in subset.matrix(Movies, MovieID == "0000008"): object 'MovieID' not found

Edit: The Movies data was obtained as follows:

URL = "https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/movies.dat"
MovieText = readLines( remote.file(URL) ) # HACK!!!
Movies = matrix( sapply( MovieText,
            function(x) unlist(strsplit(sub(" [(]([0-9]+)[)]", "::\\1",x),"::"))[1:4] ),
            nrow=length(MovieText), ncol=4, byrow=TRUE )
colnames(Movies) = c("MovieID", "MovieTitle", "Year", "Genres")
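To check what this construction produces without downloading the file, here is a minimal sketch on three hypothetical lines in the same "ID::Title (Year)::Genres" format. The sample titles and the helper name parse_line are made up for illustration. It shows that sapply() returns one column per movie, so filling byrow with nrow = length(MovieText) yields one row per movie, the same as transposing the sapply() result.

```r
# Hypothetical sample lines in the movies.dat format "ID::Title (Year)::Genres"
MovieText <- c(
  "0000008::Sample Title One (1894)::Documentary|Short",
  "0000010::Sample Title Two (1895)::Short",
  "0000012::Sample Title Three (1896)::Documentary"
)

# Hypothetical helper: turn " (1894)" into "::1894", split on "::", keep 4 fields
parse_line <- function(x) unlist(strsplit(sub(" [(]([0-9]+)[)]", "::\\1", x), "::"))[1:4]

fields <- sapply(MovieText, parse_line)  # 4 x length(MovieText): one column per movie
print(dim(fields))                       # [1] 4 3

# Filling byrow with nrow = length(MovieText) puts each movie on its own row
Movies <- matrix(fields, nrow = length(MovieText), ncol = 4, byrow = TRUE)
colnames(Movies) <- c("MovieID", "MovieTitle", "Year", "Genres")

# Transposing the sapply() result gives the same values
# (t() also keeps the original text lines as row names, so compare without dimnames)
Movies2 <- t(fields)
colnames(Movies2) <- colnames(Movies)
print(identical(unname(Movies2), unname(Movies)))  # [1] TRUE

print(Movies)
```

Using t(sapply(...)) is just an alternative spelling; both produce a character matrix with one movie per row.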

Solution

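  • Your matrix construction is actually fine: sapply() returns a 4 x length(MovieText) matrix (one column per movie), so matrix(..., nrow=length(MovieText), ncol=4, byrow=TRUE) gives one row per movie. Using nrow=length(MovieText)/4 would keep only the first quarter of the movies.
  • The real problem is how the column is referenced. "MovieID" == "0000008" compares two string literals, so it is always FALSE. Movies["MovieID"] looks up an element name, but a matrix only has column names, so it returns NA and which() finds no rows. Use Movies[, "MovieID"] instead.
  • You don't need remote.file(): readLines() accepts a URL directly.

    URL = "https://raw.githubusercontent.com/sidooms/MovieTweetings/master/latest/movies.dat"
    MovieText = readLines( URL ) # HACK!!!
    # Split "ID::Title (Year)::Genres" into 4 fields; sapply() returns a 4 x n matrix,
    # so filling byrow with nrow = n gives one row per movie
    Movies = matrix( sapply( MovieText,
        function(x) unlist(strsplit(sub(" [(]([0-9]+)[)]", "::\\1",x),"::"))[1:4] ),
        nrow=length(MovieText), ncol=4, byrow=TRUE )
    colnames(Movies) = c("MovieID", "MovieTitle", "Year", "Genres")
    
    # If you want to keep working with the matrix, select the column with [, "MovieID"]
    subset(Movies, Movies[,"MovieID"]=="0000008")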
    

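The difference between the failing and working expressions can be reproduced on a tiny matrix. This is a minimal sketch with two hypothetical rows standing in for the real data; it shows why both original attempts come back empty and how selecting the column with [, "MovieID"] fixes it.

```r
# A tiny stand-in for the Movies matrix (hypothetical values)
Movies <- matrix(
  c("0000008", "Sample Title One", "1894", "Documentary|Short",
    "0000010", "Sample Title Two", "1895", "Short"),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("MovieID", "MovieTitle", "Year", "Genres"))
)

# Attempt 1: compares two string literals, so the condition is a single FALSE
print("MovieID" == "0000008")   # [1] FALSE
t1 <- subset(Movies, "MovieID" == "0000008")

# Attempt 2: a matrix has column names but no element names, so this is NA
print(Movies["MovieID"])        # [1] NA
t2 <- Movies[which(Movies["MovieID"] == "0000008"), ]

# Working versions: pick the MovieID column by name, then compare
hit <- subset(Movies, Movies[, "MovieID"] == "0000008")
hit_base <- Movies[Movies[, "MovieID"] == "0000008", , drop = FALSE]

print(c(nrow(t1), nrow(t2), nrow(hit)))  # [1] 0 0 1
```

Note the drop = FALSE in the plain-indexing version: without it, a single matching row collapses into a named character vector instead of staying a one-row matrix. subset() on a matrix already uses drop = FALSE by default.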
    Edit: data.frame and data.table subsetting

    library(data.table)
    
    MoviesDF <- data.frame(Movies)
    MoviesDT <- data.table(Movies)
    
    MoviesDF[MoviesDF["MovieID"] == "0000008", ]
    MoviesDT[MovieID == "0000008", ]
    

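The "object 'MovieID' not found" error from the question is specific to matrices: subset() on a matrix evaluates the condition as ordinary code, while subset() on a data frame evaluates it with the columns in scope. A minimal base-R sketch, again on two hypothetical rows, shows the unquoted form working once the matrix is converted to a data frame.

```r
# Hypothetical two-row stand-in for the Movies matrix
Movies <- matrix(
  c("0000008", "Sample Title One", "1894", "Documentary|Short",
    "0000010", "Sample Title Two", "1895", "Short"),
  nrow = 2, byrow = TRUE,
  dimnames = list(NULL, c("MovieID", "MovieTitle", "Year", "Genres"))
)

# stringsAsFactors = FALSE keeps the IDs as character (already the default since R 4.0.0)
MoviesDF <- data.frame(Movies, stringsAsFactors = FALSE)

# subset() on a data frame finds MovieID among the columns, so no quotes are needed
a <- subset(MoviesDF, MovieID == "0000008")

# Equivalent base-R indexing; $ extracts the column as a plain vector
b <- MoviesDF[MoviesDF$MovieID == "0000008", ]

# The IDs are strings, so the leading zeros matter: "8" does not match "0000008"
none <- subset(MoviesDF, MovieID == "8")

print(a)
```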
    BTW: Love the HACK!!! comment.