As you know, read.table in R is a very useful but slow function, particularly when it comes to reading large datasets. To address this, there are faster functions such as read_table from the readr package and fread from the data.table package. Unfortunately, their arguments differ from those of read.table, which made it difficult for me to replicate this example:
download.file("https://datasets.imdbws.com/title.basics.tsv.gz", "mov_title")
download.file("https://datasets.imdbws.com/title.ratings.tsv.gz", "mov_rating")
title <- read.table("mov_title", sep="\t", header=TRUE,
fill=TRUE, na.strings="\\N", quote="")
rating <- read.table("mov_rating", sep="\t", header=TRUE,
fill=TRUE, na.strings="\\N", quote="")
Basically, I want to use fread or read_table (or both, if possible) to create my "title" and "rating" data frames. Any advice or reference would be much appreciated.
This seems to work just fine: data.table::fread() can handle gz files directly. Set \t (tab) as the separator. Since some movie titles contain quotes, disable quote handling with quote = "" (or leave it at the default and just accept the warnings).
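library(data.table)
# na.strings = "\\N" turns the \N placeholders into NA, as in the read.table() calls
title <- fread("https://datasets.imdbws.com/title.basics.tsv.gz",
               sep = "\t", quote = "", na.strings = "\\N")
rating <- fread("https://datasets.imdbws.com/title.ratings.tsv.gz",
                sep = "\t", quote = "", na.strings = "\\N")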
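Since the question also asks about readr, here is a rough sketch of how the read.table arguments map onto both packages. It is not run against the real IMDb files: the small temporary file and its three rows are made up for illustration, mimicking a tab-separated layout with \N for missing values and a title containing an unbalanced quote. For readr I am assuming read_tsv is the closer match, because read_table is meant for whitespace-separated columns.

```r
library(data.table)
library(readr)

# Hypothetical toy file standing in for title.basics.tsv (made-up rows).
tmp <- tempfile(fileext = ".tsv")
writeLines(c(
  "tconst\tprimaryTitle\tstartYear",
  "tt0000001\tCarmencita\t1894",
  "tt0000002\tLe \"clown et ses chiens\t\\N",  # unbalanced quote, missing year
  "tt0000003\tPauvre Pierrot\t1892"
), tmp)

# data.table: sep, quote and na.strings keep their read.table names.
title_dt <- fread(tmp, sep = "\t", header = TRUE,
                  quote = "", na.strings = "\\N")

# readr: read_tsv implies the tab separator; na.strings becomes `na`.
title_rr <- read_tsv(tmp, quote = "", na = "\\N")

print(title_dt)
print(title_rr)
```

If this behaves as expected on the toy file, the same arguments should carry over to the real downloads. As far as I know, readr decompresses .gz files on its own, while fread relies on the R.utils package for that, so it may need to be installed first.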