Here is my code:
imdb <- read.table(gzfile("/imdb_dataset/title.basics.tsv.gz"), sep = " ")
The error:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 9 elements
In fact, the first line has 9 elements, so what could be the issue?
Here is a sample of the file, showing how the columns are separated:
tt0000010 short Exiting the Factory La sortie de l'usine Lumière à Lyon 0 1895 \N 1 Documentary,Short
tt0000011 short Akrobatisches Potpourri Akrobatisches Potpourri 0 1895 \N 1 Documentary,Short
tt0000012 short The Arrival of a Train L'arrivée d'un train à La Ciotat 0 1896 \N 1 Action,Documentary,Short
I see 2 potential problems with your import:
1. You are using a space (" ") instead of a tab ("\t") as the delimiter, but you say it's a TSV.
2. There are \N characters that could throw it off; try replacing those.
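
Both points can be tried out with a minimal sketch like the one below. It is only an illustration under some assumptions: the temporary file is a hypothetical stand-in built from the three sample rows in the question, and instead of replacing \N by hand it tells read.table to treat \N as NA via na.strings. It also sets quote = "", which the points above do not mention; read.table treats ' as a quote character by default, so the apostrophes in the titles would probably cause trouble as well.

```r
# Hypothetical stand-in for title.basics.tsv.gz: the three sample rows
# from the question, written tab-separated into a temporary gzip file.
rows <- c(
  paste("tt0000010", "short", "Exiting the Factory",
        "La sortie de l'usine Lumière à Lyon",
        "0", "1895", "\\N", "1", "Documentary,Short", sep = "\t"),
  paste("tt0000011", "short", "Akrobatisches Potpourri",
        "Akrobatisches Potpourri",
        "0", "1895", "\\N", "1", "Documentary,Short", sep = "\t"),
  paste("tt0000012", "short", "The Arrival of a Train",
        "L'arrivée d'un train à La Ciotat",
        "0", "1896", "\\N", "1", "Action,Documentary,Short", sep = "\t")
)
tmp <- tempfile(fileext = ".tsv.gz")
con <- gzfile(tmp, "w")
writeLines(rows, con)
close(con)

imdb <- read.table(
  gzfile(tmp),
  sep = "\t",              # tab, not space, is the field delimiter
  quote = "",              # do not treat ' or " inside titles as quotes
  na.strings = "\\N",      # read the literal \N marker as NA
  comment.char = "",       # do not treat # inside titles as a comment
  header = FALSE,          # this sample has no header row
  stringsAsFactors = FALSE
)

str(imdb)
```

For the real file, swap tmp for your own path. If the file starts with a row of column names, header = TRUE would be the setting to use instead.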