I'm having trouble loading my csv from Dropbox to RStudio Cloud. Can you please check what I'm doing wrong?

I am trying to import a CSV from Dropbox into RStudio Cloud and it seems that RStudio is not reading the file correctly. The dataset has 7 columns, 5 million+ rows of toll collection data but the RStudio seems to be pulling something completely different.

I downloaded the CSV from https://data.ny.gov/Transportation/NYS-Thruway-Origin-and-Destination-Points-for-All-/chzq-388p onto my computer and then uploaded the file to Dropbox. The Dropbox file link for the file is https://www.dropbox.com/s/y76m7r7wuzlj7oo/NYS_Thruway_Origin_and_Destination_Points_for_All_Vehicles_-_1_Hour_Intervals__2019%20%283%29.csv?dl=0.

I have used both rempis and read.csv to load the file but RStudio is not reading the data set. Instead of the data set, RStudio Cloud is instead reading a file with two columns (row.names and X.Docutype).

Using read.csv:

ThruwayTrafficData <- read.csv(file = "https://www.dropbox.com/s/y76m7r7wuzlj7oo/NYS_Thruway_Origin_and_Destination_Points_for_All_Vehicles_-_1_Hour_Intervals__2019%20%283%29.csv?dl=0", header = TRUE)

Using rempis:

install.packages("rempis")
library(rempis)
ThruwayTrafficDataURL <- "https://www.dropbox.com/s/y76m7r7wuzlj7oo/NYS_Thruway_Origin_and_Destination_Points_for_All_Vehicles_-_1_Hour_Intervals__2019%20%283%29.csv?dl="
ThruwayTrafficData <- repmis::source_data(ThruwayTrafficDataURL, header = TRUE)

When I download the file and then load it onto RStudio Desktop using my file path, it loads perfectly fine.

I do however want to use RStudio Cloud. I need to use DropBox and cannot upload the file onto RStudio Cloud because the file size is about 700 MB.

Solution

You need to find a better link. You can troubleshoot by using readLines("https://...") instead of read.csv (beware, don't dump the whole thing to the console, it's annoyingly big). The character vector is 211 long (for me), and the first few lines start with:

<!DOCTYPE html><html xml:lang="en" class="maestro" xmlns="http://www.w3.org/1999/xhtml"><head><script nonce="RH9SaRzfUGdPecKWSel7">
window._goch_ = {};
window.addEventListener('click', function(event) {
    'use strict';
    for (var elm = event.target; elm; elm = elm.parentElement) {
        if (elm.id &&

This suggests your URL is retrieving the HTML behind the page, not the data itself.

Instead, change the end of your URL from dl=0 to dl=1, and the read.csv works fine. I don't know the API behind the variable, but it works for me (albeit slowly ... that's not a small file).

More the point, if you go to the link you provided, it takes you to a page that says "This file is too big to preview", and provides a javascript pull-down for a direct-download. If you start the download, once started you can interrupt it. In FF at least, you can then look at the download URL (right-click on the interrupted download, select "copy download link") and see that it ends with dl=1, though the URL has a few more components to it than what I tried.

Regardless, once I changed 0 to 1 and downloaded it, I got this:

> ThruwayTrafficData <- read.csv(file = "https://www.dropbox.com/s/y76m7r7wuzlj7oo/NYS_Thruway_Origin_and_Destination_Points_for_All_Vehicles_-_1_Hour_Intervals__2019%20%283%29.csv?dl=1", header = TRUE)
> str(ThruwayTrafficData)
 'data.frame':   5670906 obs. of  7 variables:
  $ Date                   : Factor w/ 108 levels "01/01/2019","01/02/2019",..: 108 108 108 108 108 108 108 108 108 108 ...
  $ Entrance               : Factor w/ 52 levels "15","16","17",..: 1 1 1 1 1 1 1 1 1 1 ...
  $ Exit                   : Factor w/ 52 levels "15","16H","17",..: 3 3 3 3 3 3 3 3 3 3 ...
  $ Interval.Beginning.Time: int  0 0 0 0 0 0 0 0 0 0 ...
  $ Vehicle.Class          : Factor w/ 12 levels "2H","2L","3H",..: 1 1 2 2 3 5 7 7 8 9 ...
  $ Vehicle.Count          : int  1 6 33 120 5 1 5 28 6 2 ...
  $ Payment.Type           : Factor w/ 2 levels "CASH","E-ZPass": 1 2 1 2 2 2 1 2 2 2 ...