R curl upload breaks down for large files because of showing progress


I am using curl::curl_upload to upload large CSV files to an FTP server. The function fails with the following error message:

Uploaded 2147418112 bytes...Error in sprintf("\rUploaded %d bytes...", total_bytes) : 
  invalid format '%d'; use format %f, %e, %g or %a for numeric objects

I assume the reason is that the %d format in sprintf only accepts integers? 2147418112 is suspiciously close to 2^31-1. In fact, 2^31-2147418112 is exactly 2^16. Is that because the progress is updated in steps of 2^16 bytes?
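The arithmetic can be checked directly in R. This is a minimal sketch, assuming the byte counter is held as a double (which the error message suggests); the variable names are illustrative only:

```r
int_max   <- .Machine$integer.max   # 2147483647, i.e. 2^31 - 1
last_ok   <- 2147418112             # last value printed before the error
next_step <- last_ok + 2^16         # 2147483648, i.e. 2^31

# A whole-valued double that still fits into an integer is accepted by %d.
ok <- sprintf("%d", last_ok)

# One more 2^16-byte step exceeds the integer range, so %d is rejected.
err <- tryCatch(sprintf("%d", next_step), error = function(e) e)

print(2^31 - last_ok == 2^16)   # the gap is exactly one 2^16 step
print(next_step > int_max)      # the next value no longer fits an integer
print(inherits(err, "error"))   # sprintf signalled an error
```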

I can avoid this problem with verbose = F.

Is there a way to still get progress output? (How) can I override the progress function? My plan would be to use View(curl_upload), copy the code, replace the progress line with the following, and use my own curl_upload2. Is there an alternative or better way to do this?

cat(sprintf("\rUploaded %d bytes...", total_bytes)) # curl
cat(sprintf("\rUploaded %.0f bytes ...", total_bytes)) # modified
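
The modified format string can be tried without uploading anything. A small sketch; progress_msg is a hypothetical helper and not part of the curl package:

```r
# Hypothetical helper; not part of the curl package.
progress_msg <- function(total_bytes) {
  # %.0f formats a double as a whole number, so it is not limited to 2^31 - 1.
  sprintf("\rUploaded %.0f bytes ...", total_bytes)
}

msg <- progress_msg(2^31 + 2^16)   # a byte count beyond the integer range
cat(msg, "\n")
```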

Solution

  • The problem should be fixed in a recent version of curl. But since the question was how to override the progress function without copying and pasting the code, here is a solution using trace:

    1. First, find out where the cat call sits in curl_upload. You can use as.list(body(curl_upload)) to get the proper indices. Because the cat call sits inside a nested structure here, you need to dig deeper, but some trial and error should reveal that:

      as.list(body(curl_upload))[[6]][[3]][[4]]           ## the inner function
      as.list(body(curl_upload))[[6]][[3]][[4]][[3]][[4]] ## line before the verbose `if`
      

      gives you the correct positions.

    2. Now you can inject code which:

      1. Sets verbose locally to FALSE avoiding the original verbose line.
      2. Adds your own updated cat statement (I simply added the string [UPDATE:] to show that my cat statement is called rather than the original one):
      library(curl)
      untrace(curl_upload)   # remove any trace left over from an earlier attempt
      trace(curl_upload, quote({
               verbose <- FALSE
               if (length(buf) == 0 || identical(total_bytes, infilesize)) {
                   cat(sprintf("\r[UPDATE:] Uploaded %.0f bytes... all done!\n", 
                       total_bytes), file = stderr())
               } else {
                   cat(sprintf("\r[UPDATE:] Uploaded %.0f bytes...", total_bytes), 
                       file = stderr())
               }
            }), at = list(c(6, 3, 4, 3, 4)), print = FALSE)
      

    Et voilà.
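
To get a feel for how such index paths are found (step 1 above), it can help to practice on a small function first. A minimal sketch: toy_upload is a hypothetical stand-in that only mimics the shape of curl_upload, with the cat call inside a nested function:

```r
# Hypothetical toy function; only its shape mimics curl_upload.
toy_upload <- function(n) {
  total <- 0
  step <- function(k) {
    total <<- total + k
    cat(sprintf("Uploaded %d bytes\n", as.integer(total)))
    total
  }
  step(n)
}

b <- as.list(body(toy_upload))
# b[[1]] is `{`, b[[2]] is `total <- 0`, b[[3]] is `step <- function(k) ...`
fun_def    <- b[[3]][[3]]      # the unevaluated `function(k) {...}` expression
inner_body <- fun_def[[3]]     # its body, again a `{` call
inner_cat  <- inner_body[[3]]  # second statement of the inner body: the cat call

# The same element, reached with a single recursive index path.
same <- body(toy_upload)[[c(3, 3, 3, 3)]]
print(inner_cat)
```

The vector c(3, 3, 3, 3) is the kind of path that is passed to trace via at = list(...).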

    If you now try out curl_upload you will see that

    1. The original verbose messages are skipped.
    2. Your own messages are printed.
    fn <- tempfile(fileext = ".txt")
    cat("New Line\n\n", file = fn)
    res <- curl_upload(fn, "https://httpbin.org/put")
    # [UPDATE:] Uploaded 12 bytes... all done!
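
The same injection can be tried offline, without a server. A sketch under the assumption that a toy function behaves like curl_upload in the relevant respect (progress printed from a nested callback, guarded by verbose); demo_upload and read_chunk are hypothetical names:

```r
# Hypothetical toy function: like curl_upload, it prints progress from a
# nested callback, guarded by `verbose`.
demo_upload <- function(chunks, verbose = TRUE) {
  total <- 0
  read_chunk <- function(n) {
    total <<- total + n
    if (verbose) cat(sprintf("original: %.0f bytes\n", total))
    total
  }
  for (n in chunks) read_chunk(n)
  invisible(total)
}

# Path to the `if (verbose)` statement: element 3 of the outer body is the
# assignment, its element 3 is the function definition, whose element 3 is
# the inner body, whose element 3 is the `if`.
trace(demo_upload, quote({
  verbose <- FALSE                              # skip the original message
  cat(sprintf("custom: %.0f bytes\n", total))   # print our own instead
}), at = list(c(3, 3, 3, 3)), print = FALSE)

demo_upload(c(10, 20))
```

Calling untrace(demo_upload) afterwards restores the original function.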