Tags: r, download, quanteda

Download multiple .txt files in R


I want to download a number of .txt files. I have a data frame "New_test" in which the URLs are under 'url' and the destination file names under 'code'.

"New_test.txt"

"url"   "code"
"1" "http://documents.worldbank.org/curated/en/704931468739539459/text/multi-page.txt" "704931468739539459.txt"
"2" "http://documents.worldbank.org/curated/en/239491468743788559/text/multi-page.txt"  "239491468743788559.txt"
"3" "http://documents.worldbank.org/curated/en/489381468771867920/text/multi-page.txt"  "489381468771867920.txt"
"4" "http://documents.worldbank.org/curated/en/663271468778456388/text/multi-page.txt"  "663271468778456388.txt"
"5" "http://documents.worldbank.org/curated/en/330661468742793711/text/multi-page.txt"  "330661468742793711.txt"
"6" "http://documents.worldbank.org/curated/en/120441468766519490/text/multi-page.txt"  "120441468766519490.txt"
"7" "http://documents.worldbank.org/curated/en/901481468770727038/text/multi-page.txt"  "901481468770727038.txt"
"8" "http://documents.worldbank.org/curated/en/172351468740162422/text/multi-page.txt"  "172351468740162422.txt"
"9" "http://documents.worldbank.org/curated/en/980401468740176249/text/multi-page.txt"  "980401468740176249.txt"
"10" "http://documents.worldbank.org/curated/en/166921468759906515/text/multi-page.txt" "166921468759906515.txt"
"11" "http://documents.worldbank.org/curated/en/681071468781809792/text/DRD169.txt" "681071468781809792.txt"
"12" "http://documents.worldbank.org/curated/en/358291468739333041/text/multi-page.txt" "358291468739333041.txt"
"13" "http://documents.worldbank.org/curated/en/716041468759870921/text/multi0page.txt" "716041468759870921.txt"
"14" "http://documents.worldbank.org/curated/en/961101468763752879/text/34896.txt"  "961101468763752879.txt"`

This is the script:

rm(list = ls())

require(quanteda)
library(stringr)

workingdir <- setwd("~/Study/Master/Thesis/Mining/R/WorldBankDownl")
test <- read.csv(paste0(workingdir, "/New_test.txt"), header = TRUE,
                 stringsAsFactors = FALSE, sep = "\t")

# Loop through every url in test and download to the target directory with name = code
for (url in test) {
  print(head(url))
  print(head(test$code))
  destfile <- paste0('~/Study/Master/Thesis/Mining/R/WorldBankDownl/Sources/', test$code)
  download.file(test$url, destfile, method = "wget", quiet = TRUE)
}

And this is the error I get:

Error in download.file(test$url, destfile, method = "wget", quiet = TRUE) : 
'url' must be a length-one character vector

Solution

  • Everyone, thank you for helping me. For me the solution was changing the method. 'wget' demands a single URL in 'url' and a single path in 'destfile' ("a length-one character vector"), but here both 'url' and 'destfile' are length-fourteen character vectors. Now I use the method 'libcurl', which accepts 'url' and 'destfile' as character vectors of equal length. If you use this method, make sure that quiet = TRUE.
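    If you want to keep method = "wget", each download.file() call needs exactly one URL and one destination, so another option is to loop over the rows of the data frame instead of passing whole columns at once. A minimal sketch, assuming the same 'test' data frame and target directory as in the question:

    # Sketch: one download.file() call per row, so 'url' and 'destfile'
    # are each a length-one character vector, as "wget" requires.
    for (i in seq_len(nrow(test))) {
      destfile <- paste0('~/Study/Master/Thesis/Mining/R/WorldBankDownl/Sources/', test$code[i])
      download.file(test$url[i], destfile, method = "wget", quiet = TRUE)
    }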

    Furthermore, it is possible that your call is correct but that you still get this error:

    Error in download.file(test$url, destfile, method = "libcurl", quiet = TRUE) : 
    cannot download any files
    In addition: There were 50 or more warnings (use warnings() to see the first 50) 
    

    This means that the source can't keep up with your requests (you are essentially flooding the server with download calls), so the downloads have to be slowed down.

    rm(list = ls())

    require(quanteda)
    library(stringr)

    workingdir <- "~/Study/Master/Thesis/Mining/R/WorldBankDownl"
    setwd(workingdir)
    test <- read.csv(paste0(workingdir, "/New_test.txt"), header = TRUE,
                     stringsAsFactors = FALSE, sep = "\t")

    # Download every url in test to the target directory with name = code.
    # With method = "libcurl", download.file() accepts character vectors of
    # equal length for 'url' and 'destfile', so no loop over rows is needed.
    # If you get an error and no files are downloaded, slow down the requests.
    destfile <- paste0(workingdir, '/Sources/WB_', test$code)
    download.file(test$url, destfile, method = "libcurl", quiet = TRUE)
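
    The call above still requests all fourteen files in one go. If the "cannot download any files" error keeps appearing, one way to slow things down is to download in small batches with a pause in between. A minimal sketch, assuming the same 'test' data frame and working directory; the batch size and pause length are arbitrary illustrative values, not tuned to this server:

    # Sketch: download in batches of 5 with a 2-second pause between batches.
    batch_size <- 5
    batches <- split(seq_len(nrow(test)), ceiling(seq_len(nrow(test)) / batch_size))

    for (idx in batches) {
      destfile <- paste0(workingdir, '/Sources/WB_', test$code[idx])
      download.file(test$url[idx], destfile, method = "libcurl", quiet = TRUE)
      Sys.sleep(2)  # give the server a moment before the next batch
    }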