I'm having issues running an R script as a cron job; it scrapes posts from Reddit. The script works flawlessly when sourced manually from within R, and other R scripts run fine from the same crontab. The scraping package is also built specifically not to overrun the Reddit API.
crontab:
*/25 * * * * /usr/bin/Rscript "home/ubuntu/cryptoAPI/NLP/NLPupdater.R"
R script:
setwd("/home/ubuntu/cryptoAPI/NLP")
library(RedditExtractoR)#install.packages('RedditExtractoR')
library('httr')
library(data.table)
#1. check for most recent reddit urls
ss.new.Reddit <- fileSnapshot(path="/home/ubuntu/cryptoAPI/NLP/raw", file.info = F)
Reddit.num <- nrow(ss.new.Reddit[[1]])-1
#2. load CCtop100 snapshot from cronjob
Reddit.urls <- rjson::fromJSON(file=paste0('raw/hot.json@limit=50.', Reddit.num)) # or 'raw/hot.json@limit=100.' for the larger snapshot
#3. extract urls from list
urlvector <- character(50) # to 100 for the larger snapshot
for (i in 1:50) {          # also to 100; looping past the number of children would error
  urlvector[i] <- Reddit.urls$data$children[[i]]$data$permalink
}
#4. prepend 'http://www.reddit.com' to each permalink for RedditExtractoR
urlvector.long <- paste0('http://www.reddit.com', urlvector)
#5. run redditextractor
Reddit.comments <- reddit_content(urlvector.long)
#6. save new csv
NLPcsv <- paste0("CSV/reddit-nlp-",Reddit.num,".csv" )
fwrite(Reddit.comments, file=NLPcsv)
fwrite(Reddit.comments, file='current/currentNLP.csv')
Is there some limit on how long a cron job can take? The scrape takes about 3 minutes to complete.
It turned out to be a typo in the crontab entry: the leading / was missing from the script path:
*/25 * * * * /usr/bin/Rscript "/home/ubuntu/cryptoAPI/NLP/NLPupdater.R"
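For anyone debugging a similar "works manually, fails from cron" problem: cron discards a job's output by default, so errors like Rscript's "cannot open file" message never reach you. A sketch of redirecting both stdout and stderr to a log file (the log path here is my own choice, not from the original post):

```shell
# Hypothetical crontab entry: append stdout and stderr to a log
# so a bad path or missing package shows up immediately.
*/25 * * * * /usr/bin/Rscript "/home/ubuntu/cryptoAPI/NLP/NLPupdater.R" >> /home/ubuntu/cryptoAPI/NLP/cron.log 2>&1
```

With the original broken path, the log would have contained the fatal "cannot open file" error on the first run, making the missing / obvious. Note also that `*/25` does not mean "every 25 minutes": cron steps restart each hour, so this fires at minutes 0, 25, and 50.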