javascript node.js heroku twitter

How to use JavaScript in a Node.js app to get external webpage info (web scraping)?


I am using the Twit API client for Node.js and have hosted my code on Heroku, where it is currently running. I followed Daniel Shiffman's tutorials: http://shiffman.net/a2z/twitter-bots/ and http://shiffman.net/a2z/bot-heroku/

I would like my bot to go to https://en.wikipedia.org/wiki/Special:Random and "get" the title, then post that title as a tweet. After some research, it seems that what I want is called web scraping. Let's say the title of the wiki page is in the title tag in the head of the HTML file. Does anyone know how I can access the URL and get the info I need? I'm not sure where to start. Search results on Stack Overflow led me to outdated answers about using jQuery and a Yahoo API. A solution in JavaScript would be helpful, so that I know it is compatible with Heroku.


Solution

  • You can use Puppeteer, a headless-browser library from Google, to do this. See:

    GitHub

    Article
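For a static value like the page title, a full headless browser may be more than is needed. As a lighter alternative to Puppeteer (not the method the answer above links to), here is a minimal sketch that downloads the page with Node's built-in fetch and reads the title tag with a regular expression. It assumes Node.js 18 or later. The function names, the User-Agent string, and the assumption that Wikipedia titles end in " - Wikipedia" are illustrative, not taken from the question.

```javascript
// Sketch only: assumes Node.js 18+ (global fetch). All function names are hypothetical.

// Return the text of the first <title> element in an HTML string, or null if none.
function extractTitle(html) {
  const match = /<title[^>]*>([\s\S]*?)<\/title>/i.exec(html);
  return match ? match[1].trim() : null;
}

// Assumption: Wikipedia page titles look like "Some Article - Wikipedia".
// Remove that trailing suffix if it is present; otherwise return the title unchanged.
function stripWikipediaSuffix(title) {
  return title.replace(/\s+[-\u2013\u2014]\s+Wikipedia$/, '');
}

// Request a random article (fetch follows the redirect by default)
// and resolve with the article title.
async function getRandomWikipediaTitle() {
  const response = await fetch('https://en.wikipedia.org/wiki/Special:Random', {
    // Hypothetical User-Agent; replace with something that identifies your bot.
    headers: { 'User-Agent': 'example-twitter-bot/0.1' },
  });
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  const title = extractTitle(await response.text());
  if (title === null) {
    throw new Error('No <title> tag found in the response');
  }
  return stripWikipediaSuffix(title);
}
```

In the bot, the result could then be passed to Twit, for example `getRandomWikipediaTitle().then(title => T.post('statuses/update', { status: title }, callback))`, where `T` is your existing Twit instance and `callback` is your own handler. A regular expression is enough for a single well-formed tag like this one; for anything more involved, an HTML parser or Puppeteer would likely be more robust.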