web-services dynamic html-parsing screen-scraping feed

How can I programmatically get the hot topic of the day/hour/moment?


I would like to grab the current hot topic when my page loads (it could be anything from a civil war in Syria to a sports team or a wardrobe malfunction). I would like it to be a simple web service call such as:

string hotTopic = getHotTopic();

...but that probably "ain't gonna happen."

So what can I realistically expect? In brainstorming this, I thought of grabbing the headlines from the New York Times, the Huffington Post and a couple of other sites, and then parsing the h1 tags in the HTML to look for uncommon words that appear multiple times. Am I on the right track? Is there a known solution to this challenge?
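
To make the word-counting idea concrete, here is a minimal sketch. It assumes the headline text has already been extracted from the h1 tags; the class name `HeadlineWords`, the method `RepeatedWords`, and the tiny common-word list are all made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class HeadlineWords
{
    // Tiny illustrative list of common words; a real list would be much longer.
    private static readonly HashSet<string> CommonWords = new HashSet<string>(
        new[] { "the", "and", "for", "with", "from", "after", "that", "this", "are", "was" });

    // Returns words that occur at least minCount times across all headlines,
    // most frequent first (ties broken alphabetically).
    public static List<string> RepeatedWords(IEnumerable<string> headlines, int minCount = 2)
    {
        var counts = new Dictionary<string, int>();
        foreach (var headline in headlines)
        {
            foreach (Match m in Regex.Matches(headline, "[A-Za-z]+"))
            {
                var word = m.Value.ToLowerInvariant();
                // Skip very short fragments ("in", "of", "s") and common words.
                if (word.Length < 3 || CommonWords.Contains(word)) continue;
                counts[word] = counts.TryGetValue(word, out var n) ? n + 1 : 1;
            }
        }
        return counts.Where(kv => kv.Value >= minCount)
                     .OrderByDescending(kv => kv.Value)
                     .ThenBy(kv => kv.Key, StringComparer.Ordinal)
                     .Select(kv => kv.Key)
                     .ToList();
    }
}
```

Given the three headlines "Syria talks resume in Geneva", "Fighting continues in Syria" and "Geneva hosts Syria talks", `RepeatedWords` returns "syria", "geneva", "talks". Raw counts like this say nothing about how many people are actually reading a story, so treat the result as a rough signal only.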


Solution

  • You can always pull down a website's RSS feeds and parse them. However, not every website provides a view count for the articles you pull down, which makes it hard to tell whether an article is a hot topic.
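
    As a rough sketch of the RSS route, the snippet below pulls the item titles out of an RSS 2.0 document using `System.Xml.Linq`. The class and method names (`RssTitles`, `FromXml`) are made up for illustration, and it assumes the feed XML has already been downloaded into a string.

    ```csharp
    using System;
    using System.Collections.Generic;
    using System.Linq;
    using System.Xml.Linq;

    public static class RssTitles
    {
        // Extracts the <title> text of every <item> in an RSS 2.0 document.
        // Items without a title are skipped.
        public static List<string> FromXml(string rssXml)
        {
            var doc = XDocument.Parse(rssXml);
            return doc.Descendants("item")
                      .Select(item => (string)item.Element("title"))
                      .Where(title => !string.IsNullOrWhiteSpace(title))
                      .Select(title => title.Trim())
                      .ToList();
        }
    }
    ```

    The XML string could come from something like `HttpClient.GetStringAsync` pointed at the feed's URL. Atom feeds use a different, namespaced structure, so this sketch would not pick up their entries.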

    Personally, I would go to Twitter for trending topics. The trending words or hashtags often coincide with what's really trending in the news. Events like the Super Bowl or a weather catastrophe often show up there.

    To get your single-method solution, you'll likely need to write a wrapper. If you're using the Twitter API, there are some pre-made libraries that can help with this. The wrapper would be something like:

    (Completely made-up code)

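    string GetHotTopic()
    {
        // TwitterSvcWrapper is a made-up wrapper class, not a real library type.
        var svc = new TwitterSvcWrapper();
        var topics = svc.GetTrending("united states");

        // Return the text of the first (top) trending topic.
        return topics[0].text;
    }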
    

    I know this doesn't let you parse several pages to find topics, but it does give you a way to discover what may be trending. To argue against my own idea, Twitter isn't always the best source either. Silly items that you may not want to use can trend, like "#whatToSayAfter" ...

    I also want to add that some websites prohibit "scraping" data in their Terms of Use. For example (not that you would use it), Xbox.com prohibits scraping data in its ToS (1.12).

    Just some ideas - good luck! Cheers!