I want a Twitter user's tweets for data analysis. To get them, I used the HtmlAgilityPack package to scrape Twitter, and it gives me the top 30 tweets.
I identified the tweet-text element and fetched all the tweets. But I also want to know whether each one is a tweet or a retweet. How can I do that?
I have analysed the HTML. A retweet contains an element with the class tweet-context with-icn. But when I select on that class, it throws a NullReferenceException, because not every tweet has that element. So what should I select on, and how, to find out whether a tweet is a retweet?
HtmlAgilityPack.HtmlWeb web = new HtmlAgilityPack.HtmlWeb();
HtmlAgilityPack.HtmlDocument doc = web.Load("https://twitter.com/BarackObama");
// ToList() needs "using System.Linq;"
var TweetsNode = doc.DocumentNode.SelectNodes("//tr[@class='tweet-container']").ToList();
foreach (var item in TweetsNode)
{
    var tweet = new Tweets { /* ... rest of the initializer not shown in the post ... */ };
}
In the code above, I fetch the tweets from Barack Obama's profile and get the top 30. How can I tell which of them are retweets?
Thank you.
Get all tweets from the page (they conveniently come in tables, <table class='tweet '>):
HtmlWeb p = new HtmlWeb();
var doc = p.Load(@"https://twitter.com/dailygametips");
var nodes = doc.DocumentNode.SelectNodes("//table[@class='tweet ']");
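One thing worth guarding against here: HtmlAgilityPack's SelectNodes returns null, not an empty collection, when the XPath matches nothing (for example, if Twitter serves different markup than expected). A foreach or .ToList() over that result then throws the same kind of NullReferenceException described in the question. A minimal sketch of a guard; HtmlNodeExtensions and SelectNodesOrEmpty are hypothetical names, not part of the library:

```csharp
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

// Hypothetical helper (not part of HtmlAgilityPack): always hands back a
// sequence, so callers can loop without a separate null check.
public static class HtmlNodeExtensions
{
    public static IEnumerable<HtmlNode> SelectNodesOrEmpty(this HtmlNode node, string xpath)
    {
        // SelectNodes yields null when the XPath matches no nodes.
        HtmlNodeCollection found = node.SelectNodes(xpath);
        if (found == null)
            return Enumerable.Empty<HtmlNode>();
        return found;
    }
}
```

Usage would look like var nodes = doc.DocumentNode.SelectNodesOrEmpty("//table[@class='tweet ']"); and the loop below then simply does nothing if the page layout has changed.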
Look in each node for a <span class='context'>; if its content contains "retweeted", the tweet is a retweet.
List<Tweet> tweets = new List<Tweet>();
foreach (var node in nodes)
{
    bool isRetweet = false;
    // SelectSingleNode returns null for ordinary tweets that have no context span,
    // so check for null before reading InnerHtml (this avoids the NullReferenceException).
    var spanNode = node.SelectSingleNode(".//span[@class='context']");
    if (spanNode != null && spanNode.InnerHtml.Contains("retweeted"))
        isRetweet = true;

    // We also want the message text, so scrape the <div class='tweet-text'> next.
    string msg = string.Empty;
    var msgNode = node.SelectSingleNode(".//div[@class='tweet-text']");
    if (msgNode != null)
        msg = msgNode.InnerText.Trim();

    tweets.Add(new Tweet(msg, isRetweet));
}
In addition, the Tweet container class:
class Tweet
{
    public Tweet(string message, bool isRetweet)
    {
        Message = message;
        IsRetweet = isRetweet;
    }

    // Public getters so the scraped data can be read outside the class.
    public string Message { get; private set; }
    public bool IsRetweet { get; private set; }
}
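Once the list is filled, separating the user's own tweets from retweets for the analysis is a short LINQ query. A small sketch, assuming the Tweet class above with public properties (repeated here so the snippet compiles on its own); TweetStats, CountRetweets and OriginalMessages are illustrative names only:

```csharp
using System.Collections.Generic;
using System.Linq;

class Tweet
{
    public Tweet(string message, bool isRetweet)
    {
        Message = message;
        IsRetweet = isRetweet;
    }

    public string Message { get; private set; }
    public bool IsRetweet { get; private set; }
}

// Hypothetical helpers for the analysis step.
static class TweetStats
{
    // Number of scraped entries flagged as retweets.
    public static int CountRetweets(IEnumerable<Tweet> tweets)
    {
        return tweets.Count(t => t.IsRetweet);
    }

    // Message text of the user's own tweets, kept in page order.
    public static List<string> OriginalMessages(IEnumerable<Tweet> tweets)
    {
        return tweets.Where(t => !t.IsRetweet).Select(t => t.Message).ToList();
    }
}
```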
As you can see, this is not really rocket science. But you do need to understand the basic principles of XPath and scraping.
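One XPath detail matters for the question's tweet-context with-icn element: @class='...' compares the whole attribute string, so it only matches when the class list is exactly that text, spacing included (presumably why the selector above is 'tweet ' with a trailing space). The usual XPath 1.0 idiom for "has this one class" pads the normalized attribute with spaces and uses contains. A sketch, relying on HtmlAgilityPack evaluating standard XPath 1.0 functions through .NET's XPath engine; XPathHelpers, HasClass and HasDescendantWithClass are hypothetical helpers:

```csharp
using HtmlAgilityPack;

static class XPathHelpers
{
    // XPath 1.0 predicate that is true when @class contains className as a
    // whole, space-separated token. Hypothetical helper; className must not
    // contain quotes or spaces.
    public static string HasClass(string className)
    {
        return "contains(concat(' ', normalize-space(@class), ' '), ' " + className + " ')";
    }

    // True when tweetNode has any descendant carrying the class token.
    // SelectSingleNode returns null when nothing matches, hence the null test.
    public static bool HasDescendantWithClass(HtmlNode tweetNode, string className)
    {
        return tweetNode.SelectSingleNode(".//*[" + HasClass(className) + "]") != null;
    }
}
```

With this, XPathHelpers.HasDescendantWithClass(item, "tweet-context") on each tweet-container node answers the question's "is it a retweet?" check without throwing, and HasClass("tweet") does not accidentally match tweet-container or tweet-text.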