
How to get the date of a Twitter post (tweet) using BeautifulSoup?


I am trying to extract posting dates from Twitter. I have already managed to extract the name and text of each post, but I am stuck on the date.

As input I have a list of links like these:

I am searching by class, which I suspect is the problem: it works for some links but not for others. I have already tried these approaches:

soup.find("span",class_="_timestamp js-short-timestamp js-relative-timestamp")
soup.find('a', {'class': 'tweet-timestamp'})
soup.select("a.tweet-timestamp")

But none of these works every time.

My current code is:

import requests
from bs4 import BeautifulSoup

data = requests.get(url)
soup = BeautifulSoup(data.text, 'html.parser')
gdata = soup.find_all("script")
for item in gdata:
    items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)
    if items2:
        items21 = items2.get('href')
        items22 = items2.get('title')
        print(items21)
        print(items22)

I need the output to include the posting date.


Solution

  • I believe the Twitter API would be the best choice, but regarding your code:

    The date is available in the title attribute of the element with class tweet-timestamp. That element is not inside a script tag, which is where your code is searching:

    gdata = soup.find_all("script")    
    for item in gdata:
        items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)   
    

    Instead, select by the class directly:

    data = requests.get(link)                    
    soup = BeautifulSoup(data.text, 'html.parser')
    tweets = soup.find_all('div' , {'class': 'content'})    
    for item in tweets:
        items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)                            
        if items2:
            items21 = items2.get('href')
            items22 = items2.get('title')
            print(items21)
            print(items22.split('-')[1].strip())
    

    I prefer CSS selectors, and you only need to match one of the element's classes rather than the full compound class string:

    data = requests.get(link)                    
    soup = BeautifulSoup(data.text, 'html.parser')
    tweets = soup.select(".content")    
    for item in tweets:
        items2 = item.select_one('.tweet-timestamp')                            
        if items2:
            items21 = items2.get('href')
            items22 = items2.get('title')
            print(items21)
            print(items22.split('-')[1].strip())
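
The snippets above print the date part of the title as a plain string. If you need an actual datetime object, here is a minimal sketch using only the standard library. It assumes the title attribute looks like "7:51 AM - 4 Mar 2019"; that format, the sample value, and the helper name parse_tweet_title are my own assumptions for illustration, so check them against the titles you actually get back.

```python
from datetime import datetime


def parse_tweet_title(title):
    """Parse a title such as '7:51 AM - 4 Mar 2019' into a datetime.

    Assumes the format '<time> - <day> <abbreviated month> <year>'.
    """
    # Split once on the ' - ' separator between the time and the date.
    time_part, date_part = title.split(' - ', 1)
    # %b and %p depend on the locale; this assumes English month names.
    return datetime.strptime(
        f"{date_part.strip()} {time_part.strip()}",
        "%d %b %Y %I:%M %p",
    )


sample_title = "7:51 AM - 4 Mar 2019"  # hypothetical example value
posted = parse_tweet_title(sample_title)
print(posted.isoformat())
```

In the loops above you could call parse_tweet_title(items22) instead of splitting the string by hand. strptime raises ValueError if a title does not match the assumed format.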