I am trying to extract posting dates from Twitter. I've already succeeded in extracting the name and text of the post, but the date is proving difficult.
As input I have a list of links like these:
I am searching by class, but I think this is the problem: it works for some links and not for others. I've already tried these solutions:
soup.find("span",class_="_timestamp js-short-timestamp js-relative-timestamp")
soup.find('a', {'class': 'tweet-timestamp'})
soup.select("a.tweet-timestamp")
But none of these works every time.
My current code is:
data = requests.get(url)
soup = BeautifulSoup(data.text, 'html.parser')
gdata = soup.find_all("script")
for item in gdata:
    items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)
    if items2:
        items21 = items2.get('href')
        items22 = items2.get('title')
        print(items21)
        print(items22)
I need the output to include the posting date.
I believe the Twitter API would be the best choice, but regarding your code:
It's available via the title attribute of the element with class tweet-timestamp. This element is not within a script tag, which seems to be where you are searching:
gdata = soup.find_all("script")
for item in gdata:
    items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)
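To see why that search comes back empty, here is a minimal sketch on a made-up HTML snippet (not a real Twitter page; the markup and attribute values are assumptions for illustration). With html.parser, the contents of a script tag are kept as plain text, so there are no child elements to find inside it:

```python
from bs4 import BeautifulSoup

# Made-up markup: one link-like string inside a <script>, one real link in the body.
html = """
<script>var s = "<a class='tweet-timestamp' title='in script'>";</script>
<div class="content">
  <a class="tweet-timestamp js-permalink" href="/user/status/1" title="7:51 AM - 28 Jan 2019">Jan 28</a>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# Script contents are plain text, so searching inside each script tag finds no <a>.
in_script = [s.find("a", class_="tweet-timestamp") for s in soup.find_all("script")]

# Searching the document itself reaches the real link.
link_tag = soup.find("a", class_="tweet-timestamp")

print(in_script)
print(link_tag["title"])
```

The first print shows only None entries, while the second shows the title of the link in the body.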
Instead, select by the class directly:
import requests
from bs4 import BeautifulSoup

data = requests.get(link)  # link is one of the tweet URLs from your list
soup = BeautifulSoup(data.text, 'html.parser')
tweets = soup.find_all('div', {'class': 'content'})
for item in tweets:
    items2 = item.find('a', {'class': 'tweet-timestamp js-permalink js-nav js-tooltip'}, href=True)
    if items2:
        items21 = items2.get('href')
        items22 = items2.get('title')
        print(items21)
        # keep the part of the title after the '-', which should be the date
        print(items22.split('-')[1].strip())
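If you want a real date object rather than a string, something like the following might work. This is only a sketch: it assumes the title looks like "7:51 AM - 28 Jan 2019" (time, a dash, then day, abbreviated English month and year), which is what the split above implies but is not guaranteed for every page or locale. The function name parse_tweet_title is made up for this example.

```python
from datetime import datetime


def parse_tweet_title(title):
    """Parse a title such as '7:51 AM - 28 Jan 2019' into a datetime.

    The format string is an assumption about how the title is written;
    strptime raises ValueError if the title does not match it.
    """
    return datetime.strptime(title.strip(), "%I:%M %p - %d %b %Y")


posted = parse_tweet_title("7:51 AM - 28 Jan 2019")
print(posted.date().isoformat())
```

Parsing the whole title keeps the time of day as well, so you can decide later whether you need only the date.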
I prefer CSS selectors, and you only need one class out of the compound classes:
data = requests.get(link)
soup = BeautifulSoup(data.text, 'html.parser')
tweets = soup.select(".content")
for item in tweets:
    items2 = item.select_one('.tweet-timestamp')
    if items2:
        items21 = items2.get('href')
        items22 = items2.get('title')
        print(items21)
        print(items22.split('-')[1].strip())
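One plausible reason the compound class search works for some links and not others: when you pass a string containing several classes, BeautifulSoup compares it against the exact value of the class attribute, so a page whose link has a slightly different set or order of classes no longer matches. A single class matches regardless of what else is on the element. A small sketch on made-up markup (the class list here is an assumption, chosen to leave out js-tooltip):

```python
from bs4 import BeautifulSoup

# Made-up link that has most, but not all, of the classes searched for.
html = '<a class="tweet-timestamp js-permalink js-nav" title="7:51 AM - 28 Jan 2019">Jan 28</a>'
soup = BeautifulSoup(html, "html.parser")

# A multi-class string has to equal the class attribute exactly, so this finds nothing.
exact = soup.find("a", class_="tweet-timestamp js-permalink js-nav js-tooltip")

# A single class matches no matter which other classes are present.
single = soup.find("a", class_="tweet-timestamp")
selected = soup.select_one("a.tweet-timestamp")

print(exact)
print(single["title"])
print(selected["title"])
```

Here exact is None, while both single-class lookups return the link.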