I am working on a project that needs tweets from Twitter. I used the Twitter API, but it only returns tweets from the last 7-9 days, and I also want tweets that are a few months old. So I decided to scrape Twitter using BeautifulSoup and later Selenium, but when parsing, it does not return the elements I search for; instead I get the view-source of the entire webpage. Please help!
import requests
from bs4 import BeautifulSoup
# Fetch the search page and keep the response body as text
f = requests.get("https://twitter.com/search?q=%23......%20until%3A2020-02-07%20since%3A2020-01-01&src=typed_query").text
soup = BeautifulSoup(f, 'html.parser')
print(soup)
# Look up the span elements by their CSS classes
name = soup.find_all('span', class_="css-901oao css-16my406 r-1qd0xha r-ad9z0x r-bcqeeo r-qvutc0")
print(name)
This is the output from printing soup. I don't know how to describe it, but it is the view-source of the page rather than the actual HTML elements:
{"undefined"!=typeof Symbol&&Symbol.toStringTag&&Object.defineProperty(e,Symbol.toStringTag,{value:"Module"}),Object.defineProperty(e,"__esModule",{value:!0})},t.t=function(e,n){if(1&n&&(e=t(e)),8&n)return e;if(4&n&&"object"==typeof e&&e&&e.__esModule)return e;var d=Object.create(null);if(t.r(d),Object.defineProperty(d,"default",{enumerable:!0,value:e}),2&n&&"string"!=typeof e)for(var o in e)t.d(d,o,function(n){return e[n]}.bind(null,o));return d},t.n=function(e){var n=e&&e.__esModule?function(){return e.default}:function(){return e};return t.d(n,"a",n),n},t.o=function(e,n){return Object.prototype.hasOwnProperty.call(e,n)},t.p="https://abs.twimg.com/responsive-web/web/",t.oe=function(e){throw e};var i=window.webpackJsonp=window.webpackJsonp||[],c=i.push.bind(i);i.push=n,i=i.slice();for(var l=0;l<i.length;l++)n(i[l]);var u=c;d()}([]),window.__SCRIPTS_LOADED__.runtime=!0;
//# sourceMappingURL=runtime.cc3200a4.js.map
The Selenium output is the same:
import time
from selenium import webdriver
from selenium.webdriver.common.keys import Keys
PATH = "C:\\Program Files\\chromedriver.exe"
driver = webdriver.Chrome(PATH)
driver.get("https://twitter.com")
email = driver.find_element_by_name('session[username_or_email]')
password = driver.find_element_by_name('session[password]')
email.send_keys('......')
password.send_keys("......")
password.send_keys(Keys.RETURN)
time.sleep(1)
driver.get('https://twitter.com/search?q=%23....%20until%3A2020-02-07%20since%3A2020-01-01&src=typed_query')
time.sleep(1)
print(driver.page_source)
GetOldTweets3 lets you extract historical tweets and filter them by criteria such as time frame, location, handle, or search query, without needing an API key.
For example:
import GetOldTweets3 as got
# Tweet params
search_term = 'china trade war'
start_date = '2017-01-01'
end_date = '2020-01-01'
# Define historical tweets criteria
tweet_criteria = got.manager.TweetCriteria().setUsername('reuters') \
    .setQuerySearch(search_term) \
    .setSince(start_date) \
    .setUntil(end_date)
# Return tweets based on tweet criteria
tweets = got.manager.TweetManager.getTweets(tweet_criteria)
# getTweets returns a list, so read the text of each tweet
[tweet.text for tweet in tweets]
Note that you can access further tweet attributes such as hashtags, retweets, etc. through each tweet object, for example:
other_tweet_attributes = [[tweet.username, tweet.hashtags] for tweet in tweets]
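If you want to keep the results, the attributes can be written out as rows. Below is a minimal sketch, not part of GetOldTweets3 itself: `tweets_to_csv` and `FakeTweet` are hypothetical names introduced here, and `FakeTweet` only stands in for the objects in `tweets` so the snippet runs without a network call. It assumes each tweet exposes `username`, `date`, `text` and `hashtags` attributes.

```python
import csv
import io
from collections import namedtuple

# Hypothetical stand-in for the objects returned by getTweets(); it only
# mimics the attribute names assumed above (username, date, text, hashtags).
FakeTweet = namedtuple("FakeTweet", ["username", "date", "text", "hashtags"])


def tweets_to_csv(tweets, out):
    """Write a header plus one CSV row per tweet to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["username", "date", "text", "hashtags"])
    for tweet in tweets:
        writer.writerow([tweet.username, tweet.date, tweet.text, tweet.hashtags])
    return out


# Sample data for illustration only; replace `sample` with the real `tweets` list.
sample = [FakeTweet("reuters", "2019-05-10", "Example text", "#trade")]
buffer = io.StringIO()
tweets_to_csv(sample, buffer)
print(buffer.getvalue())
```

To write to disk instead, pass a file opened with `open("tweets.csv", "w", newline="", encoding="utf-8")` in place of the `StringIO` buffer.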