I have created a Twitter scraper using the Beautiful Soup library. Given a user name, it successfully retrieves that user's bio and top tweet. The only issue is that the output looks odd: because the text is extracted from HTML, it contains many empty lines.
I have tried using prettify, but all that returns is an empty line. I have also tried using pprint.pprint.
I am new to Python and can't think of any other way to make the output of my script neater.
Any help would be greatly appreciated.
Below is my script:
import requests
from bs4 import BeautifulSoup
import pprint

q = "https://twitter.com"

def find_bio(username):
    c = format("https://twitter.com"+"/" + username)
    r = requests.get(c)
    s = BeautifulSoup(r.text, "html.parser")
    return s.find("div", class_="ProfileHeaderCard").text

def find_toptweet(username):
    c = format("https://twitter.com"+"/" + username)
    r = requests.get(c)
    s = BeautifulSoup(r.text, "html.parser")
    return s.find("div", class_="content").text

if __name__ == "__main__":
    username = input('enter username: ')
    bio = find_bio(username)
    tweet = find_toptweet(username)
    print("Bio--------------------------------------------------------------")
    pprint.pprint(bio)
    print("End of Bio-------------------------------------------------------")
    print('top tweet')
    pprint.pprint(tweet)
Output below:
enter username: altifali4
Bio--------------------------------------------------------------------------------------
('\n'
'\n'
'Altif Ali\n'
'\n'
'\n'
'\n'
'@AltifAli4\n'
'\n'
'\n'
'People, by and large, are good people\n'
'\n'
'UoH\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
' \n'
' instagram.com/altif.ali\n'
' \n'
'\n'
'\n'
'\n'
'\n'
'Joined August 2018\n'
'\n'
'\n'
'\n'
' Born 1999\n'
'\n'
'\n'
'\n')
End of Bio--------------------------------------------------------------------------------------
top tweet
('\n'
'\n'
'\n'
'\n'
'\n'
'Lowkey\u200f\xa0@Lowkey0nline\n'
'\n'
'May 22\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'More\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'Copy link to Tweet\n'
'\n'
'\n'
'Embed Tweet\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'\n'
'Power concedes nothing without demand. Without demand power concedes '
'nothing.\n')
Process finished with exit code 0
Try replacing your if statement with the following one:
if __name__ == "__main__":
    username = input('enter username: ')
    bio = find_bio(username).replace("\n","")
    tweet = find_toptweet(username).replace("\n","")
    print("Bio--------------------------------------------------------------")
    print(bio)
    print("End of Bio-------------------------------------------------------")
    print('top tweet')
    print(tweet)
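One caveat: replace("\n","") removes every line break, so separate fields such as the name and the handle end up joined on one line. If you would rather keep one field per line and only drop the empty lines, something like the following may work. This is a minimal sketch, not a tested fix for your scraper: collapse_blank_lines is a hypothetical helper name, and raw_bio is sample text shaped like the bio output in the question rather than a live response.

```python
def collapse_blank_lines(text):
    """Strip each line and drop the ones that end up empty."""
    # splitlines() breaks the text at line boundaries; strip() removes
    # the leading/trailing spaces left over from the HTML layout.
    lines = (line.strip() for line in text.splitlines())
    return "\n".join(line for line in lines if line)


# Sample text shaped like the scraped bio in the question (an assumption,
# not a live Twitter response).
raw_bio = (
    "\n\nAltif Ali\n\n\n\n@AltifAli4\n\n\n"
    "People, by and large, are good people\n\nUoH\n\n"
    "    \n  instagram.com/altif.ali\n \n\n"
    "Joined August 2018\n\n\n  Born 1999\n\n"
)

cleaned = collapse_blank_lines(raw_bio)
print(cleaned)
```

In your script you would call it as print(collapse_blank_lines(find_bio(username))). Using plain print instead of pprint.pprint also matters here, since pprint shows the string's repr, including the quoted '\n' pieces. Beautiful Soup's own get_text(separator="\n", strip=True) is another option that should give a similar result directly from the tag.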
Hope this helps.