
How to print all results of Beautiful Soup at once?


I have a list of Twitter usernames and need to get each account's follower count. I used BeautifulSoup and requests, but every run returns data for only one account.

from bs4 import BeautifulSoup
import requests
import pandas as pd
purcsv = pd.read_csv('pureeng.csv', engine= 'python')
followers = purcsv['username']
followers.head(10)

handle = purcsv['username'][0:40]
temp = ("https://twitter.com/"+handle)
temp = temp.tolist() 

for url in temp:
    page = requests.get(url)

bs = BeautifulSoup(page.text,'lxml')

follow_box = bs.find('li',{'class':'ProfileNav-item ProfileNav-item--followers'})
followers = follow_box.find('a').find('span',{'class':'ProfileNav-value'})
print("Number of followers: {} ".format(followers.get('data-count')))

Solution

  • That's because you loop over all the URLs first, storing each response in the same variable, page:

    for url in temp:
        page = requests.get(url)
    

    so page ends up holding only the response for the last URL fetched. To fix this, process each page as soon as it is fetched:

    followers_list = []
    for url in temp:
        page = requests.get(url)
    
        bs = BeautifulSoup(page.text, "html.parser")
    
        follow_box = bs.find('li',{'class':'ProfileNav-item ProfileNav-item--followers'})
        followers = follow_box.find('a').find('span',{'class':'ProfileNav-value'})
        print("Number of followers: {} ".format(followers.get('data-count')))
        followers_list.append(followers.get('data-count'))
    print(followers_list)
    

    Here is a full example to verify:

    from bs4 import BeautifulSoup
    import requests
    import pandas as pd
    purcsv = pd.read_csv('pureeng.csv')
    
    followers = purcsv['username']
    
    handles = purcsv['username'][0:40].tolist()
    
    followers_list = []
    for handle in handles:
        url = "https://twitter.com/" + handle
        try:
            page = requests.get(url)
        except Exception as e:
            print(f"Failed to fetch page for url {url} due to: {e}")
            continue
    
        bs = BeautifulSoup(page.text, "html.parser")
    
        follow_box = bs.find('li',{'class':'ProfileNav-item ProfileNav-item--followers'})
        if follow_box is None:
            # Protected, suspended, or missing accounts have no followers box;
            # skip them instead of crashing with an AttributeError.
            print(f"Could not find a followers box on {url}")
            continue
        followers = follow_box.find('a').find('span',{'class':'ProfileNav-value'})
        print("Number of followers: {}".format(followers.get('data-count')))
        followers_list.append(followers.get('data-count'))
    print(followers_list)
    

    output:

    Number of followers: 13714085 
    Number of followers: 4706511 
    ['13714085', '4706511']
    

    You may consider using async functions to fetch and process those URLs if you have too many of them.
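
    A minimal sketch of that async approach, using asyncio.gather with a semaphore to cap concurrency. The fetch here is stubbed out (fetch_followers just returns the handle); in the real script the commented-out asyncio.to_thread(requests.get, url) line would run the blocking fetch off the event loop, followed by the same BeautifulSoup parsing as above:

    ```python
    import asyncio

    # Stubbed fetch: in the real script, uncomment the requests/BeautifulSoup
    # lines; asyncio.to_thread runs the blocking call in a worker thread.
    async def fetch_followers(url, sem):
        async with sem:                      # cap concurrent fetches
            await asyncio.sleep(0)           # stands in for the network round-trip
            # page = await asyncio.to_thread(requests.get, url)
            # ...parse page.text with BeautifulSoup as in the loop above...
            return url.rsplit("/", 1)[-1]    # placeholder result: the handle

    async def main(handles):
        sem = asyncio.Semaphore(10)          # at most 10 requests in flight
        urls = ["https://twitter.com/" + h for h in handles]
        tasks = [fetch_followers(u, sem) for u in urls]
        return await asyncio.gather(*tasks)  # results preserve input order

    results = asyncio.run(main(["user_a", "user_b"]))
    print(results)  # ['user_a', 'user_b']
    ```

    asyncio.gather returns results in the same order as the input list, so the output still lines up with the handles column of the CSV.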