python excel screen-scraping azure-blob-storage

Python: How can I save the information I am scraping to an Excel file or a blob so that I can compare it whenever my code runs?


I am scraping information about state supreme courts so that I can detect when it changes. I can scrape and print the information successfully, but I am struggling to find a way to save it to an Excel file or to some form of blob storage. Here is my current Python code:

import requests
from bs4 import BeautifulSoup
import pandas as pd

list = ['https://ballotpedia.org/Alabama_Supreme_Court', 
'https://ballotpedia.org/Alaska_Supreme_Court', 'https://ballotpedia.org/Arizona_Supreme_Court', 
'https://ballotpedia.org/Arkansas_Supreme_Court', 'https://ballotpedia.org/California_Supreme_Court', 
'https://ballotpedia.org/Colorado_Supreme_Court', 
'https://ballotpedia.org/Connecticut_Supreme_Court', 
'https://ballotpedia.org/Delaware_Supreme_Court', 'https://ballotpedia.org/Florida_Supreme_Court']
for page in list:
    r = requests.get(page)
    soup = BeautifulSoup(r.content, 'html.parser')
    print([item.text for item in soup.select("table.wikitable.sortable.jquery-tablesorter a")])

How can I get this into an Excel file or blob storage and refer to it later to check whether the information has changed? Thank you!


Solution

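  • I made a few adjustments: the scraped values are collected in a dictionary, converted to a DataFrame, and saved as a CSV file: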

    import requests
    from bs4 import BeautifulSoup
    import pandas as pd
    
    # Named `urls` rather than `list` so the built-in list type is not shadowed.
    urls = [
        'https://ballotpedia.org/Alabama_Supreme_Court',
        'https://ballotpedia.org/Alaska_Supreme_Court',
        'https://ballotpedia.org/Arizona_Supreme_Court',
        'https://ballotpedia.org/Arkansas_Supreme_Court',
        'https://ballotpedia.org/California_Supreme_Court',
        'https://ballotpedia.org/Colorado_Supreme_Court',
        'https://ballotpedia.org/Connecticut_Supreme_Court',
        'https://ballotpedia.org/Delaware_Supreme_Court',
        'https://ballotpedia.org/Florida_Supreme_Court',
    ]
    
    temp_dict = {}  # maps each court's page name to the list of scraped link texts
    
    for page in urls:
        r = requests.get(page)
        soup = BeautifulSoup(r.content, 'html.parser')
    
        # Key: last URL segment (e.g. 'Alabama_Supreme_Court'); value: text of every link in the table.
        temp_dict[page.split('/')[-1]] = [item.text for item in soup.select("table.wikitable.sortable.jquery-tablesorter a")]
    
    # The next line does the following: create dataframe from dictionary,
    # orient as 'index' (this handles different lengths of arrays) 
    # transpose it back so state supreme courts are column headers
    
    df = pd.DataFrame.from_dict(temp_dict, orient='index').transpose() 
    df.to_csv('State_Supreme_Court_Info.csv') #saves as csv
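
The code above saves the data but does not yet cover the "check if information has changed" part of the question. One possible way to do that is to load the CSV from the previous run and compare it column by column with the freshly scraped DataFrame before overwriting it. The sketch below is only an illustration under assumptions: the helper names `column_values`, `changed_columns`, and `check_and_save` are hypothetical (they are not part of pandas), and it assumes the file was written by `df.to_csv(...)` as shown above, with the default index column.

```python
from pathlib import Path

import pandas as pd


def column_values(df, column):
    """Return the non-empty values of one column as a list of strings."""
    if column not in df:
        return []
    # Drop the padding (None/NaN) added for shorter columns, and skip empty
    # strings so that blank CSV cells and missing values compare as equal.
    return [str(value) for value in df[column].dropna() if str(value) != ""]


def changed_columns(new_df, csv_path):
    """List the columns whose values differ from the previously saved CSV.

    Returns None when no earlier file exists (i.e. on the first run).
    """
    path = Path(csv_path)
    if not path.exists():
        return None
    # Read everything back as text; keep_default_na=False stops pandas from
    # turning strings such as "NA" into missing values.
    old_df = pd.read_csv(path, index_col=0, dtype=str, keep_default_na=False)
    all_columns = new_df.columns.union(old_df.columns)
    return [
        column
        for column in all_columns
        if column_values(old_df, column) != column_values(new_df, column)
    ]


def check_and_save(new_df, csv_path):
    """Compare new_df with the saved snapshot, then overwrite the snapshot."""
    changed = changed_columns(new_df, csv_path)
    new_df.to_csv(csv_path)
    return changed
```

With these helpers, the last line of the script could be replaced by `changed = check_and_save(df, 'State_Supreme_Court_Info.csv')`, after which `changed` is `None` on the first run, an empty list when nothing differs, or a list of the court names whose table contents changed. If an Excel file is preferred over CSV, `df.to_excel('State_Supreme_Court_Info.xlsx')` writes one, provided an Excel engine such as `openpyxl` is installed; the comparison would then need `pd.read_excel` in place of `pd.read_csv`.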