I'm scraping a bunch of website data and can print it to the terminal, but I'm having real trouble pushing it directly to a sheet.
I can confirm the gspread connection is working: my code loops through the names, but never touches the sheet.
I thought the last line of code (the append) would place the results.
Ultimately, I just want the full result of my code in columns on the sheet, but I'm having trouble.
If there's a better approach, I'd love any guidance:
https://docs.google.com/spreadsheets/d/1TD4YmhfAsnSL_Fwo1lckEbnUVBQB6VyKC05ieJ7PKCw/edit#gid=0
import requests
from bs4 import BeautifulSoup
import gspread

gc = gspread.service_account(filename='creds.json')
sh = gc.open_by_key('1TD4YmhfAsnSL_Fwo1lckEbnUVBQB6VyKC05ieJ7PKCw')
worksheet = sh.sheet1

# AddValue = ["Test", 25, "Test2"]
# worksheet.insert_row(AddValue, 3)

def get_links(url):
    data = []
    req_url = requests.get(url)
    soup = BeautifulSoup(req_url.content, "html.parser")
    for td in soup.find_all('td', {'data-th': 'Player'}):
        a_tag = td.a
        name = a_tag.text
        player_url = a_tag['href']
        print(f"Getting {name}")
        req_player_url = requests.get(
            f"https://basketball.realgm.com{player_url}")
        soup_player = BeautifulSoup(req_player_url.content, "html.parser")
        div_profile_box = soup_player.find("div", class_="profile-box")
        row = {"Name": name, "URL": player_url}
        for p in div_profile_box.find_all("p"):
            try:
                key, value = p.get_text(strip=True).split(':', 1)
                row[key.strip()] = value.strip()
            except ValueError:  # not all entries have values
                pass
        data.append(row)
    return data
urls = [
    'https://basketball.realgm.com/dleague/players/2022',
]

for url in urls:
    print(f"Getting: {url}")
    data = get_links(url)
    for entry in data:
        worksheet.append_row(entry)
The value of data returned by data = get_links(url) is a list of dictionaries. In its current state, append_row cannot directly use a dictionary, and I think this is the reason for your issue. Also, append_row is called inside a loop, so each row costs a separate API request and the process cost becomes high. When these points are reflected in your script, how about the following modification?
From:

for url in urls:
    print(f"Getting: {url}")
    data = get_links(url)
    for entry in data:
        worksheet.append_row(entry)
To:

res = []
for url in urls:
    print(f"Getting: {url}")
    data = get_links(url)
    res = [*res, *data]

if res != []:
    header = list(res[0].keys())
    values = [header, *[[e[k] if e.get(k) else "" for k in header] for e in res]]
    worksheet.append_rows(values, value_input_option="USER_ENTERED")
In this modification, all rows are written to the sheet with a single call to append_rows, instead of one API request per row. The column order is taken from the first scraped result by header = list(res[0].keys()). If you want your own expected, ordered headers, please manually replace header = list(res[0].keys()) with that list.
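As a minimal sketch of that manual-header idea (the sample rows and the "Height" field below are invented for illustration; your real keys come from whatever get_links scrapes):

```python
# Hypothetical sample of what get_links() might return; "Height" is an
# invented example field, and the second row is deliberately missing it.
res = [
    {"Name": "Player A", "URL": "/player/a", "Height": "6-5"},
    {"Name": "Player B", "URL": "/player/b"},
]

# Manually chosen column order instead of list(res[0].keys()).
header = ["Name", "Height", "URL"]

# dict.get(k, "") writes a blank cell for any row that lacks a key,
# so every row has the same length as the header.
values = [header, *[[e.get(k, "") for k in header] for e in res]]

print(values)
# Then: worksheet.append_rows(values, value_input_option="USER_ENTERED")
```

Using e.get(k, "") here also keeps falsy-but-real values (like "0") intact, which the e[k] if e.get(k) else "" form would blank out.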