I have written a Scrapy spider that parses data on football fixtures. The code works almost correctly, but for some reason the scraped data is incomplete. For example, for the URL https://www.fcf.cat/acta/2223/futbol-11/divisio-honor-cadet/grup-1/hc/barcelona-fc-a/hc/damm-cf-a
the JSON output does not include the goal scored in minute 34, and I cannot see why. Can anyone tell me what is going wrong?
    case "Gols":
        for row in table.css("tbody tr"):
            # player_name = row.css("td a::text").get().strip()
            player_name = row.xpath("string(td/a)").get().strip()
            timestamp_acta = ""
            tipus_gol = ""
            if row.css(".faf-pilota_base.p-a.stat-center.gol-normal"):
                tipus_gol = "Normal"
            if row.css(".faf-pilota_base.p-a.stat-center.gol-propia"):
                tipus_gol = "Propia"
            if row.css(".faf-pilota_base.p-a.stat-center.gol-penal"):
                tipus_gol = "Penal"
            # special selector for 👇 selecting the last of its kind
            timestamp = row.css("td:last-child::text").get()
            table_data[player_name] = {
                "Minut": timestamp,
                "Tipus": tipus_gol}
    case "Estadi":
        table_data = []
        table_data.append(
            table.css("tr a::text").get()
        )
        table_data.append(
            table.css("tr td.uppercase::text").get()
        )
    case "Comparativa":
        team1 = response.css(".td-comparativa .comparativa-equip1 span::text").get()
        team2 = response.css(".td-comparativa .comparativa-equip2 span::text").get()
        table_data["Local"] = team1
        table_data["Visitant"] = team2
dt[table_heading] = table_data
I would like help understanding why the code skips that row so that I can fix it. Scraping all of the data is really important.
The goal at minute 34 is missing from your results because you store the goal data in a dictionary keyed by player name. Each assignment to the same key overwrites the previous value, so if a player scores multiple goals, only the last one is recorded.
Consider storing a list of goals under each player name instead.
Instead of this:
table_data[player_name] = {
    "Minut": timestamp,
    "Tipus": tipus_gol}
You could use something like this:
goal_info = {"Minut": timestamp, "Tipus": tipus_gol}
if player_name in table_data:
    # The player has already scored: add this goal to their list.
    table_data[player_name].append(goal_info)
else:
    # First goal for this player: start a new list.
    table_data[player_name] = [goal_info]
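To see the difference outside of Scrapy, here is a minimal, self-contained sketch. The rows list is made-up sample data standing in for the parsed "Gols" table (the player names and minutes are assumptions, not values from the real page), and collections.defaultdict is used as one possible shorthand for the if/else above.

```python
from collections import defaultdict

# Hypothetical rows standing in for the parsed "Gols" table:
# (player name, minute, goal type). Made up for illustration only.
rows = [
    ("Player A", "12'", "Normal"),
    ("Player B", "34'", "Normal"),
    ("Player B", "58'", "Penal"),
]

# Original approach: one dict entry per player, so a later goal by the
# same player overwrites the earlier one.
overwritten = {}
for player_name, timestamp, tipus_gol in rows:
    overwritten[player_name] = {"Minut": timestamp, "Tipus": tipus_gol}

# List-per-player approach: every goal is appended and kept.
goals_by_player = defaultdict(list)
for player_name, timestamp, tipus_gol in rows:
    goals_by_player[player_name].append({"Minut": timestamp, "Tipus": tipus_gol})

# Convert back to a plain dict before exporting it.
table_data = dict(goals_by_player)

print(overwritten["Player B"])   # only the 58' goal survives
print(table_data["Player B"])    # both the 34' and 58' goals are present
```

With this shape, each player maps to a list of goal dictionaries, so any code that reads table_data for the "Gols" table needs to iterate over that list rather than expect a single dictionary.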