I was trying to scrape the "entry-title" of the latest news item on the site "https://www.abafg.it/category/avvisi/", but it prints [ ] instead. What am I doing wrong?
I tried to scrape the class "entry-title" so that I could save the title, the link the news item leads to, and the publication date.
The `entry-title` class is not on the link `a` tag, but on the `h2` wrapped around it.
You can use
names = [h.a for h in soup.find_all('h2', class_='entry-title')]
But I think using CSS selectors would be better here:
names = soup.select('h2.entry-title > a[href]')
will select any `a` tag that has an `href` attribute and whose parent is an `h2` of class `entry-title`.
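To see both approaches side by side without hitting the live site, here is a minimal sketch run against a small hand-written HTML snippet. The markup and the example.com URLs are made up for illustration and only imitate the structure described above (class on the `h2`, link as its direct child); the real page may differ.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the class sits on the h2, the link is its direct child.
html = """
<article>
  <h2 class="entry-title"><a href="https://example.com/first/">First notice</a></h2>
  <h2 class="entry-title"><a href="https://example.com/second/"> Second notice </a></h2>
  <h2 class="entry-title">No link here</h2>
</article>
"""

soup = BeautifulSoup(html, "html.parser")

# Looking for the class on the a tag matches nothing, hence the empty list.
wrong = soup.find_all("a", class_="entry-title")

# h.a is None for a heading without a link, so filter those out.
via_find_all = [h.a for h in soup.find_all("h2", class_="entry-title") if h.a]

# The CSS selector only matches links, so link-less headings are skipped.
via_select = soup.select("h2.entry-title > a[href]")

print(wrong)
for a in via_select:
    print(a.get_text().strip(), a.get("href"))
```

One reason to prefer the selector: the `find_all` version returns `None` for any heading without a link unless you filter, while the selector never yields `None`.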
Then,
for a in names: print(a.get_text().strip(), a.get('href'))
will print
AVVISO LEZIONI DI SCULTURA : PROF.BORRELLI https://www.abafg.it/avviso-lezioni-di-scultura-prof-borrelli/
ORARIO DELLE LEZIONI A.A.2022/2023 IN VIGORE DAL 21 NOVEMBRE 2022 https://www.abafg.it/orario-delle-lezioni-a-a-2022-2023-in-vigore-dal-21-novembre-2022/
PROROGA BANDO AFFIDAMENTI INTERNI D.D. N. 3 DEL 4.11.2022 https://www.abafg.it/proroga-bando-affidamenti-interni-d-d-n-3-del-4-11-2022/
D.D. n. 7 del 15.11.2022 DECRETO GRADUATORIA PROVVISORIA ABPR19 https://www.abafg.it/d-d-n-7-del-15-11-2022-decreto-graduatoria-provvisoria-abpr19/
D.D. n. 5 DEL 10.11.2022 DECRETO DI NOMINA COMMISSIONE ABPR19 https://www.abafg.it/d-d-n-5-del-10-11-2022-decreto-di-nomina-commissione-abpr19/
RIAPERTURA BANDO AFFIDAMENTI INTERNI D.D. N. 3 DEL 4.11.2022 https://www.abafg.it/riapertura-bando-affidamenti-interni-d-d-n-4-del-4-11-2022/
D.D.81 del 26.10.2022 GRADUATORIA DEFINITIVA ABST48 STORIA DELLE ARTI APPLICATE https://www.abafg.it/d-d-81-del-26-10-2022-graduatoria-definitiva-abst48-storia-delle-arti-applicate/
AVVISO PRESENTAZIONE DOMANDE CULTORE DELLA MATERIA A.A.22.23-SCADENZA 11.11.2022 https://www.abafg.it/avviso-presentazione-domande-cultore-della-materia-a-a-22-23-scadenza-11-11-2022/
D.D. N.78 DEL 19/10/2022 BANDO GRADUATORIE D’ISTITUTO-SCADENZA 9/11/2022. https://www.abafg.it/d-d-n-78-bando-graduatorie-distituto-scadenza-9-11-2022/
ORARIO PROVVISIORIO DELLE LEZIONI A.A. 2022/2023: TRIENNIO E BIENNIO https://www.abafg.it/orario-provvisiorio-delle-lezioni-a-a-2022-2023-triennio-e-biennio/
EDIT: To save the printed text to a file, you could first combine it into one string with `.join`:
asText = '\n'.join([f'{a.get_text().strip()} {a.get("href")}' for a in names])
and then you could save it with
with open('./resources/titles.txt', 'w', encoding='utf-8') as f:
    f.write(asText)
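Note that `open` raises `FileNotFoundError` if the `resources` folder does not exist yet. A small sketch of one way around that, using `pathlib` to create the folder first; the `asText` value here is a made-up stand-in for the joined string built above.

```python
from pathlib import Path

# Stand-in for the string built with '\n'.join above (made-up content).
asText = "First notice https://example.com/first/\nSecond notice https://example.com/second/"

out_path = Path("./resources/titles.txt")

# Create ./resources if it is missing; do nothing if it already exists.
out_path.parent.mkdir(parents=True, exist_ok=True)

# write_text opens, writes and closes the file in one call.
out_path.write_text(asText, encoding="utf-8")
```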
If you want something more visually friendly, I suggest using pandas:
import pandas  # to_markdown also needs the tabulate package installed
asDF = pandas.DataFrame([{
    'title': a.get_text().strip(), 'link': a.get('href')
} for a in names])
asText = asDF.to_markdown(index=False)
and now `asText` looks like
| title | link |
|:---------------------------------------------------------------------------------|:-------------------------------------------------------------------------------------------------------|
| ORARIO DELLE LEZIONI A.A.2022/2023 IN VIGORE DAL 21 NOVEMBRE 2022 | https://www.abafg.it/orario-delle-lezioni-a-a-2022-2023-in-vigore-dal-21-novembre-2022/ |
| PROROGA BANDO AFFIDAMENTI INTERNI D.D. N. 3 DEL 4.11.2022 | https://www.abafg.it/proroga-bando-affidamenti-interni-d-d-n-3-del-4-11-2022/ |
| D.D. n. 7 del 15.11.2022 DECRETO GRADUATORIA PROVVISORIA ABPR19 | https://www.abafg.it/d-d-n-7-del-15-11-2022-decreto-graduatoria-provvisoria-abpr19/ |
| D.D. n. 5 DEL 10.11.2022 DECRETO DI NOMINA COMMISSIONE ABPR19 | https://www.abafg.it/d-d-n-5-del-10-11-2022-decreto-di-nomina-commissione-abpr19/ |
| RIAPERTURA BANDO AFFIDAMENTI INTERNI D.D. N. 3 DEL 4.11.2022 | https://www.abafg.it/riapertura-bando-affidamenti-interni-d-d-n-4-del-4-11-2022/ |
| D.D.81 del 26.10.2022 GRADUATORIA DEFINITIVA ABST48 STORIA DELLE ARTI APPLICATE | https://www.abafg.it/d-d-81-del-26-10-2022-graduatoria-definitiva-abst48-storia-delle-arti-applicate/ |
| AVVISO PRESENTAZIONE DOMANDE CULTORE DELLA MATERIA A.A.22.23-SCADENZA 11.11.2022 | https://www.abafg.it/avviso-presentazione-domande-cultore-della-materia-a-a-22-23-scadenza-11-11-2022/ |
| D.D. N.78 DEL 19/10/2022 BANDO GRADUATORIE D’ISTITUTO-SCADENZA 9/11/2022. | https://www.abafg.it/d-d-n-78-bando-graduatorie-distituto-scadenza-9-11-2022/ |
| ORARIO PROVVISIORIO DELLE LEZIONI A.A. 2022/2023: TRIENNIO E BIENNIO | https://www.abafg.it/orario-provvisiorio-delle-lezioni-a-a-2022-2023-triennio-e-biennio/ |
| GRADUATORIA DEFINITIVA ABST47 STILE,STORIA DELL’ARTE E DEL COSTUME | https://www.abafg.it/graduatoria-definitiva-abst47-stilestoria-dellarte-e-del-costume/ |
And then, instead of TXT, you could also save it as CSV with
asDF.to_csv('./resources/titles.csv', index=False)
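If you would rather not pull in pandas just for the CSV, the standard-library `csv` module can write the same two columns. This is an alternative to `to_csv`, not what the answer above uses, and the `rows` list is made-up stand-in data; in practice it would be built from `names`.

```python
import csv
from pathlib import Path

# Stand-in rows; in practice these would be
# (a.get_text().strip(), a.get('href')) for each a in names.
rows = [
    ("First notice, with a comma", "https://example.com/first/"),
    ("Second notice", "https://example.com/second/"),
]

csv_path = Path("./resources/titles.csv")
csv_path.parent.mkdir(parents=True, exist_ok=True)

# newline='' lets the csv module control line endings itself.
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.writer(f)
    writer.writerow(["title", "link"])  # header row
    writer.writerows(rows)              # titles containing commas get quoted
```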