I want to download a station departure board from a website. However, I'm not getting anywhere with scraping the table. Can someone help me?
library(rvest)
library(tidyverse)
link <- "https://reiseauskunft.bahn.de/bin/bhftafel.exe/dn?ld=4391&protocol=https:&rt=1&"
html <- read_html(link)
The form values are transmitted as a POST request, which means they are not sent as part of the URL but in the request body. Interestingly, though, when we click on "spaeter", a GET request is used and we can see a URL with the parameters. We can use that URL to access the timetable:
library(rvest)
library(tidyverse)
link <- "https://reiseauskunft.bahn.de/bin/bhftafel.exe/dn?ld=4391&country=DEU&protocol=https:&rt=1&input=Erfurt%20Hbf%238010101&boardType=dep&time=18:23%2B60&productsFilter=11111&&&date=31.07.23&&selectDate=&maxJourneys=&start=yes"
response_html <- read_html(link)
# html_table() returns every table on the page as a list; pluck(2) picks the second one
response_html |>
  html_table() |>
  pluck(2)
#> # A tibble: 24 × 6
#> Zeit `` Zug Richtung / Unterwegs…¹ Gleis Aktuelles
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 früher "" "" "" "" ""
#> 2 18:44 "aktuelle Uhrzeit" "aktuelle U… "" "" ""
#> 3 19:23 "" "FLX 1246" "Berlin Hbf (tief)\n\… "10" "19:34"
#> 4 19:28 "" "ICE 502" "Hamburg-Altona\n\n\n… "9" "19:39,G…
#> 5 19:30 "" "ICE 273" "Karlsruhe Hbf\n\n\n\… "2" "Änderun…
#> 6 19:32 "" "ICE 801" "München Hbf\n\n\n\nE… "1" "19:32"
#> 7 19:35 "" "RE 7(3… "Würzburg Hbf\n\n\n\n… "3a" ""
#> 8 19:36 "" "RE 17(7… "Naumburg(Saale)Hbf\n… "4" ""
#> 9 19:38 "" "RB 23(8… "Saalfeld(Saale)\n\n\… "6" "19:38"
#> 10 19:38 "" "RB 46(8… "Ilmenau\n\n\n\nErfur… "6" "19:38"
#> # ℹ 14 more rows
#> # ℹ abbreviated name: ¹`Richtung / Unterwegshaltestellen`
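The table above still contains the navigation rows ("früher", the current-time row) and the destination is mixed with the intermediate stops in one column. Here is a minimal sketch of how it could be tidied. `tidy_board()` and the new column `Ziel` are names I made up, and `board` is a small hand-made stand-in for the scraped table (without the unnamed column, and with placeholder text instead of the real stop lists), so treat this as an illustration rather than a tested solution for the live page.

```r
library(dplyr)
library(stringr)

# Hand-made stand-in for the scraped table; the stop lists are placeholders,
# not real data.
board <- tibble(
  Zeit = c("früher", "18:44", "19:23", "19:30"),
  Zug = c("", "aktuelle Uhrzeit", "FLX 1246", "ICE 273"),
  `Richtung / Unterwegshaltestellen` = c(
    "", "",
    "Berlin Hbf (tief)\n\n\nplaceholder stops",
    "Karlsruhe Hbf\n\n\n\nplaceholder stops"
  ),
  Gleis = c("", "", "10", "2"),
  Aktuelles = c("", "", "19:34", "")
)

# Hypothetical helper: keeps only real departures and extracts the destination.
tidy_board <- function(tbl) {
  # html_table() leaves one column unnamed; dplyr verbs need non-empty names
  names(tbl)[!nzchar(names(tbl))] <- "unnamed"

  tbl |>
    # real departures have a HH:MM time and a non-empty direction
    filter(
      str_detect(Zeit, "^\\d{1,2}:\\d{2}$"),
      `Richtung / Unterwegshaltestellen` != ""
    ) |>
    mutate(
      Zug = str_squish(Zug),
      # the destination is the text before the first line break
      Ziel = str_squish(
        str_extract(`Richtung / Unterwegshaltestellen`, "^[^\\n]+")
      )
    ) |>
    select(Zeit, Zug, Ziel, Gleis, Aktuelles)
}

departures <- tidy_board(board)
```

On the real data you would call it on the plucked table, e.g. `response_html |> html_table() |> pluck(2) |> tidy_board()`.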
Another option would be to use the Fahrplan API. You can request an API key, and there is an R package for interacting with the Fahrplan API: openbahn.
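If you stay with the scraping route and want other stations or times, the GET URL from the answer can be assembled from its parameters instead of being copied by hand. Below is a rough sketch using only base R. `build_board_url()` is a hypothetical helper, and the parameter names and fixed values are simply taken from the URL above. Note that `URLencode(reserved = TRUE)` also encodes the colon (`18:23` becomes `18%3A23`), which should decode to the same value, but I have not verified that the server treats it identically.

```r
# Hypothetical helper: rebuilds the GET URL shown above from its parameters.
build_board_url <- function(station, time, date, board_type = "dep") {
  base <- "https://reiseauskunft.bahn.de/bin/bhftafel.exe/dn"

  # Parameter names and fixed values copied from the URL in the answer
  params <- c(
    ld = "4391",
    country = "DEU",
    protocol = "https:",
    rt = "1",
    input = station,
    boardType = board_type,
    time = time,
    productsFilter = "11111",
    date = date,
    selectDate = "",
    maxJourneys = "",
    start = "yes"
  )

  # Percent-encode each value (spaces, "#", "+", ":" and so on)
  encoded <- vapply(params, utils::URLencode, character(1), reserved = TRUE)

  paste0(base, "?", paste0(names(params), "=", encoded, collapse = "&"))
}

url <- build_board_url(
  station = "Erfurt Hbf#8010101",
  time = "18:23+60",
  date = "31.07.23"
)
```

The result can then be passed to `read_html(url)` exactly like `link` in the answer.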