
How to download table from website using rvest


I want to download a station departure board from a website. However, I'm not really getting anywhere with scraping the table. Can someone help me?

library(rvest)
library(tidyverse)

link <- "https://reiseauskunft.bahn.de/bin/bhftafel.exe/dn?ld=4391&protocol=https:&rt=1&"

html <- read_html(link)



Solution

  • The form values are sent in a POST request, which means they are not transmitted as part of the URL but in the request body. Interestingly, though, when we click on "später" ("later"), a GET request is used, and we can see a URL containing the parameters. We can use that URL to access the timetable:

    library(rvest)
    library(tidyverse)
    
    link <- "https://reiseauskunft.bahn.de/bin/bhftafel.exe/dn?ld=4391&country=DEU&protocol=https:&rt=1&input=Erfurt%20Hbf%238010101&boardType=dep&time=18:23%2B60&productsFilter=11111&&&date=31.07.23&&selectDate=&maxJourneys=&start=yes"
    
    response_html <- read_html(link)
    
    response_html |> 
      html_table() |> 
      pluck(2)
    #> # A tibble: 24 × 6
    #>    Zeit   ``                 Zug          Richtung / Unterwegs…¹ Gleis Aktuelles
    #>    <chr>  <chr>              <chr>        <chr>                  <chr> <chr>    
    #>  1 früher ""                 ""           ""                     ""    ""       
    #>  2 18:44  "aktuelle Uhrzeit" "aktuelle U… ""                     ""    ""       
    #>  3 19:23  ""                 "FLX 1246"   "Berlin Hbf (tief)\n\… "10"  "19:34"  
    #>  4 19:28  ""                 "ICE  502"   "Hamburg-Altona\n\n\n… "9"   "19:39,G…
    #>  5 19:30  ""                 "ICE  273"   "Karlsruhe Hbf\n\n\n\… "2"   "Änderun…
    #>  6 19:32  ""                 "ICE  801"   "München Hbf\n\n\n\nE… "1"   "19:32"  
    #>  7 19:35  ""                 "RE     7(3… "Würzburg Hbf\n\n\n\n… "3a"  ""       
    #>  8 19:36  ""                 "RE    17(7… "Naumburg(Saale)Hbf\n… "4"   ""       
    #>  9 19:38  ""                 "RB    23(8… "Saalfeld(Saale)\n\n\… "6"   "19:38"  
    #> 10 19:38  ""                 "RB    46(8… "Ilmenau\n\n\n\nErfur… "6"   "19:38"  
    #> # ℹ 14 more rows
    #> # ℹ abbreviated name: ¹​`Richtung / Unterwegshaltestellen`
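
Because the GET URL is just a base path plus query parameters, it can be rebuilt for another station, date or time instead of being copied by hand. The sketch below is a minimal, hypothetical helper (`build_departure_url()` is not part of rvest or any Deutsche Bahn tooling). It assumes the parameter names copied from the URL above keep working, and the meaning of some values is inferred rather than documented: for example, that the number after "#" in `input` identifies the station, or what the "+60" suffix on `time` does.

```r
# Hypothetical helper: rebuild the departure-board GET URL from its parameters.
# Assumption: parameter names and fixed values are copied from the URL in the
# answer; their semantics are inferred, not documented by Deutsche Bahn.
build_departure_url <- function(station, date, time, board_type = "dep") {
  base <- "https://reiseauskunft.bahn.de/bin/bhftafel.exe/dn"
  params <- c(
    ld = "4391",
    country = "DEU",
    protocol = "https:",
    rt = "1",
    input = station,          # e.g. "Erfurt Hbf#8010101"
    boardType = board_type,   # "dep" = departures
    time = time,              # e.g. "18:23+60"
    productsFilter = "11111",
    date = date,              # dd.mm.yy, e.g. "31.07.23"
    selectDate = "",
    maxJourneys = "",
    start = "yes"
  )
  # Percent-encode every value, including reserved characters such as
  # " ", "#", ":" and "+", so they reach the server as literal text
  encoded <- vapply(params, URLencode, character(1), reserved = TRUE)
  paste0(base, "?", paste(names(params), encoded, sep = "=", collapse = "&"))
}

url <- build_departure_url("Erfurt Hbf#8010101", "31.07.23", "18:23+60")
# Then, as in the answer (requires rvest/tidyverse and network access):
# read_html(url) |> html_table() |> pluck(2)
```

Unlike the hand-copied URL, this version also encodes `:` as `%3A`; servers normally decode that to the same value, but that is worth checking against the site if a request fails.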
    

Another option would be to use the Fahrplan API. You can request an API key, and there is an R package for interacting with the Fahrplan API: openbahn.
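
If you stay with the scraping approach, the table returned by `html_table()` still contains navigation rows ("früher", "aktuelle Uhrzeit") and packs the destination and the intermediate stops into the single `Richtung / Unterwegshaltestellen` column, separated by newlines. The following is a minimal base-R sketch for tidying it. `clean_departures()` and the new `Ziel` column are hypothetical names, and the sample rows are invented stand-ins shaped like the output above ("Stop A" and "Stop B" are placeholders, not real data).

```r
# Hypothetical helper: drop navigation rows and extract the destination.
clean_departures <- function(tbl) {
  zeit <- tbl[["Zeit"]]
  zug <- tbl[["Zug"]]
  # Keep real departures: an HH:MM time and a non-empty train name that is
  # not the "aktuelle Uhrzeit" (current time) marker row
  keep <- grepl("^[0-9]{2}:[0-9]{2}$", zeit) & nzchar(zug) &
    !startsWith(zug, "aktuelle")
  out <- tbl[keep, , drop = FALSE]
  # The direction column lists the destination first, then the stops,
  # separated by newlines; keep only the first line as the destination
  richtung <- out[["Richtung / Unterwegshaltestellen"]]
  out$Ziel <- trimws(vapply(strsplit(richtung, "\n", fixed = TRUE),
                            `[`, character(1), 1))
  out
}

# Invented sample rows mimicking the structure of the scraped table
departures <- data.frame(
  Zeit = c("früher", "18:44", "19:23", "19:32"),
  Zug = c("", "aktuelle Uhrzeit", "FLX 1246", "ICE  801"),
  `Richtung / Unterwegshaltestellen` = c("", "",
    "Berlin Hbf (tief)\n\n\n\nStop A", "München Hbf\n\n\n\nStop B"),
  Gleis = c("", "", "10", "1"),
  check.names = FALSE
)

cleaned <- clean_departures(departures)
```

The same function should work on the tibble from `pluck(2)`, since it only uses `[[`, row subsetting and `$<-`, but the row filter is a heuristic based on the rows visible above and may need adjusting for other stations.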