Tags: r, post, screen-scraping

Extract results after a POST query


I am trying to automatically extract electricity offers from this site. Once I set the postcode (e.g. 3000), I can download the PDF files manually.

I am using the httr package:

library(httr)
library(XML)  # provides htmlParse()
qr <- POST("http://www.qenergy.com.au/What-Are-Your-Options",
           query = list(postcode = 3000))
res <- htmlParse(content(qr))

The problem is that the file URLs are not in the response. Any help, please?


Solution

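  • Try this: send the postcode in a form-encoded request body instead of the URL query string, then pull the PDF links out of the parsed response.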

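    library(httr)
    # Send the postcode as a form-encoded request body, not in the URL query string
    qr <- POST("http://www.qenergy.com.au/What-Are-Your-Options",
               encode = "form",
               body = list(postcode = 3000))
    # content() parses the HTML; the `[` XPath indexing below needs an XML-package
    # document (httr before 1.0.0). Newer httr returns an xml2 document, where
    # xml2::xml_find_all() and xml2::xml_attr() do the same job.
    res <- content(qr)
    # Collect the href of every link that points at a PDF
    pdfs <- as(res['//a[contains(@href, "pdf")]/@href'], "character")
    head(pdfs)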
    # [1] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-5-Day-Time-of-Use-A210.pdf"  
    # [2] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-7-Day-Time-of-Use-A250.pdf"  
    # [3] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-Single-Rate-CL.pdf"          
    # [4] "flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-Single-Rate.pdf"             
    # [5] "flux-content/qenergy/pdf/VIC price fact sheet united energy distribution zone business/United-Freedom-Biz-5-Day-Time-of-Use.pdf"
    # [6] "flux-content/qenergy/pdf/VIC price fact sheet united energy distribution zone business/United-Freedom-Biz-7-Day-Time-of-Use.pdf"
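
The hrefs above are relative paths that contain spaces, so they cannot be passed to a downloader as they are. Here is a minimal sketch of one way to turn them into downloadable URLs using base R only. It assumes the paths are relative to the site root (`base_url` is my assumption, not something the page confirms), and `build_pdf_urls()` and `download_pdfs()` are hypothetical helper names introduced for illustration.

```r
# Assumption: the hrefs are relative to the site root; adjust if the page
# resolves them against a different base.
base_url <- "http://www.qenergy.com.au/"

# Hypothetical helper: turn relative hrefs into absolute, percent-encoded URLs.
build_pdf_urls <- function(hrefs, base = base_url) {
  # URLencode() leaves "/" alone and turns each space into %20
  encoded <- vapply(hrefs, utils::URLencode, character(1), USE.NAMES = FALSE)
  paste0(base, encoded)
}

# Hypothetical helper: download each URL into `dest_dir`, named after the file.
download_pdfs <- function(urls, dest_dir = "pdfs") {
  dir.create(dest_dir, showWarnings = FALSE, recursive = TRUE)
  decoded <- vapply(urls, utils::URLdecode, character(1), USE.NAMES = FALSE)
  dest <- file.path(dest_dir, basename(decoded))
  for (i in seq_along(urls)) {
    # mode = "wb" keeps binary files intact on Windows
    download.file(urls[i], destfile = dest[i], mode = "wb")
  }
  invisible(dest)
}

# One of the hrefs returned by the query above
pdfs <- c("flux-content/qenergy/pdf/VIC price fact sheet jemena distribution zone business/Jemena-Freedom-Biz-Single-Rate.pdf")
pdf_urls <- build_pdf_urls(pdfs)

# download_pdfs(pdf_urls)  # uncomment to fetch the files
```

The download call is left commented out so the sketch can be run without touching the network. Check one resulting URL in a browser first to confirm the base URL assumption holds.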