I am trying to scrape JavaScript-rendered content from a webpage. I tried the JIRA API as suggested, but I am not getting the activity log. I found a website explaining how JavaScript-rendered content can be scraped; see, for example:
https://datascienceplus.com/scraping-javascript-rendered-web-content-using-r/
I followed the example, but I can't work out what XPath I need to pass to get the activity log. The activity log I want to scrape is under the all-tab container at the bottom of the webpage.
library(rvest)
library(V8)
# URL whose content is rendered by JavaScript
link <- 'https://issues.apache.org/jira/browse/AMQCPP-645'
# Read the HTML page and extract the text of every <div> node
#html<- getURL(link, followlocation = TRUE)
emailjs <- read_html(link) %>% html_nodes(xpath = "//div") %>% html_text()
ct <- v8()
#parse the html content from the js output and print it as text
read_html(ct$eval(gsub('document.write','',emailjs))) %>%
html_text()
I was hoping to get output like this:
rows emailjs
1 S A created issue - 25/Apr/19 15:48 Highlight in document.
2 Justin Bertram made changes - 25/Apr/19 17:53 Field Original Value New Value Comment [ I'm using Firefox, and it's working no problem. It's just HTML so there shouldn't be any browser compatibility issues. My guess is that Firefox is holding on to an older, cached version or something. Try opening a "private browsing" window and trying it from there. ] Highlight in document.
3 Timothy Bish made changes - 25/Apr/19 18:10 Resolution Fixed [ 1 ] Status Open [ 1 ] Closed [ 6 ] Highlight in document.
4 Timothy Bish made transition - 25/Apr/19 18:10 Open Closed 2h 22m 1
Suggestions would be greatly appreciated. Thank you!
You can mimic the POST request the page makes and add the one required header, X-Requested-With: XMLHttpRequest. Then parse the HTML response for the desired content. You may need to do a little more string tidying.
library(httr)
library(rvest)
library(magrittr)
# The one header the request needs (marks it as an AJAX request)
headers <- c('X-Requested-With' = 'XMLHttpRequest')
# Body of the POST request the page itself sends (tab-click events)
data <- '[{"name":"jira.viewissue.tab.clicked","properties":{"inNewWindow":false,"keyboard":false,"context":"unknown","tab":"com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel","tabPosition":1},"timeDelta":-4904},{"name":"jira.viewissue.tab.clicked","properties":{"inNewWindow":false,"keyboard":false,"context":"unknown","tab":"com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel","tabPosition":0},"timeDelta":-4178}]'
# page= selects the "All" tab panel; _= appears to be a cache-busting timestamp
url <- 'https://issues.apache.org/jira/browse/AMQCPP-645?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel&_=1570029676497'
rows <- httr::POST(url = url, httr::add_headers(.headers = headers), body = data) %>%
  read_html() %>%
  # each activity-log entry is an .issue-data-block inside .issuePanelWrapper
  html_nodes('.issuePanelWrapper .issue-data-block') %>%
  html_text() %>%
  # collapse runs of whitespace (including newlines) into single spaces
  gsub('\\s+', ' ', .)
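The parsing and tidying half of this answer can be tried without any network access. The sketch below runs the same CSS selector over a small, made-up HTML fragment, shows an equivalent XPath (since the question asked what XPath to use), and puts the result into a one-column data frame like the output the question hoped for. Assumptions: only the class names issuePanelWrapper and issue-data-block come from the answer; the fragment itself is hypothetical and much simpler than the real JIRA markup, and the names fragment, css_rows, xpath_rows and log_df are illustrative.

```r
library(rvest)
library(magrittr)

# HYPOTHETICAL fragment: only the two class names come from the answer's
# selector; the real JIRA response is larger and may be structured differently.
fragment <- '<div class="issuePanelWrapper">
  <div class="issue-data-block">S A created issue -
      25/Apr/19 15:48</div>
  <div class="issue-data-block">Timothy Bish made transition -
      25/Apr/19 18:10</div>
</div>'

page <- read_html(fragment)  # a string containing markup is parsed as literal HTML

# CSS selector used in the answer, plus the same whitespace tidying
css_rows <- page %>%
  html_nodes('.issuePanelWrapper .issue-data-block') %>%
  html_text() %>%
  gsub('\\s+', ' ', .) %>%
  trimws()

# Equivalent XPath: XPath 1.0 has no class selector, so match a whole word
# inside the space-padded class attribute
xpath_rows <- page %>%
  html_nodes(xpath = paste0(
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' issuePanelWrapper ')]",
    "//*[contains(concat(' ', normalize-space(@class), ' '), ' issue-data-block ')]")) %>%
  html_text() %>%
  gsub('\\s+', ' ', .) %>%
  trimws()

# One row per log entry, in the shape the question asked for
log_df <- data.frame(emailjs = css_rows, stringsAsFactors = FALSE)
print(log_df)
```

The CSS form is shorter, which is why the answer uses it; the XPath form is only needed if you prefer html_nodes(xpath = ...). Either selector only finds these nodes in the HTML returned by the POST request, not in the page fetched by a plain read_html(link), because the tab content is loaded separately. In rvest 1.0 and later, html_nodes() still works but html_elements() is the preferred name.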