I want to know if there is a way to call both html_name() and html_text() (from the rvest package) and store the two different results from inside the same pipe (magrittr::%>%).
Here is an example:
uniprot_ac <- "P31374"
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text')
At this point I want to get both the tag names from html_name()
[1] "fullname" "ecnumber" "name" "text"
and the tag content, without having to create a separate object by rewriting the entire pipe just to change the last line to html_text():
[1] "Serine/threonine-protein kinase PSK1"
[2] "2.7.11.1"
[3] "PSK1"
[4] "Serine/threonine-protein kinase involved ... ...
The desired output could look something like this; whether it is a vector or a data.frame doesn't matter:
[1] fullname: "Serine/threonine-protein kinase PSK1"
[2] ecnumber: "2.7.11.1"
[3] Name: "PSK1"
[4] Text: "Serine/threonine-protein kinase involved ... ...
Maybe a bit of a hack, but you can use parenthesized anonymous functions in pipes:
library("magrittr")
library("httr")
library("xml2")
library("rvest")
uniprot_ac <- "P31374"
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text') %>%
(function(x) list(name = html_name(x), text = html_text(x)))
#$name
#[1] "fullname" "ecnumber" "name" "text"
#
#$text
#[1] "Serine/threonine-protein kinase PSK1"
#[2] "2.7.11.1"
#[3] "PSK1"
#[4] "Serine/threonine-protein kinase involved in the control of sugar metabolism and translation. Phosphorylates UGP1, which is required for normal glycogen and beta-(1,6)-glucan synthesis. This phosphorylation shifts glucose partitioning toward cell wall glucan synthesis at the expense of glycogen synthesis."
Alternatively, you might be able to do something more elegant with the purrr package, but I don't see a reason to load an entire package just for that.
Edit
As noted by @MrFlick in the comments, the dot (.) placeholder can do the same thing if the expression is wrapped in curly braces:
GET(paste0("https://www.uniprot.org/uniprot/", uniprot_ac, ".xml")) %>%
content(as = "raw", content = "text/xml") %>%
read_html %>%
html_nodes(xpath = '//recommendedname/* |
//name[@type="primary"] | //comment[@type="function"]/text |
//comment[@type="interaction"]/text') %>%
{list(name = html_name(.), text = html_text(.))}
This is arguably the more magrittr-idiomatic way of doing it, and it is actually documented in help("%>%").
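If the goal is the name/text pairing sketched in the question rather than a two-element list, the same curly-brace trick can build a named vector or a data.frame directly. The following is only a minimal sketch under stated assumptions, not part of the original answer: `snippet` is a small hand-written stand-in for the UniProt XML (hypothetical, so the example runs without network access), and `nodes`, `named`, and `df` are names made up for illustration.

```r
library("magrittr")
library("xml2")
library("rvest")

# Hypothetical, heavily trimmed stand-in for the UniProt XML response.
snippet <- paste0(
  "<entry>",
  "<recommendedname>",
  "<fullname>Serine/threonine-protein kinase PSK1</fullname>",
  "<ecnumber>2.7.11.1</ecnumber>",
  "</recommendedname>",
  "<name type=\"primary\">PSK1</name>",
  "</entry>"
)

# Select the nodes once, as in the pipes above.
nodes <- read_html(snippet) %>%
  html_nodes(xpath = '//recommendedname/* | //name[@type="primary"]')

# Inside braces the left-hand side is only available as `.`,
# so it can be passed to both html_text() and html_name().
named <- nodes %>%
  {setNames(html_text(.), html_name(.))}

# Same idea, returning a data.frame with one row per node.
df <- nodes %>%
  {data.frame(name = html_name(.), text = html_text(.), stringsAsFactors = FALSE)}
```

With the named vector, an individual field can then be looked up by tag name, for example named[["ecnumber"]]. If several nodes share a tag name, the data.frame form is probably the safer choice, since vector names would no longer be unique.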