I would like to extract part of a string. Here is an example dataset:
df <- data.frame(id = c(1, 2),
                 string = c('<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value>',
                            '<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value>'))
> df
id string
1 1 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value>
2 2 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value>
I would like to extract ETC_CHOICE_2 and ETC_CHOICE_4 from the long string. My desired output would be:
> df
id string extract
1 1 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value> ETC_CHOICE_2
2 2 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value> ETC_CHOICE_4
Does anyone have any idea?
Thanks!
One option is to use htmlParse from the XML package:
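library(XML)
library(dplyr)

df %>%
  mutate(extract = htmlParse(string) %>%   # parse the markup in 'string' as HTML
           getNodeSet("//value") %>%       # select every <value> node
           xmlValue)                       # pull out the text inside each node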
Output:
#id string extract
#1 1 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value> ETC_CHOICE_2
#2 2 <itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value> ETC_CHOICE_4
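
If loading a parser feels heavy for this, a regular expression in base R may also do the job. This is a different technique from htmlParse, and only a minimal sketch: it assumes each string holds exactly one <value>...</value> pair with no nested tags inside it. The name `pattern` is just an illustrative variable.

```r
df <- data.frame(id = c(1, 2),
                 string = c('<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_2</value>',
                            '<itemResponse><response id="editIn_1.RESPONSE_1"><value>ETC_CHOICE_4</value>'))

# Assumption: one <value>...</value> pair per string, with plain text inside.
# The capture group keeps whatever sits between the opening and closing tag.
pattern <- ".*<value>(.*?)</value>.*"

# Rows that do not contain a <value> element get NA instead of the full string.
df$extract <- ifelse(grepl(pattern, df$string, perl = TRUE),
                     sub(pattern, "\\1", df$string, perl = TRUE),
                     NA_character_)

df$extract
```

Because this works row by row, each extracted value stays aligned with its own id. For messier markup (several <value> elements, attributes, nesting), a real parser such as the XML approach above is likely the safer choice.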