I am stuck with this problem. I am using R, and I would like to remove the nodes "uid", "seanceRef" and "sessionRef".
I tried remove_node(), but it does not seem to work.
How can I do that? My XML looks like this:
<?xml version='1.0' encoding='UTF-8'?>
<compteRendu xmlns="http://schemas.assemblee-nationale.fr/referentiel">
<uid>CRSANR5L16S2022E1N003</uid>
<seanceRef>RUANR5L16S2022IDS26199</seanceRef>
<sessionRef>SCR5A2022E1</sessionRef>
<metadonnees>
I want to keep metadonnees
</metadonnees>
</compteRendu>
You are not including the XML namespace prefix in your XPath.
If we look at your original document:
library(xml2)
doc <- read_xml('test.xml')
doc
#> {xml_document}
#> <compteRendu xmlns="http://schemas.assemblee-nationale.fr/referentiel">
#> [1] <uid>CRSANR5L16S2022E1N003</uid>
#> [2] <seanceRef>RUANR5L16S2022IDS26199</seanceRef>
#> [3] <sessionRef>SCR5A2022E1</sessionRef>
#> [4] <metadonnees>\n I want to keep metadonnees\n </metadonnees>
We can see that a default XML namespace is declared on the root element (the second line). In XPath, every node belonging to this namespace has to be referred to with a namespace prefix, otherwise it will not be found:
xml_find_all(doc, '//uid')
#> {xml_nodeset (0)}
xml2 gives the default (unprefixed) namespace the prefix d1, but we can check what it is by doing:
xml_ns(doc)
#> d1 <-> http://schemas.assemblee-nationale.fr/referentiel
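The d1 name is only a label that xml2 generates. If you would rather not depend on it, xml_find_all() also takes an ns argument, a named character vector mapping prefixes to namespace URIs. A minimal sketch, using an inline copy of the question's XML instead of test.xml; the prefix ar is an arbitrary, hypothetical choice:

```r
library(xml2)

# Inline copy of the XML from the question, so the example runs without test.xml
doc <- read_xml('<compteRendu xmlns="http://schemas.assemblee-nationale.fr/referentiel">
  <uid>CRSANR5L16S2022E1N003</uid>
  <seanceRef>RUANR5L16S2022IDS26199</seanceRef>
  <sessionRef>SCR5A2022E1</sessionRef>
  <metadonnees>I want to keep metadonnees</metadonnees>
</compteRendu>')

# Bind a prefix of our own choosing ("ar" is arbitrary) to the namespace URI
ns <- c(ar = "http://schemas.assemblee-nationale.fr/referentiel")

# Any prefix works in the XPath as long as it is bound to the right URI
uid_node <- xml_find_all(doc, "//ar:uid", ns = ns)
xml_text(uid_node)
```

What matters for matching is the URI, not the prefix string, so this finds the same node as the d1 version.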
So we can get the node(s) we want by doing:
remove_me <- xml_find_all(doc, '//d1:uid')
remove_me
#> {xml_nodeset (1)}
#> [1] <uid>CRSANR5L16S2022E1N003</uid>
And to remove this node we can do:
xml_remove(remove_me)
doc
#> {xml_document}
#> <compteRendu xmlns="http://schemas.assemblee-nationale.fr/referentiel">
#> [1] <seanceRef>RUANR5L16S2022IDS26199</seanceRef>
#> [2] <sessionRef>SCR5A2022E1</sessionRef>
#> [3] <metadonnees>\n I want to keep metadonnees\n </metadonnees>
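Since the question asks for all three nodes, they can be selected in a single query with the XPath union operator `|` and removed together. A sketch, again on an inline copy of the question's XML; the output path comes from tempfile() purely for illustration, so substitute your own file name:

```r
library(xml2)

# Inline copy of the XML from the question, so the example runs without test.xml
doc <- read_xml('<compteRendu xmlns="http://schemas.assemblee-nationale.fr/referentiel">
  <uid>CRSANR5L16S2022E1N003</uid>
  <seanceRef>RUANR5L16S2022IDS26199</seanceRef>
  <sessionRef>SCR5A2022E1</sessionRef>
  <metadonnees>I want to keep metadonnees</metadonnees>
</compteRendu>')

# "|" combines the three location paths; each name still needs the d1 prefix
remove_me <- xml_find_all(doc, "//d1:uid | //d1:seanceRef | //d1:sessionRef")

# Detach all matched nodes from the document (doc is modified in place)
xml_remove(remove_me)

# Only <metadonnees> should be left under the root
xml_children(doc)

# Save the result; replace this temporary path with your own output file
out <- tempfile(fileext = ".xml")
write_xml(doc, out)
```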
Depending on your use case, you may find it easier to strip the namespace from your XML altogether, which makes the XPath easier to work with:
doc <- read_xml('test.xml')
xml_ns_strip(doc)
xml_find_all(doc, '//uid')
#> {xml_nodeset (1)}
#> [1] <uid>CRSANR5L16S2022E1N003</uid>
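After stripping, the three nodes can be removed with plain, unprefixed names. A sketch on the same inline copy of the question's XML; keep in mind the default namespace has been stripped from this document, which may matter if you write it back out and something downstream expects it:

```r
library(xml2)

# Inline copy of the XML from the question, so the example runs without test.xml
doc <- read_xml('<compteRendu xmlns="http://schemas.assemblee-nationale.fr/referentiel">
  <uid>CRSANR5L16S2022E1N003</uid>
  <seanceRef>RUANR5L16S2022IDS26199</seanceRef>
  <sessionRef>SCR5A2022E1</sessionRef>
  <metadonnees>I want to keep metadonnees</metadonnees>
</compteRendu>')

# Strip the default namespace so element names can be used without a prefix
xml_ns_strip(doc)

# Select and remove all three nodes in one go
xml_remove(xml_find_all(doc, "//uid | //seanceRef | //sessionRef"))

# Only <metadonnees> should remain
xml_children(doc)
```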