I am trying to figure out how to first search for a term inside a specific tag (the article tag) and then retrieve the value of another tag inside that same article tag.
I can already retrieve the value of a specific tag from the XML below (the command I use is shown after it):
<article>
<author>
<name>Example Name 1</name>
<title>example title 2</title>
</author>
<title>article title 1</title>
<publicationDate>2022-02-12</publicationDate>
<text>blah1 blah1 blah1</text>
<reference>10000</reference>
</article>
<article>
<author>
<name>Example Name 2</name>
<title>example title 2</title>
</author>
<title>article title 1</title>
<publicationDate>2022-02-13</publicationDate>
<text>blah1 blah1 blah1</text>
<reference>10001</reference>
</article>
xmllint --xpath "string(//title)" file.xml
But how can I search first and then retrieve the value within the matching article tags? The reference number will be different each time, and I need to extract the value from the article that has that specific reference.
Thank you for your help
If I understand your intention correctly, you should be able to parameterize your XPath search string using a bash variable containing the reference number that you are interested in. Note that I modified your example XML to be wrapped in <articles> tags, so you will need to adjust the XPath to match your own XML structure.
Script contents:
#!/bin/bash
ref_no=${1:-10001}
src_xml=${2:-/tmp/foo/s.xml}
title=$(xmllint --xpath "string(/articles/article[reference=${ref_no}]/title)" "${src_xml}")
printf "Reference: %s, Title: %s\n" "${ref_no}" "${title}"
Output:
$ ./script 10000
Reference: 10000, Title: article title 1
$ ./script 10001
Reference: 10001, Title: article title 2
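The script above prints an empty title when no article carries the requested reference, because string() of an empty node set is an empty string. Below is a minimal sketch of one way to guard against that, assuming the same <articles> wrapper as the test XML; the function name lookup_title and its error messages are hypothetical, not part of the original answer.

```shell
# lookup_title REF FILE
# Prints the title of the article whose <reference> equals REF.
# (lookup_title is a hypothetical helper name used for illustration.)
lookup_title() {
  local ref_no="$1" src_xml="$2" title

  # The value is pasted unquoted into the XPath expression,
  # so accept digits only.
  case "${ref_no}" in
    ''|*[!0-9]*)
      echo "reference must be numeric: ${ref_no}" >&2
      return 2
      ;;
  esac

  # Same query as the script above; fail if xmllint itself fails.
  title=$(xmllint --xpath "string(/articles/article[reference=${ref_no}]/title)" "${src_xml}") || return 1

  # An empty result means no article matched the reference.
  if [ -z "${title}" ]; then
    echo "no article with reference ${ref_no}" >&2
    return 1
  fi

  printf '%s\n' "${title}"
}
```

It could be called as lookup_title 10001 /tmp/foo/s.xml, with the caller checking the exit status before using the output.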
For clarity, here is the test XML that I utilized:
<articles>
<article>
<author>
<name>Example Name 1</name>
<title>example title 2</title>
</author>
<title>article title 1</title>
<publicationDate>2022-02-12</publicationDate>
<text>blah1 blah1 blah1</text>
<reference>10000</reference>
</article>
<article>
<author>
<name>Example Name 2</name>
<title>example title 2</title>
</author>
<title>article title 2</title>
<publicationDate>2022-02-13</publicationDate>
<text>blah1 blah1 blah1</text>
<reference>10001</reference>
</article>
</articles>
Per the OP's question in the comments below, here is a variation for the case where the <reference> value is a string:
Script contents:
#!/bin/bash
ref_no=${1:-a10001}
src_xml=${2:-/tmp/s.xml}
title=$(xmllint --xpath "//*[reference=\"${ref_no}\"]/title/text()" "${src_xml}")
printf "Reference: %s, Title: %s\n" "${ref_no}" "${title}"
Note that the double quotes surrounding the ${ref_no} variable must be escaped, because the whole XPath expression sits inside a double-quoted shell string. The text() node test then extracts the text from the matched element.
Further, note that the source XML's second <reference> tag value was updated to 'a10001':
<reference>a10001</reference>
Output:
$ ./script a10001
Reference: a10001, Title: article title 2
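As an alternative to escaping the double quotes, the string literal inside the XPath can use single quotes, which need no escaping within a double-quoted shell string. The sketch below builds such an expression; the function name build_title_xpath is hypothetical, and rejecting values that contain a single quote is an assumption made here because XPath 1.0 has no escape sequence for a quote inside a quoted literal.

```shell
# build_title_xpath REF
# Prints an XPath expression selecting the title text of the article
# whose <reference> equals the string REF.
# (build_title_xpath is a hypothetical helper name used for illustration.)
build_title_xpath() {
  case "$1" in
    *"'"*)
      # A single quote would terminate the XPath string literal early.
      echo "reference must not contain a single quote: $1" >&2
      return 2
      ;;
  esac
  # REF is passed as a printf argument, so any % in it is printed literally.
  printf "//article[reference='%s']/title/text()" "$1"
}

# Usage, assuming xmllint is installed and /tmp/s.xml is the test XML:
#   xmllint --xpath "$(build_title_xpath a10001)" /tmp/s.xml
```

Selecting //article[...]/title rather than //title keeps the match on the article's own title and skips the title nested inside <author>.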