xidel -se '//strong[@class="n-heading"][1]/text()[1]' 'https://www.anekalogam.co.id/id'
prints the same output three times:
15 June 2020
15 June 2020
15 June 2020
So, what should I do to select only one of them?
Edit:
Inside the strong element, the value looks like this:
15 June 2020
How do I print only "15 June 2020"?
Let me illustrate why this happens with the following example.
'test.htm':
<html>
<body>
<div>
<span>test1</span>
<span>test2</span>
<span>test3</span>
</div>
<div>
<span>test4</span>
</div>
<div>
<span>test5</span>
</div>
<div>
<span>test6</span>
</div>
</body>
</html>
xidel -s test.htm -e '//div[1]/span[1]'
test1
xidel -s test.htm -e '//span[1]'
test1
test4
test5
test6
xidel -s test.htm -e '(//span)[1]'
test1
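In other words, the predicate [1] applies to the span step within each parent, so //span[1] returns every span that is the first span child of its parent, while (//span)[1] returns only the first span in the whole document. For your query, that means you have to put the "strong"-node between parentheses: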
xidel -s https://www.anekalogam.co.id/id -e '(//strong[@class="n-heading"])[1]/text()[1]'
This isn't needed if you grab the parent-node instead:
xidel -s https://www.anekalogam.co.id/id -e '//p[@class="n-smaller ngc-intro"]/strong/text()[1]'
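If you don't have xidel at hand, the per-parent behaviour of [1] can also be reproduced with Python's standard xml.etree.ElementTree module. This is only a rough sketch: ElementTree supports a limited XPath subset and has no (//span)[1] syntax, so the "first of all spans" case is emulated by indexing the result list. The variable names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Same structure as test.htm above (well-formed, so it parses as XML).
TEST_HTM = """<html>
<body>
<div>
<span>test1</span>
<span>test2</span>
<span>test3</span>
</div>
<div>
<span>test4</span>
</div>
<div>
<span>test5</span>
</div>
<div>
<span>test6</span>
</div>
</body>
</html>"""

root = ET.fromstring(TEST_HTM)

# Counterpart of //span[1]: every span that is the first span of its parent.
first_per_parent = [span.text for span in root.findall(".//span[1]")]

# Counterpart of (//span)[1]: collect all spans first, then take the first one.
all_spans = root.findall(".//span")
first_overall = all_spans[0].text

print(first_per_parent)  # ['test1', 'test4', 'test5', 'test6']
print(first_overall)     # test1
```

The two results mirror the xidel output for '//span[1]' and '(//span)[1]' shown earlier.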
[Bonus]
You've probably noticed already that the desired text-node spans 2 lines and ends with a no-break space. To have xidel return just "15 June 2020":
xidel -s https://www.anekalogam.co.id/id -e '//p[@class="n-smaller ngc-intro"]/strong/normalize-space(substring-before(text(),x:cps(160)))'
- x:cps() is a shorthand for codepoints-to-string() (and string-to-codepoints()), and 160 is the codepoint for a "No-Break Space".
- text()[1] isn't needed, because whenever you feed a sequence to a function that expects a string, xidel uses only the first item of that sequence.
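To see what the substring-before() / normalize-space() combination does to the string, here is a small Python sketch of the same two steps. The helper functions are hand-written stand-ins for the XPath functions, not a real library API, and the sample string is an assumption about what the text-node looks like (a date, a line break with indentation, a no-break space, then more text), not the actual page content.

```python
import re

NBSP = chr(160)  # what x:cps(160) produces: the no-break space character


def substring_before(s, sep):
    """Stand-in for XPath substring-before(): text before the first sep, or ''."""
    if not sep:
        return ""
    before, found, _ = s.partition(sep)
    return before if found else ""


def normalize_space(s):
    """Stand-in for XPath normalize-space(): collapses space, tab, CR and LF only."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")


# Hypothetical text-node content, for illustration only.
raw = "15 June 2020\n      " + NBSP + "trailing text"

date_only = normalize_space(substring_before(raw, NBSP))
without_cut = normalize_space(raw)

print(repr(date_only))    # '15 June 2020'
print(repr(without_cut))  # '15 June 2020 \xa0trailing text'
```

The second result shows why the cut at codepoint 160 is useful: normalize-space() only treats space, tab, carriage return and line feed as whitespace, so the no-break space survives it. A regular expression is used here instead of Python's str.split(), because str.split() would treat the no-break space as whitespace as well.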