Here's a section of HTML I'm trying to pull some info from:
<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results</span>
</p>
</div>
I just want to store 3744 from the bit I pull (everything inside the <p>), but I'm having a hard time since the "of 3744" part doesn't have any CSS styling and I don't understand XPaths at all :)
<span>Showing</span>1-30\nof 3744<span>results</span>
How would you parse the above string to only retrieve the total number of results?
As long as it always looks the same, you could also use #scan to get just the last number.
str = '<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results</span>
</p>
</div>'
str.scan(/\d+/).pop.to_i
#=> 3744
Update: explanation of how it works
The scan will pull an Array of all the numbers, e.g. ["1","30","3744"], then pop will take the last element from the Array, "3744", and to_i will convert that to the integer 3744.
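To see the steps described above one at a time, here is a small sketch. The variable names and the heredoc input are only illustrative (they mirror the markup from the question), and it uses last rather than pop so the Array is left untouched:

```ruby
# Hypothetical sample input mirroring the markup in the question.
html = <<~HTML
  <div class="pagination">
  <p>
  <span>Showing</span>
  1-30
  of 3744
  <span>results</span>
  </p>
  </div>
HTML

numbers = html.scan(/\d+/)  # every run of digits, as strings
last    = numbers.last      # the element pop would return, without mutating the Array
total   = last.to_i         # String -> Integer

p numbers  # ["1", "30", "3744"]
p total    # 3744
```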
Please note that if the number you want is not the last element in the Array, then this will not work as you want, e.g.
str = '<div class="pagination">
<p>
<span>Showing</span>
1-30
of 3744
<span>results 14</span>
</p>
</div>'
str.scan(/\d+/).pop.to_i
#=> 14
As you can see, since I added the number 14 to the results span, it is now the last number in the Array and your result is off. So you could modify it to something like this:
str.gsub(/\s+/,'').scan(/\d+-\d+of(\d+)/).flatten.pop.to_i
#=> 3744
What this will do is remove all whitespace (including the newlines) with gsub, then look for a pattern along the lines of digits-digits, the word "of", then digits (i.e. \d+-\d+of(\d+)), and capture the last group #=> [["3744"]], then flatten the Array #=> ["3744"], then pop and convert to Integer. This seems like a better solution as it will make sure to match the "of ####" section every time.
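If you would rather not strip the whitespace first, another option is String#[] with a regex and a capture index, which returns the captured text or nil when nothing matches. This is just a sketch: total_results is a made-up helper name, and it assumes the total is always preceded by the word "of" plus whitespace, as in the sample.

```ruby
# total_results is a hypothetical helper name, not something from the question.
def total_results(fragment)
  # String#[] with (regex, capture index) returns the captured text,
  # or nil when the pattern does not match.
  captured = fragment[/\bof\s+(\d+)/, 1]
  captured && captured.to_i
end

fragment = "<span>Showing</span>1-30\nof 3744<span>results 14</span>"
p total_results(fragment)                   # 3744
p total_results("<span>No results</span>")  # nil
```

Returning nil instead of 0 when there is no match makes it easier to notice that the markup changed, since nil.to_i would otherwise quietly give you 0.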