For a small art project I need to read stock prices from Yahoo Finance. The HTML source is long and complicated, but using an online regex tester I worked out a regular expression that should produce the correct output:
<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart\(10px\) Fz\(24px\)(\"| C\(\$dataGreen\)\"| C\(\$dataRed\)\") data-reactid\=\"35\">([-+]{0,1}\d*\.\d*) \((([-+]{0,1})\d*\.\d*)\%\)<\/span>
Here is a bit of code with the result nested in it:
<svg class="D(n) Cur(p)" width="24" style="fill:#000;stroke:#000;stroke-width:0;vertical-align:bottom;" height="24" viewBox="0 0 24 24" data-icon="search" data-reactid="29"><path d="M9 3C5.686 3 3 5.686 3 9c0 3.313 2.686 6 6 6s6-2.687 6-6c0-3.314-2.686-6-6-6m13.713 19.713c-.387.388-1.016.388-1.404 0l-7.404-7.404C12.55 16.364 10.85 17 9 17c-4.418 0-8-3.582-8-8 0-4.42 3.582-8 8-8s8 3.58 8 8c0 1.85-.634 3.55-1.69 4.905l7.403 7.404c.39.386.39 1.015 0 1.403" data-reactid="30"></path></svg></div></div></div><div class="My(6px) Pos(r) smartphone_Mt(6px)" data-reactid="31"><div class="D(ib) Va(m) Maw(65%) Ov(h)" data-reactid="32"><div class="D(ib) Mend(20px)" data-reactid="33"><span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="34">11,541.87</span><span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($dataRed)" data-reactid="35">-402.83 (-3.37%)</span><div id="quote-market-notice" class="C($tertiaryColor) D(b) Fz(12px) Fw(n) Mstart(0)--mobpsm Mt(6px)--mobpsm" data-reactid="36"><span data-reactid="37">At close: 5:44PM CET</span></div></div><!-- react-empty: 38 --></div></div></div></div></div><script>if (window.performance) {window.performance.mark && window.performance.mark('Lead-3-QuoteHeader');window.performance.measure && window.performance.measure('Lead-3-QuoteHeaderDone','PageStart','Lead-3-QuoteHeader');}</script></div><div data-reactid="29">
My problem is: this regular expression behaves differently in the online tester than in egrep under OpenWrt!
In the online-tester, it results in exactly this snippet:
<span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($dataGreen)" data-reactid="35">+50.32 (+0.17%)</span>
(plus a few capture groups, because of the extra parentheses in the regex).
If I use
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart\(10px\) Fz\(24px\)(\"| C\(\$dataGreen\)\"| C\(\$dataRed\)\") data-reactid\=\"35\">([-+]{0,1}\d*\.\d*) \((([-+]{0,1})\d*\.\d*)\%\)<\/span>' stock.html
I get absolutely no result. OK, there must be an error in the regular expression. Let's start small:
egrep '<span class' stock.html
gives me many results.
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\)' stock.html
still results in some lines of code. But
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart' stock.html
gives me nothing! Niente! Nada! Even
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) ' stock.html
(Mind the blank space at the end of the regexp!) gives me no result. And I have no idea what the difference between
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\)' stock.html
and
egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart' stock.html
is in terms of regular expressions! If the blank space were the problem, I should already get no results at the first blank space, the one before "Fw". So why does my regexp fail at that second blank?
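Most probably that particular whitespace character in front of Pstart is not a normal space but a tab. You should also use POSIX character classes with egrep: \d is a Perl-style shorthand that POSIX extended regular expressions do not define, so write [[:digit:]] for digits and [[:space:]] for whitespace. Try this: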
egrep '<span[[:space:]]+class\=\"Trsdu\(0\.3s\)[[:space:]]+Fw\(500\)[[:space:]]+Pstart\(10px\)[[:space:]]+Fz\(24px\)(\"|[[:space:]]+C\(\$dataGreen\)\"|[[:space:]]+C\(\$dataRed\)\")[[:space:]]+data-reactid\=\"35\">([-+]{0,1}[[:digit:]]*\.[[:digit:]]*)[[:space:]]+\((([-+]{0,1})[[:digit:]]*\.[[:digit:]]*)\%\)<\/span>' stock.html
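To check the tab guess rather than take it on faith, you can look at the actual bytes that follow Fw(500). This is only a minimal sketch: it assumes your grep supports -o and that od and mktemp are installed (all three are assumptions about the BusyBox build on your OpenWrt box), and the sample line is made up for illustration, with a tab deliberately placed before Pstart.

```shell
# Hypothetical sample file: a tab (not a space) sits between Fw(500) and Pstart.
sample=$(mktemp)
printf '<span class="Trsdu(0.3s) Fw(500)\tPstart(10px)">x</span>\n' > "$sample"

# Print every occurrence of Fw(500) plus the one character after it,
# then dump that output character by character.
# od -c shows a plain space as a blank and a tab as \t.
dump=$(grep -oE 'Fw\(500\).' "$sample" | od -c)
printf '%s\n' "$dump"

# Count matching lines: a literal space finds nothing in this sample,
# while [[:space:]]+ also accepts the tab.
# "|| true" keeps the script going when grep exits non-zero on zero matches.
literal_hits=$(grep -cE 'Fw\(500\) Pstart' "$sample" || true)
class_hits=$(grep -cE 'Fw\(500\)[[:space:]]+Pstart' "$sample" || true)
printf 'literal space: %s line(s), [[:space:]]+: %s line(s)\n' "$literal_hits" "$class_hits"

rm -f "$sample"
```

On the real page, run the grep -oE ... | od -c pipeline on stock.html instead of the sample file. If od shows anything other than a plain space after the closing parenthesis, that would explain why the literal space in your pattern never matched.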