Tags: regex, grep, openwrt

grep behaves differently from an online regexp tester


For a small art project I need to read stock prices from Yahoo Finance. The HTML source is long and complicated, but using an online regexp tester I worked out a regular expression that should produce the correct output:

<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart\(10px\) Fz\(24px\)(\"| C\(\$dataGreen\)\"| C\(\$dataRed\)\") data-reactid\=\"35\">([-+]{0,1}\d*\.\d*) \((([-+]{0,1})\d*\.\d*)\%\)<\/span>

Here is an excerpt of the HTML source with the result nested in it:

<svg class="D(n) Cur(p)" width="24" style="fill:#000;stroke:#000;stroke-width:0;vertical-align:bottom;" height="24" viewBox="0 0 24 24" data-icon="search" data-reactid="29"><path d="M9 3C5.686 3 3 5.686 3 9c0 3.313 2.686 6 6 6s6-2.687 6-6c0-3.314-2.686-6-6-6m13.713 19.713c-.387.388-1.016.388-1.404 0l-7.404-7.404C12.55 16.364 10.85 17 9 17c-4.418 0-8-3.582-8-8 0-4.42 3.582-8 8-8s8 3.58 8 8c0 1.85-.634 3.55-1.69 4.905l7.403 7.404c.39.386.39 1.015 0 1.403" data-reactid="30"></path></svg></div></div></div><div class="My(6px) Pos(r) smartphone_Mt(6px)" data-reactid="31"><div class="D(ib) Va(m) Maw(65%) Ov(h)" data-reactid="32"><div class="D(ib) Mend(20px)" data-reactid="33"><span class="Trsdu(0.3s) Fw(b) Fz(36px) Mb(-4px) D(ib)" data-reactid="34">11,541.87</span><span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($dataRed)" data-reactid="35">-402.83 (-3.37%)</span><div id="quote-market-notice" class="C($tertiaryColor) D(b) Fz(12px) Fw(n) Mstart(0)--mobpsm Mt(6px)--mobpsm" data-reactid="36"><span data-reactid="37">At close:  5:44PM CET</span></div></div><!-- react-empty: 38 --></div></div></div></div></div><script>if (window.performance) {window.performance.mark && window.performance.mark('Lead-3-QuoteHeader');window.performance.measure && window.performance.measure('Lead-3-QuoteHeaderDone','PageStart','Lead-3-QuoteHeader');}</script></div><div data-reactid="29">

My problem: this regular expression behaves differently in the online tester than in egrep under OpenWrt!

In the online tester, it matches exactly this snippet:

<span class="Trsdu(0.3s) Fw(500) Pstart(10px) Fz(24px) C($dataGreen)" data-reactid="35">+50.32 (+0.17%)</span>

(with some additional capture groups highlighted because of the extra parentheses in the regex)

If I use

egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart\(10px\) Fz\(24px\)(\"| C\(\$dataGreen\)\"| C\(\$dataRed\)\") data-reactid\=\"35\">([-+]{0,1}\d*\.\d*) \((([-+]{0,1})\d*\.\d*)\%\)<\/span>' stock.html

I get no results at all. OK, there must be an error in the regular expression. Let's start small:

egrep '<span class' stock.html

gives me many results.

egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\)' stock.html

still matches some lines. But

egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart' stock.html

gives me nothing! Niente! Nada! Even

egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) ' stock.html

(note the trailing space at the end of the regexp!) gives me no results either. And I have no idea what the difference between

egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\)' stock.html

and

egrep '<span class\=\"Trsdu\(0\.3s\) Fw\(500\) Pstart' stock.html

is in terms of regular expressions! If the space itself were the problem, I should already get no results because of the first space before "Fw". So why does my regexp fail at that second space?
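One way to test the hypothesis that the second separator is not an ordinary space is to reproduce the symptom on a small sample. The sketch below builds a hypothetical one-line sample.html (a made-up file, not the real Yahoo page) with a normal space before "Fw" but a tab before "Pstart". It uses grep -E, which is equivalent to egrep, and drops the question's \= and \" escapes because they are unnecessary inside single quotes:

```shell
#!/bin/sh
# Hypothetical one-line sample (not the real Yahoo page): an ordinary space
# before "Fw", but a TAB (\t) before "Pstart".
printf '<span class="Trsdu(0.3s) Fw(500)\tPstart(10px) Fz(24px)">x</span>\n' > sample.html

# Report whether an extended regex (same dialect as egrep) matches sample.html.
# printf is used instead of echo so the backslashes in the pattern print as-is.
check() {
    if grep -qE "$1" sample.html; then
        printf 'match:    %s\n' "$1"
    else
        printf 'no match: %s\n' "$1"
    fi
}

check '<span class="Trsdu\(0\.3s\) Fw\(500\)'                    # first space only
check '<span class="Trsdu\(0\.3s\) Fw\(500\) Pstart'             # literal space before Pstart
check '<span class="Trsdu\(0\.3s\) Fw\(500\)[[:space:]]+Pstart'  # any run of whitespace
```

With a tab in that position, the first and third patterns report a match and the second does not, which mirrors the behavior described above.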


Solution

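  • Most probably the whitespace in front of Pstart is not a normal space but a tab. You should also use POSIX character classes with egrep: "\d" is a Perl-style shorthand that POSIX extended regular expressions do not define, so write [[:digit:]] instead, and [[:space:]] to match spaces and tabs alike. Try this: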

    egrep '<span[[:space:]]+class="Trsdu\(0\.3s\)[[:space:]]+Fw\(500\)[[:space:]]+Pstart\(10px\)[[:space:]]+Fz\(24px\)("|[[:space:]]+C\(\$dataGreen\)"|[[:space:]]+C\(\$dataRed\)")[[:space:]]+data-reactid="35">([-+]?[[:digit:]]*\.[[:digit:]]*)[[:space:]]+\((([-+]?)[[:digit:]]*\.[[:digit:]]*)%\)</span>' stock.html
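To confirm what the separator in the real file actually is, and to print only the price change instead of the whole (usually very long) matching line, a sketch along these lines may help. It runs against a hypothetical stock-sample.html built with printf; for the real page, substitute stock.html. It assumes your grep supports -o and that od is available on the system:

```shell
#!/bin/sh
# Hypothetical stand-in for stock.html: a TAB before "Pstart", as suspected above.
printf '<div><span class="Trsdu(0.3s) Fw(500)\tPstart(10px) Fz(24px) C($dataRed)" data-reactid="35">-402.83 (-3.37%%)</span></div>\n' > stock-sample.html

# 1) Dump the bytes between "Fw(500)" and "Pstart" (basic regex, so "(" is literal).
#    od -c shows a tab as \t and an ordinary space as a blank column.
grep -o 'Fw(500)[^P]*Pstart' stock-sample.html | od -c

# 2) Print only the matched text (-o) rather than the whole line; the
#    [[:space:]] and [[:digit:]] classes are POSIX ERE, unlike \s or \d.
grep -oE 'data-reactid="35">[-+]?[[:digit:]]+\.[[:digit:]]+[[:space:]]+\([-+]?[[:digit:]]+\.[[:digit:]]+%\)' stock-sample.html
```

On the sample, step 2 prints data-reactid="35">-402.83 (-3.37%). If step 1 shows octal byte values instead of \t, the separator is some other non-ASCII character (for example a non-breaking space), and [[:space:]] may not cover it.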