regex regex-group regex-negation regexp-replace

Javascript Regex to match expression except another expression

First, let me describe what I am trying to achieve. I want to copy and paste a list of football games the way a regular user would (the plain text from a website, not the HTML you get from inspecting the page), and therefore I have to parse the text.

The pasted text would look like:

PERU\r\nLiga 2\r\nClasament Live\r\nFinal\r\nSanta Rosa\r\n\r\n0\r\n - \r\n3\r\n\r\nMolinos El Pirata\r\n(0 - 1)\r\n73 \r\nChavelines\r\n\r\n1\r\n - \r\n0\r\n\r\nDeportivo Coopsol\r\n(1 - 0)\r\n20:30\r\nComerciantes Unidos\r\n\r\n-\r\n\r\nJuan Aurich\r\n22:45\r\nSantos FC\r\n\r\n-\r\n\r\nHuaral\r\nPOLONIA\r\nEkstraklasa\r\nClasament Live\r\n90+1 \r\nPogon Szczecin\r\n\r\n2\r\n - \r\n0\r\n\r\nStal Mielec\r\n(1 - 0)\r\nlive\r\n20:30\r\nPlock\r\n\r\n-\r\n\r\nGornik Z.\r\nPORTUGALIA\r\nPrimeira Liga\r\nClasament\r\n21:15\r\nFarense\r\n\r\n-\r\n\r\nMaritimo

And what I need is then to build something like this:

Final     Santa Rosa            0 - 3  Molinos El Pirata
75        Chavelines            1 - 0  Deportivo Coopsol
20:30     Comerciantes Unidos     -    Juan Aurich
22:45     Santos FC               -    Huaral
90+3      Pogon Szczecin        2 - 0  Stal Mielec
20:30     Plock                   -    Gornik Z.
21:15     Farense                 -    Maritimo

So the plan is to extract each game into its own array entry and then put the entries in a table. First I clean up the text I don't need (the country names, league names and half-time scores):

gamesUnformatted = gamesUnformatted.replace(/\b[A-Z]{5,}\b/g, '['); // replace the country name (names with more than 4 letters, to avoid removing LASK, TSKA... but it will remove IRAN, ASIA - find better way) which is in capital letters with [
gamesUnformatted = gamesUnformatted.replace(/Clasament Live/g, ']');
gamesUnformatted = gamesUnformatted.replace(/Clasament/g, ']'); // replace the words Clasament with ]
gamesUnformatted = gamesUnformatted.replace(/ *\[[^\]]*]/g, ''); // remove everything between [ and ], including the square brackets
gamesUnformatted = gamesUnformatted.replace(/\(\d{1,2} - \d{1,2}\)/g, ''); // remove half time score eg (0 - 0)

Now I want to add the word newLine in front of every game, so that later I can just split by "newLine" and have each game as a separate entry in the array. There are three scenarios for how a game's line starts: the game has not started (20:30), the game has ended (Final), or the game is running (e.g. 70). For the first two I have the following:

gamesUnformatted = gamesUnformatted.replace(/\d{2}:\d{2}/g, 'newLine$&'); // add the word newLine in front of the starting hours
gamesUnformatted = gamesUnformatted.replace(/Final/g, 'newLine$&'); // add the word newLine in front of the word Final (game has ended)

But the third one is trickier. The minute can be 0-90, then with stoppage time 90+something (e.g. 90+3), and then there can be two extra halves (e.g. 120, 120+...). This is where I need some help. I need a regex that matches all these scenarios but excludes everything else. To be more precise, I need to match the minute (1-120 and 1-120+...) but not the score or the hour (1-0, 20:30). I have tried all sorts of things for half a day and can't list them all here, but I have tried things with ^, with ?: and with !. I am not good with regex, so most of what I tried was probably silly. What I have at this moment is this:

gamesUnformatted = gamesUnformatted.replace(/\d{1,3}[^(\d{2}:\d{2})]/g, 'newLine$&');

This would be just the first step: match any number with 1 to 3 digits (not yet considering "90+4") while trying to ignore the hours, but not yet the scores. It is not working well, because it adds newLine in front of almost every number. So this:

90+3      Pogon Szczecin        2 - 0 Stal Mielec
20:30     Plock                   - Gornik Z.

becomes this:

newLine90+newLine3      Pogon Szczecin        newLine2 - newLine0 Stal Mielec
newLinenewLine20:newLine30     Plock                   - Gornik Z.

instead of this (on the second row there are two newLine because one was added before to the hours, so that must be ignored):

newLine90+3      Pogon Szczecin        2 - 0 Stal Mielec
newLinenewLine20:30     Plock                   - Gornik Z.

Solution
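
For context, the attempt in the question fails because square brackets never contain a sub-pattern: [^(\d{2}:\d{2})] is a negated character class, i.e. "one character that is not a digit and not one of ( ) { } :". A small sketch that reproduces the first row of the unwanted output shown in the question (the variable names are only illustrative):

```javascript
// The attempted pattern from the question, unchanged.
const attempt = /\d{1,3}[^(\d{2}:\d{2})]/g;

// Sample row taken from the question.
const row = '90+3      Pogon Szczecin        2 - 0 Stal Mielec';

// Every run of 1-3 digits followed by any character outside the class
// (here "+" or a space) gets a marker, including both halves of the score.
const attemptResult = row.replace(attempt, 'newLine$&');

console.log(attemptResult);
// newLine90+newLine3      Pogon Szczecin        newLine2 - newLine0 Stal Mielec
```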

Before adding the newLine markers, you can do one more replacement to make sure each score sits on one line, like 0-1.

Demo: https://regex101.com/r/Dur5lD/4

Pattern: Match: (\d{1,2})\s*-\s*(\d{1,2}); Replacement: $1-$2

Explanation: Since the text contains line breaks, I used \s to match any run of whitespace, including \r\n. The two capturing groups are referenced as $1 and $2 in the replacement to produce the desired output.
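
In JavaScript this step could look roughly like the sketch below. The sample string is a slice of the pasted text from the question, and the variable names are assumptions for illustration. The half-time score is removed first with the regex from the question, otherwise "(0 - 1)" would be collapsed as well.

```javascript
// Slice of the pasted text from the question (one finished game).
const sample = 'Santa Rosa\r\n\r\n0\r\n - \r\n3\r\n\r\nMolinos El Pirata\r\n(0 - 1)';

// Remove the half-time score, using the regex from the question.
const withoutHalfTime = sample.replace(/\(\d{1,2} - \d{1,2}\)/g, '');

// Collapse "digits, whitespace, dash, whitespace, digits" into "digits-digits".
// \s also matches \r and \n, so the score may be spread over several lines.
const joinedScores = withoutHalfTime.replace(/(\d{1,2})\s*-\s*(\d{1,2})/g, '$1-$2');

console.log(JSON.stringify(joinedScores));
// "Santa Rosa\r\n\r\n0-3\r\n\r\nMolinos El Pirata\r\n"
```

A lone "-" (a game that has not started) has no digits around it, so it is left untouched.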

Once we have done this, adding newLine should be straightforward.

Demo: https://regex101.com/r/Dur5lD/5

Pattern: ^((?:Final)|(?:\d{2}:\d{2})|(?:\d{1,3}(?!\d)(?!-)))

Explanation:

Capture a group, anchored at the start of a line, that is one of: Final, a kick-off time (hh:mm), or a match minute.
For the match minute, use negative lookaheads, (?!...). They require that a value like 70 or 120 is not followed by another digit or by -, which rules out scores such as 1-0.
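
A minimal sketch of applying this pattern in JavaScript, assuming it replaces the two separate newLine replacements from the question and runs on text where the scores were already joined. The input below is a hypothetical cleaned-up excerpt, and the g and m flags are my assumption of what the demo uses (m lets ^ match at the start of every line).

```javascript
// Hypothetical input: cleaned text with scores already joined as "0-3".
const cleaned = [
  'Final', 'Santa Rosa', '', '0-3', '', 'Molinos El Pirata', '',
  '73 ', 'Chavelines', '', '1-0', '', 'Deportivo Coopsol', '',
  '20:30', 'Comerciantes Unidos', '', '-', '', 'Juan Aurich',
  '90+1 ', 'Pogon Szczecin', '', '2-0', '', 'Stal Mielec',
].join('\r\n');

// Final, hh:mm, or 1-3 digits not followed by a digit or a dash.
const lineStart = /^((?:Final)|(?:\d{2}:\d{2})|(?:\d{1,3}(?!\d)(?!-)))/gm;

const withMarkers = cleaned.replace(lineStart, 'newLine$1');

// Show only the lines that received a marker.
const markedLines = withMarkers
  .split('\r\n')
  .filter((line) => line.startsWith('newLine'))
  .map((line) => line.trim());

console.log(markedLines.join(' | '));
// newLineFinal | newLine73 | newLine20:30 | newLine90+1
```

The score lines (0-3, 1-0, 2-0) start with a digit that is followed by -, so the lookahead rejects them, while 90+1 is accepted because 90 is followed by +.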

Note:

I assumed \r\n as the line break characters. If that is not the case, we might need to replace \s and ^ in the expressions with the literal line break sequence.
Your regex does not handle PERU (it has only four letters, and [A-Z]{5,} needs at least five), so I removed that line manually.
Replacing \n with \t and then replacing newLine with \n yielded https://regex101.com/r/Dur5lD/6.
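
If you would rather end up with an array, as planned in the question, the last step can also be done by splitting instead of replacing. This is only a sketch of that variant, not what the demo above does; the input is a hypothetical string that already carries one newLine marker per game.

```javascript
// Hypothetical input: one "newLine" marker in front of every game.
const marked =
  'newLineFinal\r\nSanta Rosa\r\n\r\n0-3\r\n\r\nMolinos El Pirata\r\n\r\n' +
  'newLine20:30\r\nComerciantes Unidos\r\n\r\n-\r\n\r\nJuan Aurich\r\n';

// One array per game: split on the marker, then on line breaks,
// trimming cells and dropping empty ones.
const rows = marked
  .split('newLine')
  .map((chunk) => chunk.split(/\r?\n/).map((cell) => cell.trim()).filter((cell) => cell !== ''))
  .filter((cells) => cells.length > 0);

// Tab-separated table, one game per line.
const table = rows.map((cells) => cells.join('\t')).join('\n');

console.log(rows.length); // 2
console.log(table);
```

Each entry of rows then holds the status, home team, score and away team in that order, which can be fed into a table directly.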