
Exclude a combination of characters with regex or add a letter


I'm trying to adjust Kodi's search filter with regex so the scrapers recognize TV shows from their original file names.

They come in one of two patterns: "TV show name S04E01 some extra info" or "TV show name 01 some extra info". The first is not recognized because "S04" scrambles the search in a number of ways, so it needs to go. The second is not recognized because the number needs an 'e' in front of it; otherwise it is not treated as an episode number.

So I see two approaches.

  1. Make the filter ignore s01-99

  2. Prepend an 'e' to any freestanding two-digit number, though I'm not sure regex can even do that.

I have no experience with regex, but I've been playing around and came up with this, which unsurprisingly doesn't do the trick:

^(?!s{00,99})\d{2}$

Solution

  • You can either replace every match of \b([0-9]{2})\b with E$1, or use \bs(0[1-9]|[1-9][0-9])\b as the pattern in an ignore filter.

    Details

    • \b([0-9]{2})\b matches and captures into Group 1 any two digits that are not adjacent to a letter, digit or _ on either side. In the replacement E$1, $1 refers to the Group 1 value, so the matched two digits are replaced with themselves, prefixed with E.
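    • \bs(0[1-9]|[1-9][0-9])\b matches an s followed by a number between 01 and 99: (0[1-9]|[1-9][0-9]) is a capturing group that matches either 0 followed by any digit from 1 to 9 ([1-9]), or (|) any digit from 1 to 9 ([1-9]) followed by any digit ([0-9]). Note that the trailing \b requires a non-word character (or the end of the string) after the number, so the pattern matches a standalone s04 but not the S04 in S04E01; drop the trailing \b to cover that form, and match case-insensitively if the filter does not already.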
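To see how the two patterns behave on the file names from the question, here is a minimal Python sketch. It only illustrates the regexes and does not reproduce Kodi's own filter configuration. The function names and the `season_prefix` variant (which uses a lookahead instead of the trailing \b) are assumptions added for illustration. Python writes the group reference as \1 rather than $1.

```python
import re

# Freestanding two-digit number, captured in Group 1.
two_digits = re.compile(r"\b([0-9]{2})\b")

# Season marker s01..s99 as given in the answer, matched case-insensitively.
season = re.compile(r"\bs(0[1-9]|[1-9][0-9])\b", re.IGNORECASE)

# Hypothetical variant: season marker directly followed by an episode tag
# such as E01. The lookahead checks for the tag without consuming it.
season_prefix = re.compile(
    r"\bs(?:0[1-9]|[1-9][0-9])(?=e[0-9]{2}\b)", re.IGNORECASE
)


def prepend_e(name: str) -> str:
    """Prefix every freestanding two-digit number with 'E'."""
    return two_digits.sub(r"E\1", name)


def strip_season_prefix(name: str) -> str:
    """Remove a season marker that sits directly before an episode tag."""
    return season_prefix.sub("", name)


if __name__ == "__main__":
    plain = "TV show name 01 some extra info"
    tagged = "TV show name S04E01 some extra info"

    print(prepend_e(plain))             # TV show name E01 some extra info
    print(prepend_e(tagged))            # unchanged: no boundary inside S04E01
    print(bool(season.search(tagged)))  # False: trailing \b fails before 'E'
    print(strip_season_prefix(tagged))  # TV show name E01 some extra info
```

Because S04E01 consists only of word characters, \b cannot match anywhere inside it, which is why the first pattern leaves that name untouched and the second pattern with a trailing \b does not find S04 there.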

    NOTE: If you need to generate a number range regex, you may use this JSFiddle of mine.