regex nsregularexpression

How to implement an "or" after a group in a regex pattern


I want to get the thread id from my URLs with a single pattern. The pattern should have just one capturing group (at level 1). My test strings are:

https://www.mypage.com/thread-3306-page-32.html
https://www.mypage.com/thread-3306.html
https://www.mypage.com/Thread-String-Thread-Id

So I want a pattern that gives me the number 3306 for lines 1 and 2, and "String-Thread-Id" for the last line.

My current attempt is .*[t|T]hread-(.*)[\-page.*|.html], but it fails at the end, after the id. How can I do this properly? I also solved it with .*Thread-(.*)|.*thread-(\\w+).*, but that uses two groups, which does not work for my Java code.


Solution

  • I don't know whether this fits every situation, but I would try this:

    ^.*?thread-((?:(?!-page|\.html).)*)
    

    In Java, that could look something like

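    import java.util.ArrayList;
    import java.util.List;
    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // subjectString (assumed here) is the input text, e.g. the URLs one per line.
    List<String> matchList = new ArrayList<>();
    Pattern regex = Pattern.compile("^.*?thread-((?:(?!-page|\\.html).)*)", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.MULTILINE);
    Matcher regexMatcher = regex.matcher(subjectString);
    while (regexMatcher.find()) {
        matchList.add(regexMatcher.group(1));
    }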
    

    Explanation:

    ^                  # Match start of line
    .*?                # Match any number of characters, as few as possible
    thread-            # until "thread-" is matched.
    (                  # Then start a capturing group (number 1) to match:
     (?:               # (start of non-capturing group)
      (?!-page|\.html) # assert that neither "-page" nor ".html" follows
      .                # then match any character
     )*                # repeat as often as possible
    )                  # end of capturing group
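
To check the pattern against the three URLs from the question, a minimal self-contained sketch could look like the following. The class name ThreadIdDemo and the helper extractThreadId are made up for this illustration and are not part of the original answer. Only CASE_INSENSITIVE is used here, on the assumption that each URL is matched on its own rather than as one multi-line string.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ThreadIdDemo {
    // Same pattern as in the answer above; compiled once and reused.
    private static final Pattern THREAD_ID = Pattern.compile(
            "^.*?thread-((?:(?!-page|\\.html).)*)",
            Pattern.CASE_INSENSITIVE);

    /** Hypothetical helper: returns group 1, or null if the URL contains no "thread-". */
    public static String extractThreadId(String url) {
        Matcher m = THREAD_ID.matcher(url);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The three test strings from the question.
        String[] urls = {
            "https://www.mypage.com/thread-3306-page-32.html",
            "https://www.mypage.com/thread-3306.html",
            "https://www.mypage.com/Thread-String-Thread-Id"
        };
        for (String url : urls) {
            System.out.println(extractThreadId(url));
        }
        // Prints:
        // 3306
        // 3306
        // String-Thread-Id
    }
}
```

Because the lookahead is tested before every character, the group stops right before the first "-page" or ".html" and otherwise runs to the end of the string, so a single capturing group covers all three cases.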