Tags: regex, regex-lookarounds, lookbehind, negative-lookbehind

Regex to match a specific format - one capitalized word, but not two


.+(?<![a-z]\s)([A-Z](?=[a-z'-]+)[a-z'-]+)(?!\s).+\((.+)\).+(?<![a-z]\s)([A-Z](?=[a-z'-]+)[a-z'-]+)(?!\s).+\((.+)\)

Cases:

[Nisei](slightly scratched) [Ocellaris](unconcious)
L: 1799 Ocellaris: (slightly scratched) vs. N'isei: (mildly wounded)
[Nisei](slightly scratched) [Zealot Warrior](perfect condition)
L: 1799 Ocellaris: (slightly scratched) vs. zealot warrior: (mildly wounded)
[fire dragon](slightly scratched) [Zealot Warrior](perfect condition)
[King Jheric](slightly scratched) [Zealot Warrior](perfect condition)

Cases 1 and 2 are supposed to match, but 3 and 6 should not: they have two words in the section before the parentheses. I tried adding (?!\s) or (?!\b) to reject a name that is followed by another word, but instead the engine seems to backtrack by one character and simply drop the last letter of the name.

Results

Case 1:
1: [1,6] Nisei
2: [8,26] slightly scratched
3: [29,38] Ocellaris
4: [40,57] unconcious
Case 2:
1: [8,17] Ocellaris
2: [20,38] slightly scratched
3: [44,50] N'isei
4: [53,67] mildly wounded
Case 3:
1: [1,6] Nisei
2: [8,26] slightly scratched
3: [29,34] Zealo
4: [45,62] perfect condition
Case 4:
No Match
Case 5:
No Match
Case 6:
1: [1,4] Kin
2: [14,32] slightly scratched
3: [35,40] Zealo
4: [51,68] perfect condition
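
The truncated captures above (Zealo, Kin) are what greedy backtracking produces: when (?!\s) fails after the full word, the engine gives back one letter and tests the lookahead again in front of that letter. A minimal Python sketch of the effect follows. The second pattern, with an extra (?![a-z'-]) guard, is only an illustrative assumption for comparison and is not one of the patterns tried above.

```python
import re

text = "[Zealot Warrior](perfect condition)"

# Greedy name followed by (?!\s): after "Zealot" the lookahead sees a space
# and fails, so the engine gives back the "t" and retries in front of it.
greedy = re.search(r"([A-Z][a-z'-]+)(?!\s)", text)
print(greedy.group(1))  # Zealo

# Illustrative guard (an assumption, not from the question): (?![a-z'-])
# forbids stopping in the middle of a word, so "Zealot" is rejected as a
# whole and the search moves on to the next capitalized word.
guarded = re.search(r"([A-Z][a-z'-]+)(?![a-z'-])(?!\s)", text)
print(guarded.group(1))  # Warrior
```

The guard only stops the truncation. It does not by itself reject two-word names, because the search can still restart at the second word.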

Update:

General pattern would be

Person or NPC (condition) Person or NPC (condition)

A person can only have a single capitalized name, whereas an NPC can have a two-word name with any capitalization, e.g. King Jheric, wolfen berserker, or zealot warrior.

The pattern has to be this vague because it must match lines like

Me:(condition) v Target:(condition) 
Reply:Some Person L:1200 King Jheric:(condition) vs. Target:(condition)
[Me] -> (condition) [wolfen berserker] -> (condition)
Lag: 1200 [zealot warrior](condition) vs. [King Jheric](condition)

Update 2:

(?<![a-z]|(?:\d+))([A-Z](?=[a-z'-]+)(?!.*\s\d+)[a-z'-]+).+\((.+)\).+(?<![a-z]|(?:\d+))([A-Z](?=[a-z'-]+)(?!.*\s\d+)[a-z'-]+).+\((.+)\)

This handles all the cases listed above, including the original ones, except when the first or second "thing" has two words and at least one of them is capitalized.


Solution

  • Based on the information you gave, this pattern should do the job:

    (?<![a-z'-] )([A-Z][a-z'-]++)[^(A-Z]*\(([^)]+)\)[^A-Z\v]+([A-Z][a-z'-]++)(?!\s[A-Z])[^(A-Z]*\(([^)]+)\)
    
    • (?<![a-z'-] ) is a negative lookbehind that ensures the name is not preceded by a lowercase letter, ' or -, followed by a space (for the evil King Jheric)
    • ([A-Z][a-z'-]++) matches an uppercase letter followed by one or more lowercase letters, apostrophes, or hyphens; the quantifier is possessive, so the engine doesn't try to step back into the name
    • [^(A-Z]* matches any number of characters that are neither an opening parenthesis nor an uppercase letter (King Jheric, you remember); maybe you could use [: ]* here if you want a stricter check
    • \(([^)]+)\) matches an opening parenthesis, one or more characters that aren't a closing parenthesis, and then the closing parenthesis
    • [^A-Z\v]+ matches one or more characters that are neither an uppercase letter nor a line break
    • ([A-Z][a-z'-]++) matches an uppercase letter followed by one or more lowercase letters, apostrophes, or hyphens, again possessive so the engine doesn't try to step back
    • (?!\s[A-Z]) is a negative lookahead that ensures the name isn't followed by whitespace and an uppercase letter
    • [^(A-Z]* matches any number of characters that are neither an opening parenthesis nor an uppercase letter
    • \(([^)]+)\) matches an opening parenthesis, one or more characters that aren't a closing parenthesis, and then the closing parenthesis

    You can find a demo with all your samples over here: https://regex101.com/r/nB5jP4/2
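
If you would rather check this offline, here is a minimal Python sketch that runs the pattern against the six cases from the question. The names NAME, PATTERN, SAMPLES and parse are made up for illustration. Two assumptions differ from the PCRE pattern above: Python's re module treats \v as a vertical tab only, so the sketch writes \r\n instead, and possessive quantifiers need Python 3.11 or later, so on older versions it falls back to a plain greedy +, which gives the same results on these six samples.

```python
import re
import sys

# Possessive ++ needs Python 3.11+; the greedy fallback is an assumption
# that happens to behave identically on the six samples below.
NAME = r"[A-Z][a-z'-]++" if sys.version_info >= (3, 11) else r"[A-Z][a-z'-]+"

PATTERN = re.compile(
    r"(?<![a-z'-] )"       # not the second word of a two-word name
    + "(" + NAME + ")"     # group 1: first name
    + r"[^(A-Z]*"          # separator up to the first "("
    + r"\(([^)]+)\)"       # group 2: first condition
    + r"[^A-Z\r\n]+"       # filler; \r\n stands in for PCRE's \v
    + "(" + NAME + ")"     # group 3: second name
    + r"(?!\s[A-Z])"       # not followed by a second capitalized word
    + r"[^(A-Z]*"          # separator up to the second "("
    + r"\(([^)]+)\)"       # group 4: second condition
)

SAMPLES = [
    "[Nisei](slightly scratched) [Ocellaris](unconcious)",
    "L: 1799 Ocellaris: (slightly scratched) vs. N'isei: (mildly wounded)",
    "[Nisei](slightly scratched) [Zealot Warrior](perfect condition)",
    "L: 1799 Ocellaris: (slightly scratched) vs. zealot warrior: (mildly wounded)",
    "[fire dragon](slightly scratched) [Zealot Warrior](perfect condition)",
    "[King Jheric](slightly scratched) [Zealot Warrior](perfect condition)",
]


def parse(line):
    """Return the four captured groups, or None if the line does not match."""
    match = PATTERN.search(line)
    return match.groups() if match else None


for number, sample in enumerate(SAMPLES, start=1):
    print(number, parse(sample))
```

With these samples, only cases 1 and 2 return groups; cases 3 to 6 return None.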