I have markup like the sample below, and I want to find the lines that contain both Lang="WOW" and Chinese text.
Sample code:
Line: 1 <SENT>
Line: 2 <VALUE Lang="WOW">skip</VALUE>
Line: 3 </SENT>
Line: 4 <SENT>
Line: 5 <VALUE Lang="WOW">Mustang</VALUE>
Line: 6 </SENT>
Line: 7 <SENT>
Line: 8 <VALUE Lang="WOW">超級跑車雷文頓</VALUE>
Line: 9 </SENT>
Line: 10 <SENT>
Line: 11 <VALUE Lang="WOW">超級跑車雷文頓</VALUE>
Line: 12 </SENT>
Line: 13 <SENT>
Line: 14 <VALUE Lang="WOW">skip</VALUE>
Line: 15 </SENT>
Line: 16 <SENT>
Line: 17 <VALUE Lang="WOW">skip</VALUE>
Line: 18 </SENT>
Line: 19 <SENT>
Line: 20 <VALUE Lang="WOW">skip</VALUE>
Line: 21 </SENT>
I'm using this pattern: [^\x00-\x7F]+
and I was able to retrieve the Chinese/non-English text. However, this time I only want to retrieve the non-English text if Lang="WOW" appears on the same line.
So, for example, using the sample above with 21 lines, I need to be able to find Line: 8 and Line: 11.
Is it possible? Any clues and examples are greatly appreciated.
Use
Lang="WOW">\K[^\x00-\x7F]+
See regex proof.
EXPLANATION
--------------------------------------------------------------------------------
  Lang="WOW">              'Lang="WOW">'
--------------------------------------------------------------------------------
  \K                       discard text matched so far
--------------------------------------------------------------------------------
  [^\x00-\x7F]+            any character except: '\x00' to '\x7F' (1
                           or more times (matching the most amount
                           possible))
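
As a rough sketch of how the same idea could be applied in a script rather than in a PCRE-style search box: Python's built-in re module does not support \K, so the example below swaps in a fixed-width lookbehind, (?<=Lang="WOW">), which likewise keeps the prefix out of the reported match. The names PATTERN, SAMPLE and find_wow_non_ascii are illustrative only, and SAMPLE is just the first 12 lines of the sample from the question with the "Line: N" labels removed.

```python
import re

# Assumption: the target engine is Python's re, which lacks \K.
# A fixed-width lookbehind requires the same prefix without consuming it.
PATTERN = re.compile(r'(?<=Lang="WOW">)[^\x00-\x7F]+')

# First 12 lines of the question's sample (labels stripped).
SAMPLE = """<SENT>
<VALUE Lang="WOW">skip</VALUE>
</SENT>
<SENT>
<VALUE Lang="WOW">Mustang</VALUE>
</SENT>
<SENT>
<VALUE Lang="WOW">超級跑車雷文頓</VALUE>
</SENT>
<SENT>
<VALUE Lang="WOW">超級跑車雷文頓</VALUE>
</SENT>
"""


def find_wow_non_ascii(text):
    """Return (line_number, matched_text) pairs, numbering lines from 1."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        match = PATTERN.search(line)
        if match:
            hits.append((number, match.group()))
    return hits


results = find_wow_non_ascii(SAMPLE)
for number, text in results:
    print(f"Line: {number} {text}")
```

With this input the loop reports lines 8 and 11, the two lines the question asks for. Note that the pattern only matches when the non-ASCII run starts immediately after Lang="WOW">, so a value such as "abc超級" would not be reported.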