I have a .NET application that uses the .NET Regex class to match EPL label text. Normally I use the pattern ^[A-Z0-9,]+"(.+)"$, and it matches every line, capturing the quoted text in each EPL command. Recently, however, the EPL has changed, and every EPL line now ends with a carriage return and line feed (\x0D\x0A).
So I changed the pattern to [((\r\n)|(\x0D\x0A))A-Z0-9,]+"(.+)", and now it only picks up "Keep out of the reach of children" and doesn't recognise the rest.
How can I match the quoted text in each EPL command?
This is the raw EPL I'm trying to match (0D0A marks each CR LF):
N 0D0A
A230,1,0,2,1,1,N,"Keep out of the reach of children"0D0A
A133,26,0,4,1,1,N," FUROSEMIDE TABLETS 40 MG"0D0A
A133,51,0,4,1,1,N," ONE IN THE MORNING"0D0A
A133,76,0,4,1,1,N,""0D0A
A133,101,0,4,1,1,N,""0D0A
A133,126,0,4,1,1,N,""0D0A
A133,151,0,4,1,1,N,""0D0A
A133,176,0,4,1,1,N,"19/04/13 28 TABLET(S)"0D0A
A133,201,0,4,1,1,N,"ELIZABETH M SMITH"0D0A
LO133,232,550,40D0A
A133,242,0,2,1,1,N,"Any Medical Centre,Blue Road"0D0A
A133,260,0,2,1,1,N,"DN54 5TZ,Tel:01424 503901"0D0A
P1
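I think you're looking for the RegexOptions.Multiline option, which makes ^ and $ match at the start and end of each line instead of only at the start and end of the whole string. For example: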
Regex myEx = new Regex("^[A-Z0-9,]+\".+?\"$", RegexOptions.Multiline);
Actually, the regular expression should be:
"^[A-Z0-9,]+\".*\"\r?$"
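Multiline makes ^ and $ match at line boundaries, but .NET only treats \n as the line break: $ matches just before a \n, not before \r\n. The file has Windows line endings (\r\n), so after the closing quote the regex finds \r instead of the position before \n, and $ fails. My modified regex uses \r? to skip over that carriage return if it's there. Changing .+? to .* also lets the commands with empty text ("") match.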
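A small check of that explanation, assuming a single EPL line that ends in CR LF: with RegexOptions.Multiline, a pattern that expects the closing quote immediately before $ finds nothing, while adding \r? makes it match.

```csharp
using System;
using System.Text.RegularExpressions;

// One EPL text command followed by a Windows line ending (CR LF).
string line = "A230,1,0,2,1,1,N,\"Keep out of the reach of children\"\r\n";

// In Multiline mode $ can match just before "\n", but the character right
// after the closing quote is "\r", so this pattern never succeeds.
bool withoutCr = Regex.IsMatch(line, "\"$", RegexOptions.Multiline);

// "\r?" consumes the optional carriage return, so $ lands just before "\n".
bool withCr = Regex.IsMatch(line, "\"\r?$", RegexOptions.Multiline);

Console.WriteLine(withoutCr); // prints False
Console.WriteLine(withCr);    // prints True
```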
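With this pattern each match's Value still ends in \r, because \r? consumes it. If you want to keep it out of your results, put the command in a capture group and read m.Groups[1].Value: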
"^([A-Z0-9,]+\".*\")\r?$"
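A minimal sketch of reading the groups, assuming the label is in one string with CR LF line endings (the sample below is a shortened, made-up excerpt of the question's data). Group 1 of the capture-group pattern holds the whole command without the \r. The second pattern is a hypothetical variation, not part of the answer above: it moves the group inside the quotes so that group 1 returns only the label text the question asks for, which is empty for the blank lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Shortened sample: a few commands separated by CR LF.
string epl = "N\r\n" +
             "A230,1,0,2,1,1,N,\"Keep out of the reach of children\"\r\n" +
             "A133,76,0,4,1,1,N,\"\"\r\n" +
             "A133,201,0,4,1,1,N,\"ELIZABETH M SMITH\"\r\n" +
             "P1";

// Capture-group pattern: group 1 is the whole command, without the \r.
var whole = new Regex("^([A-Z0-9,]+\".*\")\r?$", RegexOptions.Multiline);

// Hypothetical variation: group 1 is only the text between the quotes.
var inner = new Regex("^[A-Z0-9,]+\"(.*)\"\r?$", RegexOptions.Multiline);

var commands = new List<string>();
foreach (Match m in whole.Matches(epl))
    commands.Add(m.Groups[1].Value);

var labels = new List<string>();
foreach (Match m in inner.Matches(epl))
    labels.Add(m.Groups[1].Value);

Console.WriteLine(string.Join(" | ", commands));
Console.WriteLine(string.Join(" | ", labels));
```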
Or you can strip the \r by calling Trim on each match:
// Needs: using System.Text.RegularExpressions;
MatchCollection matches = myEx.Matches(text); // text holds the whole label
foreach (Match m in matches)
{
    string s = m.Value.Trim(); // removes the trailing \r
    // ... use s
}
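Putting the pieces together, here is a minimal sketch run against the full sample from the question. It assumes the label is held in one string with each 0D0A written as \r\n; with the corrected pattern and Trim, it should list the eleven A (text) commands, including the four with empty text, and skip the N, LO and P1 lines.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// The label from the question, one command per entry, joined with CR LF.
string epl = string.Join("\r\n", new[]
{
    "N",
    "A230,1,0,2,1,1,N,\"Keep out of the reach of children\"",
    "A133,26,0,4,1,1,N,\" FUROSEMIDE TABLETS 40 MG\"",
    "A133,51,0,4,1,1,N,\" ONE IN THE MORNING\"",
    "A133,76,0,4,1,1,N,\"\"",
    "A133,101,0,4,1,1,N,\"\"",
    "A133,126,0,4,1,1,N,\"\"",
    "A133,151,0,4,1,1,N,\"\"",
    "A133,176,0,4,1,1,N,\"19/04/13 28 TABLET(S)\"",
    "A133,201,0,4,1,1,N,\"ELIZABETH M SMITH\"",
    "LO133,232,550,4",
    "A133,242,0,2,1,1,N,\"Any Medical Centre,Blue Road\"",
    "A133,260,0,2,1,1,N,\"DN54 5TZ,Tel:01424 503901\"",
    "P1",
});

// Corrected pattern; \r? absorbs the CR that sits before $.
var myEx = new Regex("^[A-Z0-9,]+\".*\"\r?$", RegexOptions.Multiline);

var commands = new List<string>();
foreach (Match m in myEx.Matches(epl))
{
    commands.Add(m.Value.Trim()); // Trim removes the trailing \r
}

foreach (string cmd in commands)
    Console.WriteLine(cmd);
```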