[Name]
Jhon
[Age]
45
[MobileNumber]
1020304050
Billing address Delivery address
India India
I need to extract the text based on the above string.
static void Main(string[] args)
{
    string strContent =
@"[Name]
Jhon
[Age]
45
[MobileNumber]
1020304050
Billing address Delivery address
GJ-India MH-India";
    string value = string.Empty;
    var match = Regex.Match(strContent, @"\[Name\]\s*(.*)", RegexOptions.Multiline);
    if (match.Success)
    {
        value = match.Groups[1].Value;
    }
    Console.WriteLine(value); //Jhon
    match = Regex.Match(strContent, @"\[Age\]\s*(.*)", RegexOptions.Multiline);
    if (match.Success)
    {
        value = match.Groups[1].Value;
    }
    Console.WriteLine(value); //45
    match = Regex.Match(strContent, @"\[MobileNumber\]\s*(.*)", RegexOptions.Multiline);
    if (match.Success)
    {
        value = match.Groups[1].Value;
    }
    Console.WriteLine(value); //1020304050
    match = Regex.Match(strContent, "Billing address (.*)", RegexOptions.Multiline);
    Console.WriteLine(value); //India
    match = Regex.Match(strContent, "Delivery address (.*)", RegexOptions.Multiline);
    Console.WriteLine(value); //India
    Console.ReadLine();
}
If I pass [Name], the result should be "Jhon". Similarly, for Delivery address the expected result is India.
I've added the expected result for each field in the comments.
But currently I'm getting India for every field in the result.
1. [Age] doesn't mean "the string Age", but "any one of the 3 characters A, g, e". You have to put a backslash before the [ and ] to match a literal one (well, you'll put two backslashes before the [ and ], as you are inside a " which itself asks for a backslash to escape the one you want to pass to the Regex).
2. \[Age\] (.*) would mean "the [Age] string, followed by a space, followed by the data line". You don't have any space after "[Age]" (but directly the end-of-line instead), so it won't match. Replace the space with a newline.
3. You don't need Multiline, as this will only change the meaning of ^ and $, which you don't use.
4. You assign the result to the match variable, but then you WriteLine value instead (which still has the value of the match on [Name]). Use match.Groups[1].Value.
So for the "[Field] Value" part, each one of your blocks will become:
match = Regex.Match(strContent, "\\[Age\\]\n(.*)");
Console.WriteLine(match.Groups[1].Value);
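The same fix can be wrapped in a small helper, so that each field does not need its own hand-escaped pattern. This is only a sketch under assumptions: FieldExtractor, TryGetFieldValue and Demo are names made up for illustration, and the \r?\n part assumes the text may contain Windows line endings, which the question does not say. Regex.Escape takes care of the opening bracket, and [^\r\n]* captures the rest of the line without a trailing carriage return.

```csharp
using System;
using System.Text.RegularExpressions;

public static class FieldExtractor
{
    // Hypothetical helper: finds "[fieldName]" and returns the line that follows it.
    // Returns false (and an empty value) when the field is not present.
    public static bool TryGetFieldValue(string content, string fieldName, out string value)
    {
        // Regex.Escape escapes the "[" so it is matched literally
        // instead of opening a character class.
        string pattern = Regex.Escape("[" + fieldName + "]") + @"\r?\n([^\r\n]*)";
        Match match = Regex.Match(content, pattern);
        value = match.Groups[1].Value; // empty string when the match failed
        return match.Success;
    }

    public static void Demo()
    {
        string content = "[Name]\nJhon\n[Age]\n45\n[MobileNumber]\n1020304050";

        // Unescaped, "[Age]" is a character class: it matches one of A, g, e.
        Console.WriteLine(Regex.Match("[Age]", "[Age]").Value); // A

        foreach (string field in new[] { "Name", "Age", "MobileNumber" })
        {
            if (TryGetFieldValue(content, field, out string value))
                Console.WriteLine(field + ": " + value);
        }
    }
}
```

Calling FieldExtractor.Demo() prints A first, then one "field: value" line per field. Returning a bool keeps "field missing" distinct from "field present but empty".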
You can see the full solution (including the tabular part) in a fiddle.
I kept it separate: 1. because it was not part of the question, 2. because it's way more complex, and 3. because it's my first C# program ever, so it lacks polish, conciseness, best practices, and so on.
// Requires: using System.Collections.Generic; using System.Linq; using System.Text.RegularExpressions;
// Look up a known field, either at the start of a line, or after a field separator of at least 2 spaces.
var headerMatch = Regex.Match(strContent, "(?:^|.*  )Billing address(?:  .*|$)", RegexOptions.Multiline);
// Split the line to get the individual fields. Each field must be followed by at least 2 spaces,
// so pad the header line to let the last field match too.
var fieldsMatches = Regex.Matches(headerMatch.Value + "  ", "([^ ](?:[^ ]+| [^ ]+)*)  +", RegexOptions.Multiline).Cast<Match>().ToList();
var fieldNames = fieldsMatches.Select(m => m.Value.Trim()).ToArray();
var fieldPos = fieldsMatches.Select(m => m.Index).ToArray();
var fieldLengths = fieldsMatches.Select(m => m.Length).ToArray();
// Get the lines following the header line, until an empty line or the end of the block.
var dataLines = Regex.Match(strContent.Substring(headerMatch.Index + headerMatch.Length), "(?:\n.+)*");
// For each line, loop to isolate individual fields.
var fieldVals = new Dictionary<string, string>();
foreach (var fieldName in fieldNames)
    fieldVals.Add(fieldName, "");
foreach (Match line in Regex.Matches(dataLines.Value, ".+"))
{
    var fieldNum = 0;
    var ls = line.Value;
    foreach (var fieldName in fieldNames)
    {
        var pos = fieldPos[fieldNum];
        var length = fieldLengths[fieldNum];
        // Cut the column out of the line, clamping to the line length for short lines.
        string fragment = ls.Length <= pos ? "" : ls.Substring(pos, pos + length > ls.Length ? ls.Length - pos : length);
        fragment = fragment.TrimEnd();
        // For multiline field values, separate each segment from the previous with a newline,
        // except if it starts with a space in which case it is just the wrapped tail of the previous (à la LDIF).
        var concatenator = fieldVals[fieldName].Length > 0 && fragment.Length > 0 && fragment.Substring(0, 1) != " " ? "\n" : "";
        fieldVals[fieldName] += concatenator + fragment;
        ++fieldNum;
    }
}
foreach (var field in fieldVals)
    Console.WriteLine(field.Key + ": " + field.Value.Replace("\n", " <newline> "));
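For the simplest case of the tabular part (one header line, one data line, values aligned under their headers), the column slicing can be sketched more compactly. This is an illustration under assumptions, not the fiddle's code: ColumnSlicer, Slice and Demo are hypothetical names, column names are assumed to contain only single spaces, and columns are assumed to be separated by at least two spaces. It does not handle the multi-line values that the code above supports.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ColumnSlicer
{
    // Hypothetical helper: maps each header name to the text found
    // under it (same character positions) in the data line.
    public static Dictionary<string, string> Slice(string headerLine, string dataLine)
    {
        var result = new Dictionary<string, string>();
        // A header name is a run of words separated by single spaces;
        // two or more spaces end the name.
        MatchCollection headers = Regex.Matches(headerLine, @"\S+(?: \S+)*");
        for (int i = 0; i < headers.Count; i++)
        {
            int start = headers[i].Index;
            // A column extends up to the start of the next header, or to the end of the line.
            int end = i + 1 < headers.Count ? headers[i + 1].Index : int.MaxValue;
            string value = "";
            if (start < dataLine.Length)
            {
                int length = Math.Min(end, dataLine.Length) - start;
                value = dataLine.Substring(start, length).Trim();
            }
            result[headers[i].Value] = value;
        }
        return result;
    }

    public static void Demo()
    {
        // PadRight builds aligned columns without counting spaces by hand.
        string header = "Billing address".PadRight(20) + "Delivery address";
        string data = "GJ-India".PadRight(20) + "MH-India";
        foreach (var pair in Slice(header, data))
            Console.WriteLine(pair.Key + ": " + pair.Value);
    }
}
```

With the aligned sample built in Demo, Slice returns GJ-India for Billing address and MH-India for Delivery address. Slicing by position rather than splitting on spaces is what lets a value contain spaces of its own.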