
Using enum names in a multiline string to associate each string line with the integer value of the enum. Is there a better way?


My RTF parser needs to process two flavors of rtf files (one file per program execution): rtf files as saved from Word and rtf files as created by a COTS report generator utility. The rtf for each is valid, but different. My parser uses regex patterns to detect, extract, and process the various rtf elements in the two types of rtf files.

I decided to implement the list of rtf regex patterns in two dictionaries, one for the rtf regex patterns needed for a Word rtf file and another for the rtf regex patterns needed for a COTS utility rtf file. At runtime, my parser detects which type of rtf file is being processed (Word rtf includes the rtf element //schemas.microsoft.com/office/word and the COTS rtf does not) and then obtains the needed regex pattern from the appropriate dictionary.

To ease the task of referencing the patterns as I write the code, I implemented an enum where each enum value represents a specific regex pattern. To ease the task of keeping the patterns in sync with their corresponding enum, I implemented the regex patterns as a here-string where each line is a csv composition: {enum name}, {word rtf regex pattern}, {cots rtf regex pattern}. Then, at run time when the patterns are loaded into their dictionaries, I obtain the int value of the enum from the csv and use it to create the dictionary key.

This makes writing the code easier, but I'm not sure it's the best way to implement and reference the rtf expressions. Is there a better way?

Example code:

public enum Rex {FOO, BAR};

string ex = @"FOO, word rtf regex pattern for FOO, cots rtf regex pattern for FOO
BAR, word rtf regex pattern for BAR, cots rtf regex pattern for BAR
";  

I load the dictionaries like this:

using (StringReader reader = new StringReader(ex))
{
    string line;

    while ((line = reader.ReadLine()) != null)
    {
        string[] splitLine = line.Split(',');
        int enumIntValue = (int)(Rex)Enum.Parse(typeof(Rex), splitLine[0].Trim());
        // reuse the already-split fields instead of splitting the line again
        ObjWordRtfDict.Add(enumIntValue, splitLine[1].Trim());
        ObjRtfDict.Add(enumIntValue, splitLine[2].Trim());
    }
}

Then, at runtime, I access ObjWordRtfDict or ObjRtfDict based on the type of rtf file the parser detects.

string regExPattFoo = ObjRegExExpr.GetRegExPattern(ClsRegExExpr.Rex.FOO);

public string GetRegExPattern(Rex patternIndex)
{
    string regExPattern = "";

    if (isWordRtf)
    {
        ObjWordRtfDict.TryGetValue((int)patternIndex, out regExPattern);
    }
    else
    {
        ObjRtfDict.TryGetValue((int)patternIndex, out regExPattern);
    }

    return regExPattern;
}
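
One possible simplification of the loading and lookup above, as a sketch only: the enum itself can serve as the dictionary key, which removes the (int) casts, and Enum.TryParse can reject a csv name that has no matching enum member. The names PatternTable, Load, and Get are hypothetical and not part of the original parser, and the tuple syntax assumes C# 7 or later.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

public enum Rex { FOO, BAR }

// Hypothetical helper: one dictionary keyed by the enum, holding both patterns.
public static class PatternTable
{
    public static Dictionary<Rex, (string Word, string Cots)> Load(string csv)
    {
        var table = new Dictionary<Rex, (string Word, string Cots)>();

        using (var reader = new StringReader(csv))
        {
            string line;
            while ((line = reader.ReadLine()) != null)
            {
                if (line.Trim().Length == 0) continue;   // skip blank lines

                // Same naive comma split as the original code;
                // it assumes no pattern contains a comma.
                string[] parts = line.Split(',');
                if (parts.Length != 3)
                    throw new FormatException("Expected 3 fields: " + line);

                // TryParse also accepts numeric strings, so IsDefined is checked too.
                if (!Enum.TryParse(parts[0].Trim(), out Rex key)
                    || !Enum.IsDefined(typeof(Rex), key))
                    throw new FormatException("Unknown pattern name: " + parts[0]);

                table.Add(key, (parts[1].Trim(), parts[2].Trim()));
            }
        }

        return table;
    }

    // Picks the Word or COTS pattern for the given enum member.
    public static string Get(Dictionary<Rex, (string Word, string Cots)> table,
                             Rex name, bool isWordRtf)
    {
        var entry = table[name];
        return isWordRtf ? entry.Word : entry.Cots;
    }
}
```

With this shape, a misspelled name in the csv fails at load time rather than surfacing later as an empty pattern.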

Updated code based on Asif's recommendations

I kept my enum for pattern names so that references to pattern names can be checked by the compiler.

Example csv file included as an embedded resource

SECT,^\\pard.*\{\\rtlch.*\\sect\s\}, ^\\pard.*\\sect\s\}
HORZ_LINE2, \{\\pict.*\\pngblip, TBD

Example usage

string sectPattern = ObjRegExExpr.GetRegExPattern(ClsRegExPatterns.Names.SECT);

ClsRegExPatterns class

using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Reflection;
using System.Text;
using System.Text.RegularExpressions;

namespace foo
{
    public class ClsRegExPatterns
    {
        readonly bool isWordRtf = false;
        List<ClsPattern> objPatternList;

        public enum Names { SECT, HORZ_LINE2 };

        public class ClsPattern
        {
            public string Name { get; set; }
            public string WordRtfRegex { get; set; }
            public string COTSRtfRegex { get; set; }
        }

        public ClsRegExPatterns(StringBuilder rawRtfTextFromFile)
        {
            // determine if input file is Word rtf or not Word rtf
            if ((Regex.Matches(rawRtfTextFromFile.ToString(), "//schemas.microsoft.com/office/word", RegexOptions.IgnoreCase)).Count == 1)
            {
                isWordRtf = true;
            }

            //read patterns from embedded content csv file
            string patternsAsCsv = new StreamReader((Assembly.GetExecutingAssembly()).GetManifestResourceStream("eLabBannerLineTool.Packages.patterns.csv")).ReadToEnd();

            //create list to hold patterns
            objPatternList = new List<ClsPattern>();

            //load pattern list
            using (StringReader reader = new StringReader(patternsAsCsv))
            {
                string line;

                while ((line = reader.ReadLine()) != null)
                {
                    string[] splitLine = line.Split(',');

                    ClsPattern objPattern = new ClsPattern
                    {
                        Name = splitLine[0].Trim(),
                        WordRtfRegex = splitLine[1].Trim(),
                        COTSRtfRegex = splitLine[2].Trim()
                    };

                    objPatternList.Add(objPattern);
                }
            }
        }

        public string GetRegExPattern(Names patternIndex)
        {
            string regExPattern = "";

            string patternName = patternIndex.ToString();

            if (isWordRtf)
            {
                regExPattern = objPatternList.SingleOrDefault(x => x.Name == patternName)?.WordRtfRegex;
            }
            else
            {
                regExPattern = objPatternList.SingleOrDefault(x => x.Name == patternName)?.COTSRtfRegex;
            }

            return regExPattern;
        }
    }
}
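
Since the goal is to keep the enum and the csv in sync, a startup check might help catch drift between them. The following is a minimal sketch under assumptions: PatternSyncCheck and its two methods are hypothetical names, and the enum is repeated here only so the example stands alone.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public enum Names { SECT, HORZ_LINE2 }

// Hypothetical startup check: compares the enum members with the names loaded from the csv.
public static class PatternSyncCheck
{
    // Returns enum member names that have no row in the csv.
    public static List<string> MissingFromCsv(IEnumerable<string> csvNames)
    {
        var loaded = new HashSet<string>(csvNames);
        return Enum.GetNames(typeof(Names)).Where(n => !loaded.Contains(n)).ToList();
    }

    // Returns csv names that are not members of the enum.
    public static List<string> UnknownInCsv(IEnumerable<string> csvNames)
    {
        var members = new HashSet<string>(Enum.GetNames(typeof(Names)));
        return csvNames.Where(n => !members.Contains(n)).ToList();
    }
}
```

In the constructor, after the list is loaded, one could pass objPatternList.Select(p => p.Name) to both methods and throw if either result is non-empty.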

Solution

  • If I understand your problem statement correctly, I would prefer something like the following.

    Create a class called RtfProcessor

    public class RtfProcessor
    {
        public string Name { get; set; }
        public string WordRtfRegex { get; set; }
        public string COTSRtfRegex { get; set; }

        void ProcessFile()
        {
            throw new NotImplementedException();
        }
    }
    

    Here Name holds the pattern name (FOO, BAR, and so on). You can maintain a list of these objects and populate it from the csv text like this:

    List<RtfProcessor> fileProcessors = new List<RtfProcessor>();
    using (StringReader reader = new StringReader(ex))
    {
        string line;

        while ((line = reader.ReadLine()) != null)
        {
            string[] splitLine = line.Split(',');
            RtfProcessor rtfProcessor = new RtfProcessor();
            rtfProcessor.Name = splitLine[0].Trim();
            rtfProcessor.WordRtfRegex = splitLine[1].Trim();
            // the third csv field is the COTS pattern
            rtfProcessor.COTSRtfRegex = splitLine[2].Trim();
            fileProcessors.Add(rtfProcessor);
        }
    }
    

    And to retrieve regex pattern for FOO or BAR

    // to get the Word regex pattern for FOO you can use
    string fooPattern = fileProcessors.SingleOrDefault(x => x.Name == "FOO")?.WordRtfRegex;
    

    Hope this helps.