Search code examples
c#sql-serverstring-parsing

Parsing a SQL Query Report with Bizarrely Formatted Entries


I am currently writing a C# program that parses a SQL Server query report for data that it then saves for later use. So far, the queries I've been parsing have been very straightforward, but I just received a test file that contains this entry:

Client Contact                                                                                                                                                                                                                                                   Client Email
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Primary: Nelli Quiroga

, nelli.quiroga@parservicios.com 
Secondary: Zully Aranda, zully.aranda@parservicios.com
                                                                                                                                                nelli.quiroga@parservicios.com|zully.aranda@parservicios.com

(Please excuse the formatting - that's exactly how the query outputs it.)

I need to be able to pull the names and email addresses of both the Primary and Secondary clients, which will then be stored in a List<Tuple<string, string>>. So far, I haven't been able to devise a solution that does it efficiently, if at all. My best guess is that I would have to use Regex somehow, but I'm not familiar with how Regex works in C#, and my regex skills are very rusty.

Some additional information:

  • I do not have access to the database, so I can't change the formatting.
  • I do not know whether or not the query result can/will contain more than one such entry. Because of this, I would like for the parser to handle more than one of these entries if present.
  • This entry(ies) will be part of a query report that contains other table queries, which thankfully are all formatted like so:

    User_Name                                          Group_Name
    -------------------------------------------------- --------------------------------------------------
    msteffl                                            NULL
    spurcell                                           Admin
    wgervais                                           Admin
    djames                                             Admin
    loaduser                                           Admin
    mbreu                                              Admin
    wgervais                                           ComplianceContractors
    wgervais                                           ComplianceOfficers
    sgregory                                           ComplianceOfficers
    ntabares                                           ComplianceOfficers
    lduffaut                                           ComplianceOfficers
    pdeneree                                           ComplianceOfficers
    serickson                                          ComplianceOfficers
    mbreu                                              ComplianceOfficers
    jreinhardt                                         ComplianceOfficers
    jromoser                                           ComplianceOfficers
    ebell                                              ComplianceOfficers
    bkeogh                                             ComplianceOfficers
    cbarnett                                           ComplianceOfficers
    cbarnett                                           Users
    bkeogh                                             Users
    ebell                                              Users
    djames                                             Users
    jromoser                                           Users
    jreinhardt                                         Users
    mbreu                                              Users
    serickson                                          Users
    pdeneree                                           Users
    lduffaut                                           Users
    ntabares                                           Users
    sgregory                                           Users
    spurcell                                           Users
    wgervais                                           Users
    

Any and all input would be appreciated.


Solution

  • Alright, after some trial and error, I was able to write a regular expression that works just fine with the contact output I was given. I cannot guarantee that it will work well with any output file I can/will be given, but for now it does what I need it to do.

    This is the regular expression in question:

    (?:Primary|Secondary):[ ]*[\r\n]*[a-zA-Z ]+[\r\n]*[ ]*,[ ]*[\r\n]*[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-z]+[ ]*[\r\n]*

    It is able to break up the entry into two parts, the Primary and Secondary contact, which then allows me to trim and split each contact as needed.

    Again, I cannot say for certain that this will work with any given input, but I also think I've managed to make it reasonably foolproof, hence all the instances of [ ]* and [\r\n]*.