Split text file into array using multiple delimiters


I have some 2k files to compare block by block, but each kind of block is identified by a different marker. What would be the best way to split each file into lists of blocks, one list per block type, and then compare each block type against the same block type in the other file?

Block types:

  • HL*
  • EB*
  • SE*
  • GE*
  • IEA*

Example of the file (I added spaces for better readability; the actual files have NO spaces):

    useless-Text-useless-Text-~
    useless-Text-useless-Text-useless-Text-~

    HL*Block1'HL'text-Block1'HL'text-Block1'HL'text-Block1'HL'text-~
    Block1'HL'text-Block1'HL'text-~

    HL*Block2'HL'text-Block2'HL'text-~
    Block2'HL'text-Block2'HL'text-~

    HL*Block3'HL'text-Block3'HL'text-Block3'HL'text-~
    Block3'HL'text-~

    EB*Block1'EB'Text-Block1'EB'Text-Block1'EB'Text-~
    Block1'EB'Text-Block1'EB'Text-~
    Block1'EB'Text-Block1'EB'Text-~

    EB*Block2'EB'Text-Block2'EB'Text-Block2'EB'Text-~
    Block2'EB'Text-Block2'EB'Text-~
    Block2'EB'Text-Block2'EB'Text-~

    EB*Block3'EB'Text-Block3'EB'Text-Block3'EB'Text-~
    Block3'EB'Text-Block3'EB'Text-~
    Block3'EB'Text-Block3'EB'Text-~

    EB*Block4'EB'Text-Block4'EB'Text-Block4'EB'Text-~
    Block4'EB'Text-Block4'EB'Text-~
    Block4'EB'Text-Block4'EB'Text-~

    EB*Block_N'EB'Text-Block_N'EB'Text-Block_N'EB'Text-~
    Block_N'EB'Text-Block_N'EB'Text-~
    Block_N'EB'Text-Block_N'EB'Text-~

    SE*Block1'SE'Text-Block1'SE'Text-~
    Block1'SE'Text-~

    GE*Block1'GE'Text-~
    IEA*Block1'IEA'Text-~
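One possible way to do the split in a single pass is sketched below. It is only a sketch under assumptions that the example suggests but does not confirm: that every block marker (`HL*`, `EB*`, `SE*`, `GE*`, `IEA*`) appears either at the very start of the text or directly after a `~` terminator (optionally followed by whitespace), and that the text before the first marker can be ignored. The names `BlockSplitter`, `Split` and `SameBlocks` are hypothetical and not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class BlockSplitter
{
    // Assumption: a marker only counts when it starts a segment, i.e. it sits
    // at the start of the text or right after '~' (plus optional whitespace).
    // This avoids matching "SE*" inside a value such as "CLOSE*".
    private static readonly Regex Marker =
        new Regex(@"(?<=^|~\s*)(HL|EB|SE|GE|IEA)\*");

    // Returns block type -> list of block bodies (marker removed),
    // in the order the blocks appear in the text.
    public static Dictionary<string, List<string>> Split(string text)
    {
        var blocks = new Dictionary<string, List<string>>();
        MatchCollection matches = Marker.Matches(text);

        for (int i = 0; i < matches.Count; i++)
        {
            Match m = matches[i];
            int start = m.Index + m.Length;
            // A block runs until the next marker, or to the end of the text.
            int end = i + 1 < matches.Count ? matches[i + 1].Index : text.Length;

            string type = m.Groups[1].Value;
            if (!blocks.TryGetValue(type, out List<string> list))
            {
                list = new List<string>();
                blocks[type] = list;
            }
            list.Add(text.Substring(start, end - start));
        }
        return blocks;
    }

    // Compares all blocks of one type between two already-split files.
    // A type missing from a file is treated as an empty list.
    public static bool SameBlocks(
        Dictionary<string, List<string>> left,
        Dictionary<string, List<string>> right,
        string type)
    {
        left.TryGetValue(type, out List<string> l);
        right.TryGetValue(type, out List<string> r);
        return (l ?? new List<string>()).SequenceEqual(r ?? new List<string>());
    }
}
```

With this, each file would be read once (for example with `File.ReadAllText`), passed to `Split`, and then `SameBlocks(baseBlocks, otherBlocks, "EB")` would tell whether the EB blocks of the two files match. Whether an exact, ordered comparison is the right notion of "same" depends on the files and is an assumption here.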

Solution

  • I ended up parsing the text files from the bottom up and adding each section to its own List:

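            // Requires: using System.Collections.Generic; using System.IO; using System.Text.RegularExpressions;
            List<String> listHL_Base = new List<String>();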
            List<String> listEB_Base = new List<String>();
            List<String> listSE_Base = new List<String>();
    
            // Read the whole file; 'using' closes the reader even if reading throws
            string textBASE;
            using (StreamReader streamReader = new StreamReader(baseFile))
            {
                textBASE = streamReader.ReadToEnd().Trim();
            }
    
            // Scan the file from the bottom up: first cut off everything from the first GE* onward
            string[] textBASE_Step1 = Regex.Split(textBASE, "GE\\*");
            textBASE = textBASE_Step1[0];
    
            string[] textBASE_Step2 = Regex.Split(textBASE, "SE\\*");
            for (int i = 1; i < textBASE_Step2.Length; i++) //creating list with SE values
            {
                listSE_Base.Add(textBASE_Step2[i]);
            }
    
            textBASE = textBASE_Step2[0]; // remaining beginning of the file, without the GE or SE parts; only EB and HL blocks are left
            string[] textBASE_Step3 = Regex.Split(textBASE, "EB\\*");
            for (int i = 1; i < textBASE_Step3.Length; i++) //creating list with EB values
            {
                listEB_Base.Add(textBASE_Step3[i]);
            }