c# regex wpf richtextbox flowdocument

Regex Split at beginning of line containing word


I'm trying to split a text into paragraphs each time a line contains a certain word. I already managed to split the text at the beginning of that word, but not at the beginning of the line containing that word. What's the right expression?

This is what I have:

 string[] paragraphs = Regex.Split(text, @"(?=INT.|EXT.)");

I also want to lose any empty paragraphs in the array.

This is the input:

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

LOCATION INT. - NIGHT

I want to split it up into paragraphs while keeping the same layout.

The result I get is:

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

LOCATION - 

EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

LOCATION 

INT. - NIGHT

The new paragraphs start at the word and not at the line.

This is the desired result

Paragraph 1

INT. LOCATION - DAY 
Lorem ipsum dolor sit amet, consectetur adipiscing elit. 

Paragraph 2

LOCATION - EXT.
Morbi cursus dictum tempor. Phasellus mattis at massa non porta. 

Paragraph 3

LOCATION INT. - NIGHT

The paragraph should always start at the beginning of the line containing the word INT. or EXT., not at the word itself.


Solution

  • Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);
    

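    Check it against this sample text:

    // Requires: using System; using System.Text.RegularExpressions;
    string text = "INT. LOCATION - DAY\n" +
                  "Lorem ipsum dolor sit amet, consectetur adipiscing elit.\n" +
                  "LOCATION - EXT.\n" +
                  "Morbi cursus dictum tempor. Phasellus mattis at massa non porta.\n" +
                  "LOCATION INT. - NIGHT\n";

    // Multiline makes ^ match at the start of every line, so the lookahead
    // splits before each line that has INT or EXT somewhere after its first character.
    string[] res = Regex.Split(text, "(?=^.+?INT|^.+?EXT)", RegexOptions.Multiline);

    for (int i = 0; i < res.Length; i++)
    {
        int paragraphNumber = i + 1;
        Console.WriteLine("paragraph " + paragraphNumber + "\n" + res[i]);
    }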
    
    
    // Output:
    // paragraph 1
    // INT. LOCATION - DAY
    // Lorem ipsum dolor sit amet, consectetur adipiscing elit.
    
    // paragraph 2
    // LOCATION - EXT.
    // Morbi cursus dictum tempor. Phasellus mattis at massa non porta.
    
    // paragraph 3
    // LOCATION INT. - NIGHT
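
One thing to watch with the expression above: `.+?` needs at least one character before the keyword, so a line that starts with INT or EXT further down the text would not trigger a split. It also does not remove empty paragraphs, which the question asks for. Below is a minimal sketch of a variant that might cover both cases. The names `SceneSplitter` and `Split` are made up for illustration, and trimming trailing whitespace from each paragraph is my own assumption about the desired result.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical helper, not part of the original answer.
public static class SceneSplitter
{
    // Zero-width lookahead: split before any line that contains INT. or EXT.
    //   ^     start of a line (because of RegexOptions.Multiline)
    //   .*    any characters, possibly none, so the keyword may open the line
    //   \b    word boundary, so a word such as "POINT." is not treated as INT.
    //   \.    a literal dot (an unescaped . would match any character)
    private static readonly Regex SceneStart =
        new Regex(@"(?=^.*\b(?:INT|EXT)\.)", RegexOptions.Multiline);

    public static string[] Split(string text)
    {
        return SceneStart.Split(text)
            // A match at position 0 produces an empty first element; drop it
            // together with any whitespace-only paragraphs.
            .Where(p => !string.IsNullOrWhiteSpace(p))
            // Assumption: trailing blank lines inside a paragraph are not wanted.
            .Select(p => p.TrimEnd())
            .ToArray();
    }
}
```

Usage would be `string[] paragraphs = SceneSplitter.Split(text);`. The group is non-capturing on purpose, since `Regex.Split` adds the text of capturing groups to the returned array.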