Tags: c#, .net, regex, string, text-segmentation

How do I extract a whole sentence from a string by matching a single word?


I have a long string (about 10k characters) and I search it for a word (or several words) with new Regex(word).Matches(scrappedstring).

But how do I extract the whole sentence that contains the word? I was thinking of taking the substring from the matched word up to the first period, exclamation mark, question mark, etc. But how do I get the part of the sentence before the matched word?

Or is there a better approach?
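The approach described in the question (scan backward from the matched word to the previous terminator and forward to the next one) can be sketched with String.LastIndexOfAny and String.IndexOfAny. This is a minimal sketch under assumptions, not the answer below: the helper name SentencesAround is hypothetical, the terminator set . ! ? ; is assumed, and the \b word boundaries assume the search word starts and ends with a letter or digit.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Demo: prints each sentence that contains "wordmatch" as a whole word.
foreach (var sentence in SentencesAround(
    "First sentence. Second with wordmatch ? Third one; The last wordmatch, EOM!", "wordmatch"))
{
    Console.WriteLine(sentence);
}

// Hypothetical helper: for every whole-word match of `word`, scan backward to the
// previous terminator and forward to the next one, and return the text in between.
static List<string> SentencesAround(string text, string word)
{
    // Assumed sentence terminators; adjust to your own boundary rules.
    char[] terminators = { '.', '!', '?', ';' };
    var sentences = new List<string>();
    int lastStart = -1;

    // \b...\b keeps "cat" from matching inside "concatenate";
    // Regex.Escape makes regex metacharacters in `word` literal.
    foreach (Match match in Regex.Matches(text, $@"\b{Regex.Escape(word)}\b"))
    {
        // Start one past the previous terminator, or at 0 if there is none.
        int start = match.Index == 0
            ? 0
            : text.LastIndexOfAny(terminators, match.Index - 1) + 1;

        // Skip further matches inside the sentence that was already collected.
        if (start == lastStart) continue;
        lastStart = start;

        // End just after the next terminator, or at the end of the text.
        int next = text.IndexOfAny(terminators, match.Index + match.Length);
        int end = next == -1 ? text.Length : next + 1;

        sentences.Add(text.Substring(start, end - start).Trim());
    }

    return sentences;
}
```

Unlike the pure-regex approach, this keeps the terminating character and returns a sentence only once even when the word occurs in it several times.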


Solution

  • If your sentence boundaries are, e.g., ., !, ? and ;, match every sentence with the expression [^.!?;]*(wordmatch)[^.!?;]*. It returns each sentence that contains the desired wordmatch, without its terminating character.

    Example:

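    using System.Linq;
    using System.Text.RegularExpressions;

    var s = "First sentence. Second with wordmatch ? Third one; The last wordmatch, EOM!";
    // Each match is a run of non-terminator characters around "wordmatch"
    var r = new Regex("[^.!?;]*(wordmatch)[^.!?;]*");
    var m = r.Matches(s);

    // result: [" Second with wordmatch ", " The last wordmatch, EOM"] (surrounding spaces kept)
    var result = Enumerable.Range(0, m.Count).Select(index => m[index].Value).ToList();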
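For arbitrary or multiple search words, a slightly more defensive variant of the same pattern may help. This is a sketch with a hypothetical helper ExtractSentences: it escapes each word with Regex.Escape, joins the words into an alternation, wraps them in \b word boundaries, trims the whitespace that [^.!?;]* otherwise keeps, and matches case-insensitively (an assumption; drop RegexOptions.IgnoreCase if case matters).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Demo: one line per sentence containing "cat" or "dog" as a whole word.
foreach (var sentence in ExtractSentences(
    "The cat sleeps. A dog barks! Birds sing; Cats and dogs play.", "cat", "dog"))
{
    Console.WriteLine(sentence);
}

// Hypothetical helper built on the pattern [^.!?;]*(wordmatch)[^.!?;]*.
static List<string> ExtractSentences(string text, params string[] words)
{
    // With no words the alternation would be empty and match every sentence.
    if (words.Length == 0) return new List<string>();

    // Escape each word and join them into an alternation, e.g. cat|dog.
    string alternatives = string.Join("|", words.Select(w => Regex.Escape(w)));

    // [^.!?;]* on both sides stretches each match to the surrounding terminators;
    // \b keeps "cat" from matching inside "Cats" or "concatenate".
    string pattern = $@"[^.!?;]*\b(?:{alternatives})\b[^.!?;]*";

    // Case-insensitive matching is an assumption; remove the option if case matters.
    return Regex.Matches(text, pattern, RegexOptions.IgnoreCase)
        .Cast<Match>()
        .Select(match => match.Value.Trim()) // drop the spaces the pattern keeps
        .ToList();
}
```

With the demo input this prints "The cat sleeps" and "A dog barks"; "Cats and dogs play" is skipped because of the word boundaries. Because the trailing [^.!?;]* consumes the rest of the sentence, a sentence containing several of the words is returned only once.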