C#: Trimming a set of characters and character sequences from both ends of a string


My input is a string of arbitrary length and content. I need to trim, from its beginning and its end, all spaces, the characters # and \t, and the sequences //, /* and */. These characters and sequences may follow one another in any order; trimming must stop as soon as anything else appears.

Examples:

"///g, hhh/ , test" ===> "/g, hhh/ , test"

"//*g, hhh/ , test " ===> "*g, hhh/ , test"

"#/test" ===> "/test"

"hello //" ===> "hello"

Important clarification: I also need to know how many characters were trimmed at the beginning and how many at the end.

I would also prefer not to use regular expressions, because this code is very performance-sensitive and, as far as I know, regular expressions are fairly slow. However, if the task cannot be done with loops or the like, or is extremely difficult that way, I am willing to use them.

So far I have tried the code below. The task is complicated by the fact that some of the tokens consist of two consecutive characters.

private const char specialSymbol_1 = '/';
private const char specialSymbol_2 = '*';
private const char specialSymbol_3 = '#';

private void ObserveTrim(ref string target, ref int start, ref int end) {
    int s = 0; int e = 0;
    
    for (int i = 0; i < target.Length; ++i) {
        char c = target[i];
        
        bool flag = false;
        if (target.Length > 1) {
            if (i == 0) {
                flag = c == specialSymbol_1 && target[i + 1] == specialSymbol_1 ||
                    c == specialSymbol_1 && target[i + 1] == specialSymbol_2 ||
                    c == specialSymbol_2 && target[i + 1] == specialSymbol_1;
            }
            else if (i == target.Length - 1) {
                flag = c == specialSymbol_1 && target[i - 1] == specialSymbol_1 ||
                    c == specialSymbol_1 && target[i - 1] == specialSymbol_2 ||
                    c == specialSymbol_2 && target[i - 1] == specialSymbol_1;
            }
            else {
                flag = c == specialSymbol_1 && target[i + 1] == specialSymbol_1 ||
                    c == specialSymbol_1 && target[i - 1] == specialSymbol_1 ||
                    c == specialSymbol_1 && target[i + 1] == specialSymbol_2 ||
                    c == specialSymbol_1 && target[i - 1] == specialSymbol_2;
            }
        }
        
        if (flag) continue;
        
        if (!char.IsWhiteSpace(c) && c != specialSymbol_3) {
            s = i;
            break;
        }
    }
    
    for (int i = target.Length - 1; i >= 0; --i) {
        char c = target[i];
        
        bool flag = false;
        if (target.Length > 1) {
            if (i == 0) {
                flag = c == specialSymbol_1 && target[i + 1] == specialSymbol_1 ||
                    c == specialSymbol_1 && target[i + 1] == specialSymbol_2 ||
                    c == specialSymbol_2 && target[i + 1] == specialSymbol_1;
            }
            else if (i == target.Length - 1) {
                flag = c == specialSymbol_1 && target[i - 1] == specialSymbol_1 ||
                    c == specialSymbol_1 && target[i - 1] == specialSymbol_2 ||
                    c == specialSymbol_2 && target[i - 1] == specialSymbol_1;
            }
            else {
                flag = c == specialSymbol_1 && target[i + 1] == specialSymbol_1 ||
                    c == specialSymbol_1 && target[i - 1] == specialSymbol_1 ||
                    c == specialSymbol_1 && target[i + 1] == specialSymbol_2 ||
                    c == specialSymbol_1 && target[i - 1] == specialSymbol_2;
            }
        }
        
        if (flag) continue;
        
        if (!char.IsWhiteSpace(c) && c != specialSymbol_3) {
            e = target.Length - 1 - i;
            break;
        }
    }
    
    start += s;
    end -= e;
    target = target.Substring(s, target.Length - s - e);
}

But this code doesn't work as expected. Examples:

"///g, hhh/ , test" ===> "g, hhh/ , test"

"/*g, hhh/ , test" ===> "*g, hhh/ , test"

These are just a few examples of its incorrect behavior; in fact there are several dozen.

It simply fails to account for some characters or sequences of them. I am not strong at this kind of algorithm, so any help is welcome.


Solution

  • Since all the strings to trim are known up front, I suggest checking each of them with StartsWith and EndsWith. With the help of ReadOnlySpan<char> we can do this without creating many unwanted substrings.

    Since you want three values back (the trimmed string, and how many characters were removed from the left and from the right), let's combine them in a tuple:

    Code:

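      public static (string result, int left, int right) MyTrim(
        string value, params string[] trim) {
    
        // Nothing to trim: return the input unchanged with zero counts.
        if (string.IsNullOrEmpty(value) || trim is null || trim.Length == 0)
          return (value, 0, 0);
    
        int trimmedLeft = 0;
        int trimmedRight = 0;
          
        // A span is a view over the string; slicing it allocates nothing.
        var span = value.AsSpan();
    
        // Left side: keep removing tokens while any of them is a prefix.
        for (bool keep = true; keep; ) {
          keep = false;
    
          foreach (var item in trim)
            // Empty tokens are skipped; they would match forever.
            if (!string.IsNullOrEmpty(item) && span.StartsWith(item)) {
              trimmedLeft += item.Length;
              span = span.Slice(item.Length);
              keep = true;
    
              break;
            }
        }
    
        // Right side: the same loop, testing suffixes of what is left.
        for (bool keep = true; keep; ) {
          keep = false;
    
          foreach (var item in trim)
            if (!string.IsNullOrEmpty(item) && span.EndsWith(item)) {
              trimmedRight += item.Length;
              span = span.Slice(0, span.Length - item.Length);
              keep = true;
    
              break;
            }
        }
    
        // The only allocation: materialize the remaining span as a string.
        return (span.ToString(), trimmedLeft, trimmedRight);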
      } 
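
    One property of the loops above: the first matching token wins. With the tokens from the question (" ", "#", "\t", "//", "/*", "*/") no token is a prefix of another, so their order does not matter. If you ever add tokens that do overlap, order would start to matter. A minimal sketch, assuming the hypothetical tokens "/" and "/*" and a hypothetical helper CountLeft that is just the left-hand loop reduced to a counter:

    ```csharp
    using System;

    public static class TokenOrderDemo {
      // Hypothetical helper (not part of MyTrim): counts how many leading
      // characters are removed when tokens are tried in the given order.
      public static int CountLeft(string value, params string[] trim) {
        int left = 0;

        for (bool keep = true; keep; ) {
          keep = false;

          foreach (var item in trim)
            if (!string.IsNullOrEmpty(item) &&
                value.AsSpan(left).StartsWith(item.AsSpan())) {
              left += item.Length;
              keep = true;

              break;
            }
        }

        return left;
      }

      public static void Main() {
        // "/" is tried first, matches, and the remaining "*x" stops the loop.
        Console.WriteLine(CountLeft("/*x", "/", "/*")); // prints 1
        // "/*" is tried first and consumes both characters.
        Console.WriteLine(CountLeft("/*x", "/*", "/")); // prints 2
      }
    }
    ```

    In such a case, listing longer tokens before their own prefixes is probably what you want.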
    

    Usage:

    string value = "///g, hhh/ , test";
    
    (string result, int left, int right) = MyTrim(
      value, " ", "#", "\t", "//", "/*", "*/");
    

    Demo:

    using System.Linq;
    
    ...
    
    string[] tests = new string[] {
      "///g, hhh/ , test",
      "//*g, hhh/ , test ", 
      "#/test",
      "hello  //",
    };
          
    var report = string.Join(Environment.NewLine, tests
      .Select(test => (test, result : MyTrim(test, " ", "#", "\t", "//", "/*", "*/")))                       
      .Select(pair => $"{pair.test,30} ===> {pair.result.result} (left: {pair.result.left}; right: {pair.result.right}) "));  
                    
    Console.WriteLine(report);
    

    Output:

                 ///g, hhh/ , test ===> /g, hhh/ , test (left: 2; right: 0) 
                //*g, hhh/ , test  ===> *g, hhh/ , test (left: 2; right: 1) 
                            #/test ===> /test (left: 1; right: 0) 
                         hello  // ===> hello (left: 0; right: 4) 
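
    If only the two counts are needed, or spans are not available on your target framework, the same idea can be written with plain indices. The following is a sketch, not a benchmarked implementation: MeasureTrim is a hypothetical name, it relies on string.CompareOrdinal for the ordinal comparison, and the wrapper assumes that start and end in the question's ObserveTrim are offsets of target inside some larger text (the question does not say so explicitly).

    ```csharp
    using System;

    public static class TrimDemo {
      // Hypothetical helper: returns how many characters the tokens occupy at
      // each end of 'value'. It only moves two indices and allocates nothing.
      public static (int left, int right) MeasureTrim(
        string value, params string[] trim) {

        if (string.IsNullOrEmpty(value) || trim is null || trim.Length == 0)
          return (0, 0);

        int from = 0;           // index of the first character that is kept
        int to = value.Length;  // index just past the last character that is kept

        // Consume tokens from the left while one of them starts at 'from'.
        for (bool keep = true; keep; ) {
          keep = false;

          foreach (var item in trim)
            if (!string.IsNullOrEmpty(item) && item.Length <= to - from &&
                string.CompareOrdinal(value, from, item, 0, item.Length) == 0) {
              from += item.Length;
              keep = true;

              break;
            }
        }

        // Consume tokens from the right while one of them ends at 'to'.
        for (bool keep = true; keep; ) {
          keep = false;

          foreach (var item in trim)
            if (!string.IsNullOrEmpty(item) && item.Length <= to - from &&
                string.CompareOrdinal(value, to - item.Length, item, 0, item.Length) == 0) {
              to -= item.Length;
              keep = true;

              break;
            }
        }

        return (from, value.Length - to);
      }

      // Same signature as in the question; the meaning of 'start' and 'end'
      // (offsets into a larger text) is an assumption.
      public static void ObserveTrim(ref string target, ref int start, ref int end) {
        var (left, right) = MeasureTrim(target, " ", "#", "\t", "//", "/*", "*/");

        start += left;
        end -= right;
        target = target.Substring(left, target.Length - left - right);
      }

      public static void Main() {
        string target = "//*g, hhh/ , test ";
        int start = 0, end = target.Length;

        ObserveTrim(ref target, ref start, ref end);

        // prints: [*g, hhh/ , test] start: 2; end: 17
        Console.WriteLine($"[{target}] start: {start}; end: {end}");
      }
    }
    ```

    The length check before each comparison keeps the right-hand loop from reaching into characters the left-hand loop has already consumed, which mirrors how the span version shrinks before EndsWith is applied.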
    
