
What is the best way to trim all possible whitespace characters PLUS custom characters from the beginning of a string in C#?


In C#, I want to Trim() both ends of a string of all possible types of whitespace characters (as defined by char.IsWhiteSpace), and additionally trim the beginning of the string to remove any character from a custom list in a char[] array that occurs before the regular alphanumeric characters. The string should end up such that its first character is neither whitespace nor one of the characters in my custom list. (The string may, of course, include whitespace or characters from my list AFTER the first character.)

I want to iterate through the string only one time to do all of this, because there is a large list of strings to trim.

The following does not work (see the comments below the code snippet).

    static class TrimmerExtension
    {
        private const string _prefixCharsToTrimAsString = "~`$|-_";
        private static readonly char[] _prefixCharsToTrim;

        // static ctor
        static TrimmerExtension()
        {
            _prefixCharsToTrim = _prefixCharsToTrimAsString.ToCharArray();
        }

        public static string TrimWhitespaceAndPrefixes(this string s)
        {
            return s.Trim().TrimStart(_prefixCharsToTrim).TrimStart();
        }
    }

The above does not work, because:

  1. it calls different versions of Trim several times, thus iterating through the string several times. (So the above is inefficient);

  2. it does not completely trim all whitespace and special characters from the beginning if whitespace characters are interleaved with the other characters to be trimmed; for example, given the string " $ ~ Something ", the extension method returns "~ Something", but the first two characters still need to be removed. (So the above code is incorrect.)
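
To make the second point concrete, here is a minimal sketch that reproduces the chained call in isolation. The class and method names (`TrimDemo`, `ChainedTrim`) are made up for this illustration; only the chain of `Trim` calls comes from the snippet above.

```csharp
using System;

public static class TrimDemo
{
    // Same chain as in TrimWhitespaceAndPrefixes above, as a plain static method.
    public static string ChainedTrim(string s)
    {
        char[] prefixChars = "~`$|-_".ToCharArray();

        // Step 1: Trim()                 " $ ~ Something " -> "$ ~ Something"
        // Step 2: TrimStart(prefixChars) removes '$' only  -> " ~ Something"
        // Step 3: TrimStart()            removes the space -> "~ Something"
        return s.Trim().TrimStart(prefixChars).TrimStart();
    }
}
```

Calling `TrimDemo.ChainedTrim(" $ ~ Something ")` yields "~ Something": the second step stops at the first space, so the '~' behind it is never reached.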

Does C# or .NET provide an efficient way to do this (in terms of both performance and source code simplicity)? Alternatively, is there a class or library that defines a constant character (or string) array of all possible whitespace characters that the Trim() function would remove from a string, so I can add it to the list of my special prefix characters to be removed?

If this specific question is already answered, please post a link to the question and answer(s).

Thanks.


Solution

  • Looks like there's no easy way, because string.Trim() and its overloads internally use two distinct helper methods.

    TrimWhiteSpaceHelper for whitespace:

        private string TrimWhiteSpaceHelper(TrimType trimType)
        {
            // end will point to the first non-trimmed character on the right.
            // start will point to the first non-trimmed character on the left.
            int end = Length - 1;
            int start = 0;
    
            // Trim specified characters.
            if ((trimType & TrimType.Head) != 0)
            {
                for (start = 0; start < Length; start++)
                {
                    if (!char.IsWhiteSpace(this[start]))
                    {
                        break;
                    }
                }
            }
    
            if ((trimType & TrimType.Tail) != 0)
            {
                for (end = Length - 1; end >= start; end--)
                {
                    if (!char.IsWhiteSpace(this[end]))
                    {
                        break;
                    }
                }
            }
    
            return CreateTrimmedString(start, end);
        }
    

    and TrimHelper for custom chars:

        private unsafe string TrimHelper(char* trimChars, int trimCharsLength, TrimType trimType)
        {
            Debug.Assert(trimChars != null);
            Debug.Assert(trimCharsLength > 0);
    
            // end will point to the first non-trimmed character on the right.
            // start will point to the first non-trimmed character on the left.
            int end = Length - 1;
            int start = 0;
    
            // Trim specified characters.
            if ((trimType & TrimType.Head) != 0)
            {
                for (start = 0; start < Length; start++)
                {
                    int i = 0;
                    char ch = this[start];
                    for (i = 0; i < trimCharsLength; i++)
                    {
                        if (trimChars[i] == ch)
                        {
                            break;
                        }
                    }
                    if (i == trimCharsLength)
                    {
                        // The character is not in trimChars, so stop trimming.
                        break;
                    }
                }
            }
    
            if ((trimType & TrimType.Tail) != 0)
            {
                for (end = Length - 1; end >= start; end--)
                {
                    int i = 0;
                    char ch = this[end];
                    for (i = 0; i < trimCharsLength; i++)
                    {
                        if (trimChars[i] == ch)
                        {
                            break;
                        }
                    }
                    if (i == trimCharsLength)
                    {
                        // The character is not in trimChars, so stop trimming.
                        break;
                    }
                }
            }
    
            return CreateTrimmedString(start, end);
        }
    

    Combining them could look like this:

    public static class TrimmerExtension
    {
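        // Trims whitespace and any of trimChars from the START of the string in a single pass.
        // Note: the end of the string is left untouched.
        public static string TrimWhitespaceAndPrefixes(this string str, params char[] trimChars)
        {
            var trimCharsLength = trimChars.Length;
            int start;
            for (start = 0; start < str.Length; start++)
            {
                var ch = str[start];
                // Whitespace is always skipped; anything else must be in trimChars.
                if (!char.IsWhiteSpace(ch))
                {
                    int i;
                    for (i = 0; i < trimCharsLength; i++)
                    {
                        if (trimChars[i] == ch)
                        {
                            break;
                        }
                    }
    
                    if (i == trimCharsLength)
                    {
                        // The character is neither whitespace nor in trimChars, so stop trimming.
                        break;
                    }
                }
            }
    
            // Range syntax requires C# 8 or later.
            return str[start..];
        }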
    }
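
    The method above only trims the start. Since the question also asks for trailing whitespace to be removed, here is a sketch of a variant that handles both ends while still looking at each trimmed character only once. The names `TrimBothEndsExtension` and `TrimPrefixesAndWhitespace` are hypothetical, and `Array.IndexOf` is used instead of the hand-written inner loop purely for brevity.

    ```csharp
    using System;

    public static class TrimBothEndsExtension
    {
        // Hypothetical variant: trims whitespace and trimChars from the start,
        // and whitespace only from the end.
        public static string TrimPrefixesAndWhitespace(this string str, params char[] trimChars)
        {
            // Advance past leading whitespace and custom prefix characters.
            int start = 0;
            while (start < str.Length &&
                   (char.IsWhiteSpace(str[start]) || Array.IndexOf(trimChars, str[start]) >= 0))
            {
                start++;
            }

            // Walk back over trailing whitespace, never crossing 'start'.
            int end = str.Length - 1;
            while (end >= start && char.IsWhiteSpace(str[end]))
            {
                end--;
            }

            // If everything was trimmed, end == start - 1 and the length is 0.
            return str.Substring(start, end - start + 1);
        }
    }
    ```

    With the question's prefix list, `" $ ~ Something ".TrimPrefixesAndWhitespace('~', '`', '$', '|', '-', '_')` returns "Something", and only one new string is allocated.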
    

    If performance is not critical, I suggest using regular expressions or the shorthand version from https://stackoverflow.com/a/76133168/2770274
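
    For the regular-expression route, a rough sketch could look like the following. The class name, method name and pattern are assumptions written for this illustration (they are not taken from the linked answer), and the prefix characters from the question are hard-coded into the character class. The regex `\s` class is assumed here to be close enough to char.IsWhiteSpace for this purpose.

    ```csharp
    using System;
    using System.Text.RegularExpressions;

    public static class RegexTrimmer
    {
        // First alternative: a leading run of whitespace or prefix characters.
        // Second alternative: a trailing run of whitespace.
        // Inside the character class, '$' and '|' are literals and '-' is escaped.
        private static readonly Regex Pattern =
            new Regex(@"^[\s~`$|\-_]+|\s+$", RegexOptions.Compiled);

        // Hypothetical helper: removes both matched runs.
        public static string TrimWithRegex(this string s)
        {
            return Pattern.Replace(s, string.Empty);
        }
    }
    ```

    This is shorter to read but will usually be slower than the hand-written loop, so it is mainly worth considering when the list of strings is small.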