Tags: c#, string, data-manipulation

What's the fastest way to remove characters from an alpha-numeric string?


Say we have the following strings that we pass as parameters to the function below:

string sString = "S104";
string sString2 = "AS105";
string sString3 = "ASRVT106";

I want to extract the numbers from each string and place them in an int variable. Is there a quicker and/or more efficient way of removing the letters from the strings than the following code? (These strings are populated dynamically at runtime; they are not assigned values at construction.)

Code:

public class GetID
{
    private string m_sCustomTag;
    private int m_lID;

    public GetID(string sCustomTag = null)
    {
        m_sCustomTag = sCustomTag;
        try
        {
            // Fast path: the tag is already purely numeric.
            m_lID = Convert.ToInt32(m_sCustomTag);
        }
        catch
        {
            try
            {
                int iSubIndex = 0;
                char[] subString = sCustomTag.ToCharArray();

                // Iterate through the char array to find the first digit.
                for (int i = 0; i < subString.Length; i++)
                {
                    for (int j = 0; j < 10; j++)
                    {
                        // Compare against the characters '0'..'9', not the ints 0..9.
                        if (subString[i] == (char)('0' + j))
                        {
                            iSubIndex = i;
                            goto createID;
                        }
                    }
                }

            createID: m_lID = Convert.ToInt32(m_sCustomTag.Substring(iSubIndex));
            }
            // If none of that works...
            catch
            {
                m_lID = 0;
                throw; // rethrow without resetting the stack trace
            }
        }
    }
}

I've done things like this before, but I'm not sure whether there's a more efficient way to do it. If it were always a single letter at the beginning, I could just set iSubIndex to 1 every time, but users can essentially enter whatever they want. Generally the tags will follow a LETTER-then-NUMBER format, but if they don't, or if users enter multiple letters as in sString2 or sString3, I need to be able to compensate for that. Furthermore, if a user enters some whacked-out, non-traditional format like string sString4 = "S51A24";, is there a way to just remove any and all letters from the string?

I've looked about, and can't find anything on MSDN or Google. Any help or links to it are greatly appreciated!


Solution

  • You can use a Regex, but it's probably faster to just do:

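    using System;
    using System.Text;

    public int ExtractInteger(string str)
    {
        // Copy every digit character into a buffer, skipping everything else.
        var sb = new StringBuilder();
        for (int i = 0; i < str.Length; i++)
            if (Char.IsDigit(str[i])) sb.Append(str[i]);

        // Throws FormatException if the string contained no digits.
        return int.Parse(sb.ToString());
    }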
    

    You can simplify further with some LINQ at the expense of a small performance penalty:

    using System;
    using System.Linq;

    public int ExtractInteger(string str)
    {
        // Keep only the digit characters, then parse them as one number.
        return int.Parse(new String(str.Where(Char.IsDigit).ToArray()));
    }
    

    Now, if you only want to parse the first sequence of consecutive digits, do this instead:

    public int ExtractInteger(string str)
    {
        // Skip leading non-digits, then take the first digit run ("S51A24" -> 51).
        return int.Parse(new String(str.SkipWhile(c => !Char.IsDigit(c))
                                       .TakeWhile(Char.IsDigit).ToArray()));
    }
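The answer mentions a Regex without showing one. Here is a minimal sketch of that route. The class and method names (TagParser, TryExtractAllDigits, TryExtractFirstNumber) are hypothetical, and unlike the snippets above it returns false instead of throwing when there are no digits or the value does not fit in an int. It matches [0-9] rather than \d or Char.IsDigit, because in .NET both of those also accept non-ASCII Unicode digits that int.Parse would reject. It has not been benchmarked against the loop.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class TagParser
{
    // Keeps every ASCII digit in the string, e.g. "S51A24" -> 5124.
    // Returns false (value = 0) when there are no digits or on int overflow.
    public static bool TryExtractAllDigits(string str, out int value)
    {
        value = 0;
        if (string.IsNullOrEmpty(str)) return false;
        string digits = Regex.Replace(str, "[^0-9]", "");
        // NumberStyles.None: accept digits only (no sign, no whitespace).
        return int.TryParse(digits, NumberStyles.None, CultureInfo.InvariantCulture, out value);
    }

    // Keeps only the first run of consecutive ASCII digits, e.g. "S51A24" -> 51.
    public static bool TryExtractFirstNumber(string str, out int value)
    {
        value = 0;
        if (string.IsNullOrEmpty(str)) return false;
        Match m = Regex.Match(str, "[0-9]+");
        return m.Success
            && int.TryParse(m.Value, NumberStyles.None, CultureInfo.InvariantCulture, out value);
    }
}
```

With the question's inputs, "S104", "AS105", and "ASRVT106" give 104, 105, and 106 from either method; "S51A24" gives 5124 from TryExtractAllDigits and 51 from TryExtractFirstNumber.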