vba excel excel-formula excel-2010 excel-udf

Extract 5-digit number from one column to another


I need help with extracting 5-digit numbers only from one column to another in Excel 2010. These numbers can be in any position of the string (beginning of the string, anywhere in the middle, or at the end). They can be within brackets or quotes like:

(15478) or "15478" or '15478' or [15478]

I need to ignore any numbers that have fewer than 5 digits, include numbers that start with one or more leading zeros (like 00052, 00278, etc.), and ensure that the leading zeros are copied over to the next column. Could someone help me with creating either a formula or a UDF?


Solution

  • Here is a formula-based alternative that extracts the first 5-digit number found in cell A1. I tend to prefer reasonably simple formula solutions over VBA in most situations because formulas are more portable. This is an array formula, so it must be entered with Ctrl+Shift+Enter. The idea is to split the string into every possible 5-character chunk, test each one, and return the first match.

    =MID(A1,MIN(IF(NOT(ISERROR(("1"&MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE)),5)&".1")*1))*ISERROR(MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))+5,1)*1)*ISERROR(MID(A1,ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))-1,1)*1),ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE)),9999999999)),5)

    Let's break this down. First, there is an expression that appears four times in the formula and returns an array of the numbers from 1 up to 4 less than the length of your initial text, i.e. every position at which a 5-character chunk can start. So if you have a string of length 10, it returns {1,2,3,4,5,6}. From here on, the expression below is referred to as rowlist. I used R1C1 notation to avoid potential circular references.

    ROW(INDIRECT("R1C[1]:R"&(LEN(A1)-4)&"C[1]",FALSE))
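
    As a rough cross-check outside Excel, the same list of start positions can be sketched in Python. The name start_positions is hypothetical and only mirrors what rowlist is described as returning; it is not part of the formula.

```python
def start_positions(text: str) -> list[int]:
    """Return the 1-based start position of every 5-character window in text.

    Hypothetical helper that mirrors the described rowlist result:
    the numbers 1 .. len(text) - 4.
    """
    return list(range(1, len(text) - 4 + 1))


# A string of length 10 has six possible 5-character windows.
print(start_positions("abcdefghij"))  # [1, 2, 3, 4, 5, 6]
```

    This sketch returns an empty list for text shorter than 5 characters. The Excel expression presumably has no valid range to build in that case, so very short inputs may need separate handling.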
    

    Next we use that array to split the text into an array of 5-character chunks and test each chunk. The test prepends a "1" and appends ".1" to the chunk, then verifies that the result is numeric. The prepended and appended characters mean that chunks containing white space or a decimal point are rejected. We then check the character before and the character after the chunk to make sure neither is a digit, which rules out 5-digit pieces of longer numbers. From here on, the expression below is referred to as isnumarray.

    NOT(ISERROR(("1"&MID(A1,rowlist,5)&".1")*1))
    *ISERROR(MID(A1,rowlist+5,1)*1)
    *ISERROR(MID(A1,rowlist-1,1)*1)
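
    The three conditions can be approximated in Python as follows. This is only a sketch: it assumes that "numeric" means "consists of the ASCII digits 0-9", which is simpler than Excel's own text-to-number coercion, and the names is_digit and window_flags are hypothetical.

```python
def is_digit(ch: str) -> bool:
    """True only for a single ASCII digit; an empty string counts as a non-digit."""
    return len(ch) == 1 and "0" <= ch <= "9"


def window_flags(text: str) -> list[bool]:
    """One flag per 5-character window, approximating isnumarray.

    A window passes when all 5 characters are digits and the characters
    immediately before and after it (if any) are not digits.
    """
    flags = []
    for start in range(len(text) - 4):  # 0-based window start
        chunk = text[start:start + 5]
        before = text[start - 1] if start > 0 else ""
        after = text[start + 5] if start + 5 < len(text) else ""
        flags.append(
            all(is_digit(c) for c in chunk)
            and not is_digit(before)
            and not is_digit(after)
        )
    return flags


print(window_flags("(15478)"))  # [False, True, False]
print(window_flags("123456"))   # [False, False]: a 6-digit number is rejected
```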
    

    Next we need to find the first valid 5-digit number in the string. The IF returns the start position from rowlist for every chunk that passed the test and a large number for every chunk that did not. MIN then picks the smallest of these values, which is the position of the first match. From here on, the expression below is referred to as minindex.

    MIN(IF(isnumarray,rowlist,9999999999))
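
    The MIN/IF step amounts to "smallest position whose flag is true, otherwise a sentinel". A minimal Python sketch, assuming the flags are already available as a list of booleans (first_match_position and SENTINEL are hypothetical names):

```python
SENTINEL = 9_999_999_999  # same large placeholder value the formula uses


def first_match_position(flags: list[bool]) -> int:
    """Return the 1-based position of the first True flag, or SENTINEL if none."""
    return min(
        (pos if ok else SENTINEL for pos, ok in enumerate(flags, start=1)),
        default=SENTINEL,  # also covers an empty list of flags
    )


print(first_match_position([False, True, False, True]))  # 2
print(first_match_position([False, False]))              # 9999999999
```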
    

    Finally, MID grabs the 5 characters that start at the index returned by the MIN function. Because MID returns text, any leading zeros are preserved.

    MID(A1,minindex,5)
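
    To sanity-check expected results for sample inputs outside Excel, the overall behaviour can be sketched in Python. Note that this uses a regular expression with lookarounds, which is a different technique from the formula's chunk-by-chunk test, and that first_five_digit is a hypothetical name. The result is kept as a string so that leading zeros survive.

```python
import re

# Exactly five ASCII digits, not preceded or followed by another digit.
FIVE_DIGITS = re.compile(r"(?<![0-9])[0-9]{5}(?![0-9])")


def first_five_digit(text: str) -> str:
    """Return the first standalone 5-digit run in text, or "" if there is none."""
    match = FIVE_DIGITS.search(text)
    return match.group(0) if match else ""


samples = ['(15478)', 'ref "00052" x', "id 1234 and 123456", "[00278] then 99999"]
for sample in samples:
    print(repr(sample), "->", repr(first_five_digit(sample)))
# '(15478)' -> '15478'
# 'ref "00052" x' -> '00052'
# 'id 1234 and 123456' -> ''
# '[00278] then 99999' -> '00278'
```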