Search code examples
sqlamazon-athenaalphanumeric

SQL - Extracting first 5 consecutive numbers from alphanumeric string


I am using AWS Athena, so functions are a bit limiting. But essentially I want to extract the first 5 consecutive and sequential numbers from a alphanumeric field.

enter image description here

From the first example, you can see it ignores the first 1 because there aren't 4 trailing numbers. I want to find and extract the first 5 numbers that are given together from this field. The output field is what I am hoping to achieve.


Solution

  • This will find an exact sequence of 5 digits.
    a sequence of less or more than 5 digits will be ignored.

    ^|\D  = Indication for the start of the text OR a non-digit character
    \d{5} = 5 digits
    \D|$  = A non-digit character OR indication for the end of the text
    

    with t (Example) as (values ('Ex/l/10345/Pl'), ('Ex/23453PlWL'), ('ID09456//'))
    select Example, regexp_extract(Example, '(^|\D)(\d{5})(\D|$)', 2) as Output
    from   t
    

    +---------------+--------+
    |    Example    | Output |
    +---------------+--------+
    | Ex/l/10345/Pl |  10345 |
    | Ex/23453PlWL  |  23453 |
    | ID09456//     |  09456 |
    +---------------+--------+