
Extract digits from string with consecutive digit characters


I cannot use regular expressions or libraries :(. I need to extract all numbers from an alphanumeric string. Each consecutive sequence of digits (call it a "temperature") is preceded by a +, -, or * and should be treated as a single number (all are integers, no floats). Other non-digit characters in the string can be ignored. I need to extract each "temperature" into a data structure.

Example: the string "BARN21+77-48CDAIRY87+56-12" yields [21, 77, 48, 87, 56, 12]

The real data string can be many orders of magnitude larger.

All the solutions I can find assume either that there is only one sequence of digits (temperature) in the string or that the temperatures are separated by a space or other delimiter. I was able to get it working by iterating through the string, adding a space before and after each digit sequence, and then using split, but that feels like cheating. Do you professionals reshape the data like this to get a convenient solution?

The incoming data is "BARN21+77-48CDAIRY87+56-12"; temp is what I change it to:

temp = "BARN* 21 + 77 - 48 DAIRY* 87 + 56 - 12"
# Keep only the whitespace-separated tokens that consist entirely of digits
result = [int(i) for i in temp.split() if i.isdigit()]
print("The result", result)

The result [21, 77, 48, 87, 56, 12]


Solution

  • Here is a version which does not use regular expressions:

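    inp = "BARN21+77-48CDAIRY87+56-12"
    # Replace every non-digit character with a space so only digit runs remain
    spaced = ''.join(ch if ch.isdigit() else ' ' for ch in inp)
    # split() with no arguments collapses runs of whitespace
    nums = [int(tok) for tok in spaced.split()]
    print(nums)  # [21, 77, 48, 87, 56, 12]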
    

    If regular expressions are available to you, you can use re.findall with the pattern \d+:

    import re

    inp = "BARN21+77-48CDAIRY87+56-12"
    nums = re.findall(r'\d+', inp)  # each match is returned as a string
    print(nums)  # ['21', '77', '48', '87', '56', '12']
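
On the question of whether the data has to be reshaped at all: as a minimal sketch of one alternative, a single pass over the string can build each number digit by digit, without inserting spaces, calling split, or importing anything. The function name extract_numbers is an illustrative choice and not part of the original answer. The sketch assumes only the ASCII digits 0-9 count as digits.

```python
def extract_numbers(text):
    """Return each run of consecutive ASCII digits in text as one int."""
    numbers = []
    current = 0          # value of the digit run being read
    in_number = False    # True while we are inside a digit run
    for ch in text:
        if "0" <= ch <= "9":
            # Shift the accumulated value one decimal place and add the digit
            current = current * 10 + (ord(ch) - ord("0"))
            in_number = True
        elif in_number:
            # A non-digit ends the current run
            numbers.append(current)
            current = 0
            in_number = False
    if in_number:
        # The string ended while still inside a digit run
        numbers.append(current)
    return numbers


print(extract_numbers("BARN21+77-48CDAIRY87+56-12"))  # [21, 77, 48, 87, 56, 12]
```

Compared with the join-and-split version, this avoids building a second copy of the input, which may matter when the string is very large. The explicit "0" <= ch <= "9" test is a deliberate choice: str.isdigit() also accepts characters such as superscript digits, which int() cannot convert.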