python, text, python-re

Extracting specific string format of digits


Suppose we have text like this:

text = "new notebook was sold 8 times before 13:30 in given shop"

There are three numbers here: the single digit 8 and the two-digit numbers 13 and 30. The key point is that 13:30 expresses a time, so 13 and 30 are not independent numbers but the hour and the minute. To distinguish 8 from 13:30, I want to return them exactly as they appear in the string. To clarify the problem: if we just return 8, 13, 30, it is not clear which one is the hour, which one is the minute, and which one is just a number.

I have looked into regular expressions, and we can use, for instance, the following lines of code:

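import re

text = "new notebook was sold  8 times before  13:30 in given shop"
# Two-digit numbers whose first digit is 0-5
x = re.findall(r"[0-5][0-9]", text)
# Every run of digits (raw string so \d is not read as a string escape)
y = re.findall(r"\d+", text)
print(x, y)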

The first call returns the two-digit numbers (in this case 13 and 30) and the second returns all the numbers (8, 13, 30). But how can I return them as they appear in the text, so that the answer is ['8', '13:30']?

Here is a link that covers the regular expression options: regular expression

Let us take one of the answers:

x = re.findall(r'\d+(?::\d+)?', text)

Here \d+ means match one or more digits. Then comes

(?::\d+)?

? means zero or one occurrences, and () is the grouping option. For instance, the following syntax means

x = re.findall("falls|stays", txt)
#Check if the string contains either "falls" or "stays":
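To keep the two ideas apart, here is a minimal sketch: `|` is alternation (either one pattern or the other), while `?` placed after a group makes that group optional. The sample strings are made up for illustration and are not from the post.

```python
import re

# Hypothetical sample string, chosen only for illustration.
txt = "The rain in Spain falls mainly in the plain, or stays away"

# '|' is alternation: match either "falls" or "stays".
either = re.findall(r"falls|stays", txt)

# '?' after a group means zero or one occurrence of that group,
# so both "stay" and "stays" match.
optional = re.findall(r"stay(?:s)?", "stay stays")

print(either)    # ['falls', 'stays']
print(optional)  # ['stay', 'stays']
```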

So this statement

x = re.findall(r'\d+(?::\d+)?', text)

Does it mean any digits, followed by a : symbol, followed by digits again? What about 8?


Solution

  • x = re.findall(r'\d+(?::\d+)?', text)
    
    • \d+ one or more digits
    • (?: non-capturing group
    • ? optional

    Meaning: digits, optionally followed by a colon and more digits. Because the group is optional, 8 matches on its own and 13:30 matches as a whole.
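As a quick check of the explanation above, the sketch below runs the pattern on the text from the question. It also shows why the group is written as non-capturing: with a capturing group, `re.findall` returns the group's content instead of the whole match. The final loop and the `parsed` list are an illustrative assumption about how the results might be used, not part of the original answer.

```python
import re

text = "new notebook was sold  8 times before  13:30 in given shop"

# Non-capturing group (?:...): findall returns each whole match.
tokens = re.findall(r"\d+(?::\d+)?", text)
print(tokens)  # ['8', '13:30']

# Capturing group (...): findall returns only what the group matched,
# and an empty string where the optional group did not take part.
captured = re.findall(r"\d+(:\d+)?", text)
print(captured)  # ['', ':30']

# Hypothetical follow-up: tell times apart from plain numbers.
parsed = []
for tok in tokens:
    if ":" in tok:
        hour, minute = tok.split(":")
        parsed.append(("time", (int(hour), int(minute))))
    else:
        parsed.append(("number", int(tok)))
print(parsed)  # [('number', 8), ('time', (13, 30))]
```

Note that the pattern accepts any digits around the colon, so it does not check that the hour and minute are in a valid range.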