Search code examples
regexcomputation-theory

Regular Expression for numerical constant


In Sipser's book of Theory of computation, following is given :

a numerical constant that may include a fractional part and/or a sign may be described as a member of the language

( + U - U e) (D+ U D+ . D* U D* . D+)

where D = {0,1,2,3,4,5,6,7,8, 9} is the alphabet of decimal digits. Examples of generated strings are: 72, 3.14159, +7 ., and -.01 .

Here I can't understand what is the purpose of taking union of D+ or D* ? Moreover why 3rd dot is added?

Please someone clear my doubt.


Solution

  • It's trying to cover the following cases:

    5    #matched by D+
    .5   #matched by D*.D+
    5.   #matched by D+.D*
    5.5  #matched by both D*.D+ and D+.D*
    .    #not matched
    

    The . character in the expressions is the decimal separator. You can read the expression this way:

    ( + U - U e) ( (D+) U (D+ . D*) U (D* . D+) )