So I have the following strings (each string is a line of a .txt file), and I have built a parser to parse the first line as follows:
line1: " N1 0.00000000 0.00000000 0.00000000 Type N Rank 4"
parser1 = Word(alphas + nums) + Word(printables + '.' + printables) + Word(printables + '.' + printables) \
+ Word(printables + '.' + printables) + Word(alphas) + Word(alphas) + Word(alphas) + Word(nums)
result = (['N1', '0.00000000', '0.00000000', '0.00000000', 'Type', 'N', 'Rank', '4'], {})
Which is great. However, the lines after this one contain only floats, which may or may not have minus signs, for example:
line2 = " -1.064533
-0.000007 -0.130782 0.044770
0.335373 -0.000007 -0.000006 -0.451296 0.378061
-0.000034 -0.990753 -1.404081 -0.000067 -0.000150
-0.096208 -0.714299
-0.017676 0.000019 0.000034 0.804011 0.911492
0.000019 0.000027 0.441683 0.107567"
I've tried using the following parser to grab these numbers, but unfortunately it also grabs line1:
parser2 = Word(printables + '.' + printables)
Is there a better way to parse floats that may include a leading minus sign?
Thanks a lot (I'm new to pyparsing so be as harsh as you like)
Word(printables + '.' + printables) does not do what you are thinking. printables is a string, so printables + '.' + printables evaluates to a really long string containing all the printable characters, followed by a period, followed by all the printable characters again. This string is then used to construct a Word object, which will match a space-delimited group of characters in the set of, well, all the printable characters (and since '.' is printable, it is in that set already).
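To make that concrete, here is a small sketch (the variable names are mine, and it uses the older camelCase method names on the assumption that they are available in your pyparsing version). It checks that the concatenation adds no new characters to the set, and shows the resulting Word accepting every token of line1:

```python
from pyparsing import OneOrMore, Word, printables

# Plain string concatenation: the character set is unchanged.
charset = printables + '.' + printables
adds_nothing = set(charset) == set(printables)

# A Word built from that set matches any run of non-whitespace characters.
any_token = Word(charset)

line1 = " N1 0.00000000 0.00000000 0.00000000 Type N Rank 4"
tokens = OneOrMore(any_token).parseString(line1).asList()

print(adds_nothing)
print(tokens)
```

So the expression is effectively "any non-blank token", which is why it cannot tell the header line apart from the lines of floats.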
I suspect that what you really want to use to parse a real number with optional leading minus sign is something like
Optional('-') + Word(nums) + '.' + Word(nums)
Note that the addition is done with the parse expressions, not the strings passed to Word. This will parse '-1.23' as ['-', '1', '.', '23']. To get all that as a single string, wrap it in a Combine:
Combine(Optional('-') + Word(nums) + '.' + Word(nums))
Then you'll get '-1.23' from using that expression. It is still left to you afterward to convert that to a Python float using the float() builtin.
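As a rough end-to-end sketch of the Combine approach (the sample text is a shortened piece of your line2, and names like real and header_matched are just for illustration), something along these lines should collect the numbers across line breaks and refuse to match the header line:

```python
from pyparsing import Combine, OneOrMore, Optional, ParseException, Word, nums

# Optional minus sign, digits, a period, digits, joined into one token.
real = Combine(Optional('-') + Word(nums) + '.' + Word(nums))

# Newlines count as whitespace by default, so values may span several lines.
sample = " -1.064533\n-0.000007 -0.130782 0.044770"
strings = OneOrMore(real).parseString(sample).asList()

# Conversion to float is a separate step with this expression.
values = [float(s) for s in strings]

# The header line starts with 'N1', which is not a real number.
try:
    real.parseString(" N1 0.00000000 0.00000000 0.00000000 Type N Rank 4")
    header_matched = True
except ParseException:
    header_matched = False

print(strings)
print(values)
print(header_matched)
```

Note that this expression requires digits on both sides of the period, so inputs such as '5' or '.5' would not match; that seems fine for the data you posted.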
pyparsing_common.real is a predefined floating-point parse expression that handles leading signs and converts from string to float at parse time, so the value you get from the parser is already a float.
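A minimal sketch of using it, assuming your installed pyparsing is recent enough to provide pyparsing_common (the sample text is again a shortened piece of line2):

```python
from pyparsing import OneOrMore, pyparsing_common

sample = " -1.064533\n-0.000007 -0.130782 0.044770"

# Each matched token is converted to a Python float by the expression itself.
values = OneOrMore(pyparsing_common.real).parseString(sample).asList()

print(values)
print(all(isinstance(v, float) for v in values))
```

No Combine or explicit float() call is needed here, since the conversion happens as part of parsing.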