python regex scientific-notation

regex to interpret awkward scientific notation


Ok, so I'm working with this ENDF data, see here. Sometimes the files use what is quite possibly the most annoying encoding of scientific-notation floating point numbers I have ever seen [1]. Instead of 1.234e-3, a value is often written as something like 1.234-3 (omitting the "e").

Now I've seen a library that simply changes - into e- or + into e+ by a simple substitution. But that doesn't work when some of the numbers can be negative. You end up getting some nonsense like e-5.122e-5 when the input was -5.122-5.

So, I guess I need to move on to regex? I'm open to another solution that's simpler, but it's the best I can think of right now. I am using the re Python library. I can do a simple substitution where I look for [0-9]-[0-9] and replace that like this:

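import re

str1 = '-5.634-5'
# Naive attempt: the replacement digits are hard-coded, so this only works for this one input.
x = re.sub(r'[0-9]-[0-9]', '4e-5', str1)
print(x)  # -5.634e-5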

But obviously this won't work generally, because the numerals before and after the - need to stay what they were, not become something I made up... I've used capturing groups before, but what would be the fastest way in this context to capture the digits before and after the - and feed them back into the substitution with Python's re module?

[1] Yes, I know, Fortran...80 characters...save space...punch cards...nobody cares anymore.


Solution

  • Probably wouldn't reach for regex for this, when some simple string ops should work:

    s.replace("-", "e-").replace("+", "e+").lstrip("e")
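For comparison, here is a rough sketch of both routes side by side: the string-ops one-liner above wrapped in a function, and a regex version that answers the capture-group part of the question. The helper names (fix_with_replace, fix_with_regex) are made up for illustration and do not come from any ENDF library. The regex assumes the mantissa ends in a digit or a decimal point and that the exponent sign is immediately followed by a digit; it has not been checked against every layout the ENDF format allows.

```python
import re

# Hypothetical helpers for illustration only.
# A sign counts as a "missing e" exponent sign when it directly follows a digit
# or decimal point and is directly followed by a digit.
_MISSING_E = re.compile(r"(?<=[0-9.])([+-])(?=[0-9])")


def fix_with_replace(s):
    """String-ops approach; strips surrounding whitespace first so lstrip("e") can work."""
    return s.strip().replace("-", "e-").replace("+", "e+").lstrip("e")


def fix_with_regex(s):
    """Put an 'e' in front of the captured exponent sign; a leading sign is left alone."""
    return _MISSING_E.sub(r"e\1", s)


for raw in ("-5.122-5", "1.234-3", "+1.0+3", "1.234e-3"):
    fixed = fix_with_regex(raw)
    print(raw, "->", fixed, "->", float(fixed))
```

One difference worth knowing: the regex version leaves a value that already has an "e" (such as 1.234e-3) untouched, because the sign there follows "e" rather than a digit. The replace chain would turn the same input into 1.234ee-3, so it is only safe when none of the values already carry an "e".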