I am have a csv file that has comma separated line something like this:
"22.0,asc1.0$*xyz,12.00,12.00023000,,1.0asc1.0,,-12.4442,0.00000,5.2E-5\n"
I want to find all floating point numbers only.
Challenges:
This expression correctly captures the float part but doesn't work with above scenarios correctly:
[-+]?[0-9]*\.?[0-9]+([eE][-+]?[0-9]+)?
Thanks in advance!
Complete Code in python (based on the feedback below):
import pandas as pd
from io import StringIO
data = {"a": 1.0, "b": "xyz.2.0", "c": "lol"}
buffer = StringIO()
df = pd.DataFrame([data])
df.to_csv(buffer, index=False)
input_ = buffer.getvalue()
print(input_)
import re
expression = "(?<=,|^)[+-]?\d+\.?\d*(?:E[+-]?\d+)?(?=,|$)"
def transform_float(m):
value = m.group()
print(value)
return value
result = re.sub(expression, transform_float, input_)
print(result)
You may use this regex with look arounds:
(?:(?<=,)|(?<=^))[+-]?\d+\.?\d*(?:E[+-]?\d+)?(?=,|$)
RegEx Description:
(?:(?<=,)|(?<=^))
: Lookbehind to assert that we have a comma or line start at previous position[+-]?\d+\.?\d*(?:E[+-]?\d+)?
: Match an integer or floating point number with optional E
part(?=,|$)
: Lookahead to assert that we have a comma or line end ahead