#include <iostream>
#include <fstream>
#include <string>
#include <algorithm>
#include <vector>
#include <iomanip>
#include <cmath>
#include <cstdlib>      // std::atof
#include <type_traits>  // std::is_floating_point_v
using namespace std;
#define LIKELY(expr) (__builtin_expect(!!(expr), 1))
#define UNLIKELY(expr) (__builtin_expect(!!(expr), 0))
bool abscmp(double a, double b)
{
    if (isnan(a) && isnan(b)) return true;
    if (isnan(a) ^ isnan(b)) return false;
    return a == b;
}
template <typename T>
inline __attribute__((always_inline)) T ParseFloat(const char *a) {
    static constexpr T multers[] = {
        0.1, 0.01, 0.001, 0.0001, 0.00001, 0.000001, 0.0000001, 0.00000001, 0.000000001, 0.0000000001, 0.00000000001,
        0.000000000001, 0.0000000000001, 0.00000000000001, 0.000000000000001, 0.0000000000000001, 0.00000000000000001
    };
    static_assert(std::is_floating_point_v<T>);
    int i = (a[0] == '-') | (a[0] == '+');
    T res = 0.0;
    int sign = 1 - 2 * (a[0] == '-');
    if (UNLIKELY(!a[0])) return NAN;
    while (a[i] && a[i] != '.') {
        if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
            return NAN;
        }
        res = res * static_cast<T>(10.0) + a[i] - '0';
        i++;
    }
    if (LIKELY(a[i] != '\0')) {
        i++;
        int j = i;
        //T mult = 0.1;
        while (a[i]) {
            if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
                return NAN;
            }
            res = res + (a[i] - '0') * multers[i - j];
            // res = res + (a[i] - '0') * mult;
            // mult *= 0.1;
            i++;
        }
    }
    return res * sign;
}
int main()
{
    string inputs[] = {
        "31.0911863667",
        "30.9500",
        "225.1293333333",
        "16.4850",
        "29.0507297346",
        "147.9440517474",
        "28.8500",
        "213.4600",
        "212.9105553333",
        "199.1553333333",
        "19.5884123000",
        "3092458.37500000000"
    };
    int n = sizeof(inputs) / sizeof(inputs[0]);
    for (int i = 0; i < n; i++) {
        float res1 = std::atof(inputs[i].c_str());
        float res2 = ParseFloat<double>(inputs[i].c_str());
        if (!abscmp(res1, res2)) {
            cout << std::fixed << std::setprecision(20) << "CompareConvert " << res1 << " " << res2 << " " << std::string(inputs[i]) << std::endl;
        } else {
            cout << std::fixed << std::setprecision(20) << "Correct " << res1 << std::endl;
        }
    }
}
I'm writing a simple parser (with a full validity check) because std::atof is too slow (ParseFloat is 3.2x faster on average on my test inputs, which means parsing GBs of CSV files). The formula is very simple: res = res * 10 + (a[i] - '0'); but it gives slightly different results.
I'm aware that this is because of the limitations of IEEE-754 floating point. But is there any cheap way to make ParseFloat give exactly the same result as std::atof? I need them to be exactly the same because the output feeds a legacy module that uses sha256sum to check equality instead of fabs(a - b) < epsilon.
Command to run: g++ -o main main.cpp -O3 -std=c++17, gcc 10.2.0
Edit: Explanation of why the last input is wrong: at this magnitude a float can only step in increments of 0.25, so the decimal part 0.375 is exactly halfway between the 2 possible values 0.25 and 0.5. But with this parsing method, at the digit .3, the intermediate result will be 3092458.0 + 0.3 == 3092458.25 because of rounding. Adding the remaining 0.075 to that will still give 3092458.25.
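The rounding described in this edit can be reproduced in isolation. The following is only a minimal sketch, assuming IEEE-754 binary32 float arithmetic with round-to-nearest; AccumulateDigitByDigit and RoundOnce are made-up names for illustration, not part of the program above.

```cpp
#include <cassert>
#include <limits>

// Hypothetical demo: adds the fractional digits of "3092458.375" one at a
// time, the way the fraction loop above would with T = float.
float AccumulateDigitByDigit() {
    // Assumption: float has a 24-bit significand (IEEE-754 binary32).
    assert(std::numeric_limits<float>::digits == 24);
    const float multers[] = {0.1f, 0.01f, 0.001f};
    const char* fraction = "375";
    float res = 3092458.0f;  // neighbouring floats here are 0.25 apart
    for (int k = 0; fraction[k]; ++k) {
        // Every partial sum is rounded back to a multiple of 0.25.
        res = res + (fraction[k] - '0') * multers[k];
    }
    return res;
}

// Hypothetical demo: rounds the exact decimal value to float a single time.
float RoundOnce() {
    // 3092458.375 is exact as a double and is a tie between .25 and .5.
    return static_cast<float>(3092458.375);
}
```

Under those assumptions the first function returns 3092458.25 and the second returns 3092458.5: each partial sum stays below the halfway point, so the tie that a single rounding would break towards .5 is never seen.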
As you've correctly noticed, the problem is that intermediate results are already imprecise, and this adds up. For example, storing .2 intermediately is already imprecise, even though it may be followed by 5, which would make it .25, and this would be exactly representable.
You need to accumulate the fractional part of the floating point number as an integer (still a float, but with no fractional part), and then divide once at the end to adjust the exponent:
// ...
    if (LIKELY(a[i] != '\0')) {
        T fraction = 0; // fractional part as an integer
        T power = 1;    // turns to 1, 10, 100, 1000, ... each loop iteration
        i++;
        while (a[i]) {
            if (UNLIKELY(a[i] < '0' || a[i] > '9')) {
                return NAN;
            }
            // note: additional logic is required to make sure that trailing
            //       zeros in the fraction cannot decrease the precision
            power *= T{10};
            fraction = fraction * T{10} + (a[i] - '0');
            i++;
        }
        // note: 'power' can also be obtained from a look-up table like in
        //       your original code.
        //       Benchmark to make sure that it's actually faster to use a table.
        res += fraction / power; // perform one division in the end, e.g. 375 / 100
    }
    return std::copysign(res, sign); // note: prefer copysign over multiplication
}
See live example on Compiler Explorer
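The snippet above still rounds twice (once in the division, once in the addition to res) and leaves the trailing-zero handling open. One way to go further, sketched here under stated assumptions rather than as a definitive implementation, is to fold all digits into a single 64-bit integer and divide once. As long as that integer is at most 2^53 and the divisor is at most 10^22, both operands are exact doubles, so the division rounds only once. The name ParseDecimalExact, the stricter validation (at least one digit, at most one decimal point), and the std::strtod fallback for longer inputs are my own choices. The sketch also assumes round-to-nearest, the "C" locale, and a standard library whose strtod is correctly rounded.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>
#include <limits>

// Hypothetical helper, not from the original post.
// Accepts [+-]digits[.digits] with at least one digit; returns NaN otherwise.
double ParseDecimalExact(const char* s) {
    assert(s != nullptr);
    // Powers of ten up to 1e22 are exactly representable as doubles.
    static constexpr double kPow10[] = {
        1e0,  1e1,  1e2,  1e3,  1e4,  1e5,  1e6,  1e7,  1e8,  1e9,  1e10, 1e11,
        1e12, 1e13, 1e14, 1e15, 1e16, 1e17, 1e18, 1e19, 1e20, 1e21, 1e22};
    constexpr std::uint64_t kMaxExact = std::uint64_t{1} << 53;
    const double nan = std::numeric_limits<double>::quiet_NaN();

    const char* p = s;
    const bool negative = (*p == '-');
    if (*p == '-' || *p == '+') ++p;

    // Pass 1: validate the whole string and locate the decimal point.
    const char* dot = nullptr;
    const char* end = p;
    bool seen_digit = false;
    for (; *end; ++end) {
        if (*end == '.') {
            if (dot) return nan;  // second decimal point
            dot = end;
        } else if (*end >= '0' && *end <= '9') {
            seen_digit = true;
        } else {
            return nan;           // any other character is invalid
        }
    }
    if (!seen_digit) return nan;

    // Trailing zeros after the point do not change the value; dropping them
    // keeps inputs like "3092458.37500000000" inside the exact range.
    if (dot) {
        while (end > dot + 1 && end[-1] == '0') --end;
    }

    // Pass 2: fold every remaining digit into a single integer.
    std::uint64_t mantissa = 0;
    int frac_digits = 0;
    for (const char* q = p; q < end; ++q) {
        if (q == dot) continue;
        if (mantissa > (kMaxExact - 9) / 10) {
            return std::strtod(s, nullptr);  // too many digits: slow path
        }
        mantissa = mantissa * 10 + static_cast<std::uint64_t>(*q - '0');
        if (dot && q > dot) ++frac_digits;
    }
    if (frac_digits > 22) return std::strtod(s, nullptr);

    // Both operands are exact, so this single division rounds only once.
    const double value = static_cast<double>(mantissa) / kPow10[frac_digits];
    return negative ? -value : value;
}
```

To mirror the question's float res1 = std::atof(...), narrow the returned double to float at the call site, as the original main does. The fallback only runs after validation, so std::strtod never sees the exponent, hex, or inf/nan forms it would otherwise accept. Whether the two-pass loop is still faster than std::atof on real CSV data would need to be benchmarked.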
Even with these changes, it is probably better to use third-party libraries, as @cpplearner has recommended. Standard library functions like std::strtof or std::from_chars may not provide the best performance, and the performance will vary from standard library to standard library.
While your solution may be faster on some platforms, it might perform much worse on a platform where floating point multiplication and division are more expensive. Floating point numbers are hard.