I'm maintaining a large open source project, and I've run into an odd edge case on the I/O front.
When my app parses a user parameter file containing a line of text like the following:
CH3 CH2 CH2 CH2 −68.189775 2 180.0 ! TraPPE 1
...at first it looks innocent because it is formatted as desired. But then I see the minus is a Unicode MINUS SIGN (−, U+2212) rather than an ASCII hyphen-minus (-).
I'm just using the STL's >> with the ifstream object.
When it attempts to convert to a negative number and fails on the Unicode character, the STL apparently just sets the stream's failure flag (failbit, not badbit), which was triggering my logic that stops the reading process. This is sort of good, as without that logic I would have had an even harder time tracking it down.
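To illustrate the failure mode, here is a minimal sketch that reads one double from a string and reports the resulting stream state. The helper name `state_after_reading_double` is made up for this illustration, and the sketch assumes the input is UTF-8, so the minus sign arrives as the three bytes E2 88 92.

```cpp
#include <ios>
#include <sstream>
#include <string>

// Hypothetical helper for illustration: tries to read one double from `text`
// with operator>> and returns the stream state flags set afterwards.
inline std::ios_base::iostate state_after_reading_double(const std::string & text)
{
    std::istringstream in(text);
    double value = 0.0;
    in >> value;          // fails at the first byte that cannot start a number
    return in.rdstate();
}
```

Called with the UTF-8 bytes of "−68.189775", the returned state should have failbit set and badbit clear, while an ASCII "-68.189775" followed by a space leaves the stream in the good state.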
But it's definitely not my desired error handling. I want to catch common minus-like characters when reading a double with >>, replace them, and complete the conversion if the string is otherwise a properly formatted negative number.
This appears to be happening to my users relatively frequently as they're copying and pasting from programs (calculator or Excel perhaps in Windows?) to get their file values.
I was somewhat surprised not to find this problem on Stack Overflow, as it seems pretty ubiquitous. I found some reference to this on this question:
c++ error cannot be used as a function, some stray error [closed]
...but that was a slightly different problem, in which the code itself contained a similar but incompatible "minus-like" Unicode character, the EN DASH.
Does anyone have a good solution (preferably compact, portable, and reusable) for catching such bad minuses when reading doubles or signed integers?
Note:
I don't want to use Boost or C++11 because, believe it or not, some of my users on certain supercomputers don't have access to those libraries. I'm trying to keep it as portable as possible.
Maybe a custom std::num_get facet would work for you. Other aspects of character-to-value conversion can be overridden as well.
#include <iostream>
#include <locale>
#include <string>
#include <sstream>
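// Facet that accepts U+2212 MINUS SIGN as a leading minus when parsing a float.
// To handle double or long, override the corresponding do_get overloads the same way.
class num_get : public std::num_get<wchar_t>
{
public:
    iter_type do_get( iter_type begin, iter_type end, std::ios_base & str,
                      std::ios_base::iostate & error, float & value ) const
    {
        bool neg = false;
        // 0x2212 (decimal 8722) is U+2212 MINUS SIGN. Check begin != end
        // first so an exhausted input is never dereferenced.
        if (begin != end && *begin == 0x2212) {
            ++begin;
            neg = true;
        }
        // Let the standard facet parse the rest of the number.
        iter_type i = std::num_get<wchar_t>::do_get(begin, end, str, error, value);
        if (neg && !(error & std::ios_base::failbit))
            value = -value;
        return i;
    }
};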
int main(int argc, char ** argv) {
    // Build a locale that uses the custom facet for numeric input.
    std::locale new_locale(std::cin.getloc(), new num_get);
    // Parsing wchar_t streams makes life easier, but in principle
    // it should work with char (e.g. UTF-8) as well.
    static const std::wstring ws(L"CH3 CH2 CH2 CH2 −68.189775 2 180.0 ! TraPPE 1");
    std::basic_stringstream<wchar_t> wss(ws);
    float f = 0;
    // Imbue this new locale into wss
    wss.imbue(new_locale);
    // Skip (and echo) the four leading text tokens.
    for (int i = 0; i < 4; i++) {
        std::wstring s;
        wss >> s >> std::ws;
        std::wcerr << s << std::endl;
    }
    wss >> f;  // parsed through the custom do_get
    std::wcerr << f << std::endl;
    return 0;
}
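For a narrow char stream such as an ifstream, a different and simpler technique than a facet is to read each value as a token, normalize a leading minus-like character, and then parse the token. The sketch below is not the facet approach above. It assumes the file is UTF-8 encoded and that values are separated by whitespace, as in the sample line. The names `normalize_leading_minus` and `read_double` are made up for illustration, and the list of dash characters is an assumption you may want to extend. It uses only C++03 and no Boost.

```cpp
#include <cstddef>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: if `token` starts with a UTF-8 encoded minus-like
// character, replace those three bytes with an ASCII '-'.
inline void normalize_leading_minus(std::string & token)
{
    // UTF-8 encodings of U+2212 MINUS SIGN, U+2013 EN DASH, U+2014 EM DASH.
    static const char * const minus_like[] = {
        "\xE2\x88\x92", "\xE2\x80\x93", "\xE2\x80\x94"
    };
    const std::size_t count = sizeof(minus_like) / sizeof(minus_like[0]);
    for (std::size_t i = 0; i < count; ++i) {
        if (token.compare(0, 3, minus_like[i]) == 0) {
            token.replace(0, 3, "-");
            return;
        }
    }
}

// Hypothetical replacement for `in >> value`: reads one whitespace-delimited
// token, fixes a bad leading minus, and parses the result as a double.
// On a malformed token it sets failbit on `in` and leaves `value` untouched.
inline std::istream & read_double(std::istream & in, double & value)
{
    std::string token;
    if (!(in >> token))
        return in;  // nothing to read; `in` already reports the failure

    normalize_leading_minus(token);

    std::istringstream parser(token);
    double parsed = 0.0;
    // Require the whole token to be consumed so trailing junk is rejected.
    if ((parser >> parsed) && parser.eof())
        value = parsed;
    else
        in.setstate(std::ios_base::failbit);
    return in;
}
```

Usage would be `read_double(file, x)` wherever the code currently does `file >> x`, with the same stream-state check afterwards. One difference from plain >> is that the whole token must be a number, so input like "1.5,2" would be rejected rather than read as 1.5.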