input validation with std::ifstream

I'm reading a file using something like:

    std::ifstream is( file_name );
    std::string s;

    if( !is.good() ) {
        std::cout << "error opening file: " << file_name << std::endl;
    } else {
        while( !is.eof() ) {
            s.clear();
            is >> s;
            if (s.empty())  continue;
            if (s.size() < 1 || s.size()>0x7FFFFFFF ) {
               std::cout << "implausible data" << std::endl;
               continue;
            }
            char *ss = new char[ s.size() + 1 ]; // COVERITY bails out
            // do something with the data
            delete[]ss;
        }
    }

When I analyse the above code with the static code analysis tool coverity (free version), the line marked with COVERITY bails out throws an error:

 Untrusted value as argument (TAINTED_SCALAR)
   tainted_data: Passing tainted variable > s.size() + 1UL to a tainted sink.

I understand that I must not trust any data read from a file, but I fail to see how to validate the data at this stage. I'm already checking that s.size() is within a plausible (albeit rather large) range in the if-clause above the erroneous line.

So why is coverity throwing a warning at me?

Also, which other strategies for input-validation should I apply?

Solution

In the following section

if (s.empty())
  continue;
if (s.size() < 1 || s.size() > 0x7FFFFFFF)
  {
    std::cout << "implausible data" << std::endl;
    continue;
  }
char * ss = new char[s.size() + 1];

the validation logic relies on the important fact that s.size() will return the same value every time it is called. While in this case, we (humans) know this will be true, a static code analyzer might fail at realizing that.

As a workaround, try introducing a local variable and use that one.

const std::size_t length = s.size();
if (!length)
  continue;
if (length < 1 || length > 0x7FFFFFFF)
  {
    std::cout << "implausible data" << std::endl;
    continue;
  }
char * ss = new char[length + 1];

Here, it is simple for the analyzer to tell that length will not change its value.

Whether or not such coding around the limitations of static analyzer tools is worthwhile is open to debate. The GNU Coding Standards discourage it.

Don’t make the program ugly just to placate static analysis tools such as lint, clang, and GCC with extra warnings options such as -Wconversion and -Wundef. These tools can help find bugs and unclear code, but they can also generate so many false alarms that it hurts readability to silence them with unnecessary casts, wrappers, and other complications. For example, please don’t insert casts to void or calls to do-nothing functions merely to pacify a lint checker.

Personally, I don't feel too bad about it as long as the readability of the code doesn't suffer too much. In corner cases, adding a comment that explains why things are done the way they are done might be a good idea.