I have a class which represents a character sequence and I’d like to implement an operator >> for it. My implementation currently looks like this:
inline std::istream& operator >>(std::istream& in, seq& rhs) {
    std::copy(
        std::istream_iterator<char>(in),
        std::istream_iterator<char>(),
        std::back_inserter(rhs));
    // `copy` doesn't know when to stop reading so it always also sets `fail`
    // along with `eof`, even if reading succeeded. On the other hand, when
    // reading actually failed, `eof` is not going to be set.
    if (in.fail() and in.eof())
        in.clear(std::ios_base::eofbit);
    return in;
}
However, the following predictably fails:
std::istringstream istr("GATTACA FOO");
seq s;
assert((istr >> s) and s == "GATTACA");
In particular, once we reach the space in “GATTACA FOO”, the copying stops (expected) and sets the failbit on the istream (also expected). However, the read operation actually succeeded as far as seq is concerned.
Can I model this at all using std::copy? I also thought of using an istreambuf_iterator instead but this doesn’t actually solve this particular problem.
What’s more, a read operation on the input “GATTACAFOO” should fail since that input doesn’t represent a valid DNA sequence (which is what my class represents). On the other hand, reading an int from the input 42foo actually succeeds in C++ so maybe I should consider every valid prefix as a valid input?
(Incidentally, this would be fairly straightforward with an explicit loop but I’m trying to avoid explicit loops in favour of algorithms.)
You don't want to clear(eofbit) because the failbit should stay set if reading failed due to reaching EOF. Otherwise, if you just leave eofbit set without failbit, then a loop such as while (in >> s) will attempt another read after reaching EOF, and then that read will set failbit again. Except if it was using your operator>> it would clear it, and try to read again. And again. And again. The right behaviour for a stream is to set failbit if reading failed because of EOF, so just leave it set.
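As a rough illustration of that point, the sketch below uses std::string extraction as a stand-in for seq. The helper names count_words and looks_good_after_clear are made up for this example. The first shows that a while (in >> word) loop only terminates because the read after EOF sets failbit; the second shows that replacing the state with eofbit alone makes the stream test true again, which is what would keep such a loop spinning.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <sstream>
#include <string>

// Hypothetical helper: counts how many extractions succeed in the usual
// `while (in >> word)` loop. The loop ends when a read sets failbit.
inline int count_words(std::istream& in)
{
    int n = 0;
    std::string word;
    while (in >> word)
        ++n;
    return n;
}

// Hypothetical helper mimicking the question's workaround: the stream state
// is replaced by eofbit alone, so failbit is dropped and the stream
// converts to true again even though nothing is left to read.
inline bool looks_good_after_clear(std::istream& in)
{
    in.clear(std::ios_base::eofbit);
    return static_cast<bool>(in);
}
```

After count_words returns, both eof() and fail() report true on the stream; calling looks_good_after_clear afterwards makes the stream look usable again although every further read will fail.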
To do this with iterators and an algorithm you'd need something like
copy_while(InputIter, InputIter, OutputIter, Pred);
which would copy the input sequence only while the predicate was true, but that doesn't exist in the standard library. You could certainly write one though.
template<typename InputIter, typename OutputIter, typename Pred>
OutputIter
copy_while(InputIter begin, InputIter end, OutputIter result, Pred pred)
{
    while (begin != end)
    {
        typename std::iterator_traits<InputIter>::value_type value = *begin;
        if (!pred(value))
            break;
        *result = value;
        result++;
        begin++;
    }
    return result;
}
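To see the algorithm on its own, independent of streams, here is a small sketch that applies it to a std::string. The copy_while definition is repeated only so the snippet compiles by itself, and leading_digits is a hypothetical helper invented for this example.

```cpp
#include <cassert>
#include <cctype>
#include <iterator>
#include <string>

// Same algorithm as above, repeated so this snippet is self-contained.
template<typename InputIter, typename OutputIter, typename Pred>
OutputIter
copy_while(InputIter begin, InputIter end, OutputIter result, Pred pred)
{
    while (begin != end)
    {
        typename std::iterator_traits<InputIter>::value_type value = *begin;
        if (!pred(value))
            break;              // stop at the first rejected element
        *result = value;
        ++result;
        ++begin;
    }
    return result;
}

// Hypothetical helper: returns the leading run of decimal digits in `text`.
inline std::string leading_digits(const std::string& text)
{
    std::string out;
    copy_while(text.begin(), text.end(), std::back_inserter(out),
               [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; });
    return out;
}
```

Note that the element which fails the predicate is dereferenced but the iterator is not advanced past it. Whether that element is still available to the caller afterwards depends on the iterator type.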
Now you could use that like this:
inline bool
is_valid_seq_char(char c)
{ return std::string("ACGT").find(c) != std::string::npos; }

inline std::istream&
operator>>(std::istream& in, seq& rhs)
{
    copy_while(
        std::istream_iterator<char>(in),
        std::istream_iterator<char>(),
        std::back_inserter(rhs),
        &is_valid_seq_char);
    return in;
}

int main()
{
    std::istringstream istr("GATTACA FOO");
    seq s;
    assert((istr >> s) and s == "GATTACA");
}
This works, but the problem is that istream_iterator uses operator>> to read characters, so it skips over whitespace. This means the space following "GATTACA" is consumed by the algorithm and discarded, so adding this to the end of main would fail:
assert(istr.get() == ' ');
To solve this, use istreambuf_iterator, which doesn't skip whitespace:
inline std::istream&
operator>>(std::istream& in, seq& rhs)
{
    copy_while(
        std::istreambuf_iterator<char>(in),
        std::istreambuf_iterator<char>(),
        std::back_inserter(rhs),
        &is_valid_seq_char);
    return in;
}
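The whitespace difference between the two iterator types can be checked directly. This is only a sketch; read_formatted and read_unformatted are hypothetical names that each read an entire string through one kind of iterator.

```cpp
#include <cassert>
#include <iterator>
#include <sstream>
#include <string>

// Reads everything through istream_iterator, which extracts with
// operator>> and therefore skips whitespace between characters.
inline std::string read_formatted(const std::string& text)
{
    std::istringstream in(text);
    return std::string(std::istream_iterator<char>(in),
                       std::istream_iterator<char>());
}

// Reads everything through istreambuf_iterator, which takes characters
// straight from the stream buffer, whitespace included.
inline std::string read_unformatted(const std::string& text)
{
    std::istringstream in(text);
    return std::string(std::istreambuf_iterator<char>(in),
                       std::istreambuf_iterator<char>());
}
```

With the input "GATTACA FOO", the first function yields "GATTACAFOO" while the second preserves the space.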
To complete this, you probably want to indicate failure to extract a seq if no characters were extracted:
inline std::istream&
operator>>(std::istream& in, seq& rhs)
{
    copy_while( std::istreambuf_iterator<char>(in), {},
                std::back_inserter(rhs), &is_valid_seq_char);
    if (rhs.empty())
        in.setstate(std::ios::failbit); // no seq in stream
    return in;
}
That final version also uses one of my favourite C++11 tricks to simplify it slightly, by using {} for the end iterator. The type of the second argument to copy_while must be the same as the type of the first argument, which is deduced as std::istreambuf_iterator<char>, so the {} simply value-initializes another iterator of that same type.
Edit: If you want a closer match to std::string extraction then you can do so too:
inline std::istream&
operator>>(std::istream& in, seq& rhs)
{
    std::istream::sentry s(in);
    if (s)
    {
        copy_while( std::istreambuf_iterator<char>(in), {},
                    std::back_inserter(rhs), &is_valid_seq_char);
        int eof = std::char_traits<char>::eof();
        if (std::char_traits<char>::eq_int_type(in.rdbuf()->sgetc(), eof))
            in.setstate(std::ios::eofbit);
    }
    if (rhs.empty())
        in.setstate(std::ios::failbit);
    return in;
}
The sentry will skip leading whitespace, and if you reach the end of the input it will set eofbit. The other change that should probably be made is to empty the seq before pushing anything into it, e.g. start with rhs.clear() or equivalent for your seq type.
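Putting the pieces together, including the rhs.clear() suggestion, might look like the sketch below. The seq class here is a hypothetical minimal stand-in, since the real class isn't shown in the question; it has just enough interface for std::back_inserter and the checks. copy_while and is_valid_seq_char are repeated so the snippet compiles on its own.

```cpp
#include <cassert>
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical stand-in for the question's seq class.
class seq
{
public:
    typedef char value_type;                 // needed by std::back_inserter
    void push_back(char c) { data_.push_back(c); }
    void clear() { data_.clear(); }
    bool empty() const { return data_.empty(); }
    const std::string& str() const { return data_; }
private:
    std::string data_;
};

template<typename InputIter, typename OutputIter, typename Pred>
OutputIter
copy_while(InputIter begin, InputIter end, OutputIter result, Pred pred)
{
    while (begin != end)
    {
        typename std::iterator_traits<InputIter>::value_type value = *begin;
        if (!pred(value))
            break;
        *result = value;
        ++result;
        ++begin;
    }
    return result;
}

inline bool
is_valid_seq_char(char c)
{ return std::string("ACGT").find(c) != std::string::npos; }

inline std::istream&
operator>>(std::istream& in, seq& rhs)
{
    rhs.clear();                             // discard previous contents
    std::istream::sentry s(in);              // skips leading whitespace
    if (s)
    {
        copy_while( std::istreambuf_iterator<char>(in), {},
                    std::back_inserter(rhs), &is_valid_seq_char);
        typedef std::char_traits<char> traits;
        // Peek at the next character; report EOF if the input is exhausted.
        if (traits::eq_int_type(in.rdbuf()->sgetc(), traits::eof()))
            in.setstate(std::ios::eofbit);
    }
    if (rhs.empty())
        in.setstate(std::ios::failbit);      // no seq in stream
    return in;
}
```

With this version, extracting from "  GATTACA FOO" yields GATTACA and leaves the following space in the stream, while a second extraction stops immediately at the F and sets failbit. This accepts every valid prefix as a successful read, in the same way that reading an int from 42foo succeeds.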