Is there a flag to make istream treat only tabs as delimiters?


I want to make istream consider only tabs as whitespace. So, given "{json : 5}\tblah", I want to load json into obj1 and "blah" into obj2 with code like the following:

is >> obj1 >> obj2;

Is there a way to do this without loading the objects into strings?


Solution

  • Yes: create a locale in which tab is the only character that has the space attribute.

    The hard part: create a facet that inherits from std::ctype<char>, then clear the whitespace flag on every character except tab.

    #include <locale>
    #include <fstream>
    #include <iostream>
    #include <string>
    #include <sstream>    
    
    // This is my facet:
    // It is designed to treat only <tab> as whitespace
    class TabSepFacet: public std::ctype<char>
    {
        public:
            typedef std::ctype<char>   base;
            typedef base::char_type    char_type;
    
            TabSepFacet(std::locale const& l) : base(table)
            {
                // Get the ctype facet of the current locale
                std::ctype<char> const&  defaultCType = std::use_facet<std::ctype<char> >(l);
    
                // Copy the default flags for each character from the current facet.
                // data holds every char value 0..255 so is() can classify them all at once.
                char data[256];
                for(int loop = 0; loop < 256; ++loop) {data[loop] = static_cast<char>(loop);}
                defaultCType.is(data, data+256, table);
    
                // Remove the other spaces
                for(int loop = 0; loop < 256; ++loop)
                {
                    // If the space flag is set then XOR it out.
                    if (table[loop] & base::space)
                    {   table[loop] ^= base::space;
                    }
                }
                // Only a tab is a space
                table['\t'] |= base::space;
            }
        private:
            base::mask table[256];
    };
    

    The easy part: create a locale object that uses the facet and imbue the stream with it:

    int main()
    {
        // Create a stream (Create the locale) then imbue the stream.
        std::stringstream data("This is a\tTab");
        const std::locale tabSepLocale(data.getloc(), new TabSepFacet(data.getloc()));
        data.imbue(tabSepLocale);
    
        // Note: If it is a file stream then imbue the stream BEFORE opening a file,
        // otherwise the imbue is silently ignored on some systems.
    
    
        // Now you can use the stream like normal; your locale defines what 
        // is whitespace, so the operator `>>` will split on tab.
        std::string   word;
        while(data >> word)
        {
            std::cout << "Word(" << word << ")\n";
        }
    }
    

    The result:

    > g++ tab.cpp
    > ./a.out
    Word(This is a)
    Word(Tab)
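
The comment in main() about file streams can be sketched as follows. This is a minimal illustration rather than part of the original answer: `readTokens` is a hypothetical helper, and it simply takes whatever locale the caller supplies, installing it on a closed `std::ifstream` before the file is opened.

```cpp
#include <fstream>
#include <locale>
#include <string>
#include <vector>

// Hypothetical helper (not from the answer above): reads every token from
// the file at `path`, splitting on whatever `loc` classifies as whitespace.
std::vector<std::string> readTokens(char const* path, std::locale const& loc)
{
    std::ifstream file;      // constructed closed: no file is attached yet
    file.imbue(loc);         // install the locale first...
    file.open(path);         // ...and only then open the file

    std::vector<std::string> tokens;
    std::string token;
    while (file >> token)
    {
        tokens.push_back(token);
    }
    return tokens;           // empty if the file could not be opened
}
```

Passing the `tabSepLocale` built in main(), for example `readTokens("data.tsv", tabSepLocale)` with `data.tsv` as a placeholder file name, should then split the file on tabs only.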
    

    Note: with this facet not even newline is a whitespace character, so operator >> reads straight across line ends and keeps the newline characters in the extracted token.
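
If each record sits on its own line, the same idea can be extended so that newline also ends a field. The following is only a sketch under stated assumptions: `DelimFacet` and `splitFields` are hypothetical names, and the table is seeded from `std::ctype<char>::classic_table()` (the "C" locale table) instead of the stream's current locale.

```cpp
#include <algorithm>
#include <cstddef>
#include <locale>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical facet: only the characters in `delims` count as whitespace.
class DelimFacet : public std::ctype<char>
{
    public:
        explicit DelimFacet(std::string const& delims) : std::ctype<char>(masks)
        {
            // Start from the "C" locale classification table.
            std::copy(classic_table(), classic_table() + table_size, masks);

            // Clear the space flag on every character...
            for (std::size_t i = 0; i < table_size; ++i)
            {
                masks[i] &= static_cast<mask>(~space);
            }
            // ...then set it only for the requested delimiters.
            for (std::size_t i = 0; i < delims.size(); ++i)
            {
                masks[static_cast<unsigned char>(delims[i])] |= space;
            }
        }
    private:
        mask masks[table_size];
};

// Hypothetical helper: splits `text` on the characters in `delims`.
std::vector<std::string> splitFields(std::string const& text,
                                     std::string const& delims)
{
    std::istringstream in(text);
    // The locale takes ownership of the facet and deletes it when done.
    in.imbue(std::locale(in.getloc(), new DelimFacet(delims)));

    std::vector<std::string> fields;
    std::string field;
    while (in >> field)
    {
        fields.push_back(field);
    }
    return fields;
}
```

For the input "{json : 5}\tblah\n{json : 6}\tmore\n", `splitFields(text, "\t")` yields three fields, the second being "blah\n{json : 6}", while `splitFields(text, "\t\n")` yields four fields with no embedded newlines.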