
atoi on a character array with lots of integers


I have code in which a character array is populated with integers (converted to character strings) and read by another function, which converts them back to integers. I use the following code to do the conversion to a char array:

char data[64];
int a = 10;
std::string str = boost::lexical_cast<std::string>(a);
memcpy(data + 8*k,str.c_str(),sizeof(str.c_str()));   //k varies from 0 to 7

and the conversion back to integers is done using:

char temp[8];
memcpy(temp,data+8*k,8);
int a = atoi(temp);

This works fine in general, but when I do it as part of a project involving Qt (version 4.7), it compiles fine but gives me segmentation faults when it reads using memcpy(). Note that the segmentation fault happens only in the reading loop, not while writing the data. I don't know why this happens, but I want to get it working by any method.

So, are there any other functions I can use that take the character array, the first position, and the last position, and convert that range into an integer? Then I wouldn't have to use memcpy() at all. What I am trying to do is something like this:

new_atoi(data,8*k,8*(k+1)); // k varies from 0 to 7

Thanks in advance.


Solution

  • You are copying only sizeof(char*) characters, which is 4 on a 32-bit system (it depends on your system's pointer width, not on the length of the string). This leaves numbers of 4 or more digits without a null terminator, leading to runaway strings in the input to atoi:

     sizeof(str.c_str()) //i.e. sizeof(char*) = 4 (32 bit systems)
    

    should be

     str.length() + 1
    

    Otherwise the characters will not be null-terminated.
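
As a minimal sketch of the corrected round trip, the following keeps the question's layout of 8 slots of 8 bytes. The names write_slot, read_slot, kSlotSize and kSlotCount are hypothetical, and std::ostringstream stands in for boost::lexical_cast only to keep the example free of dependencies. It copies str.length() + 1 bytes and also gives the read buffer one spare byte, so atoi always finds a terminator.

```cpp
#include <cstddef>
#include <cstdlib>
#include <cstring>
#include <sstream>
#include <stdexcept>
#include <string>

// Hypothetical layout constants matching the question: 8 slots of 8 bytes.
const std::size_t kSlotSize = 8;
const std::size_t kSlotCount = 8;

// Writes `value` as decimal text into slot k of `data` (k in 0..7).
// Throws if the text plus its terminator does not fit into the slot.
void write_slot(char* data, std::size_t k, int value)
{
    std::ostringstream oss;  // stand-in for boost::lexical_cast<std::string>
    oss << value;
    const std::string str = oss.str();
    if (k >= kSlotCount || str.length() + 1 > kSlotSize)
        throw std::length_error("value does not fit into an 8-byte slot");

    std::memset(data + kSlotSize * k, 0, kSlotSize);
    // length() + 1 copies the digits AND the terminating '\0'.
    std::memcpy(data + kSlotSize * k, str.c_str(), str.length() + 1);
}

// Reads slot k back. The temporary buffer has one extra byte, so atoi
// sees a terminator even if a slot was filled by other code without one.
int read_slot(const char* data, std::size_t k)
{
    char temp[kSlotSize + 1];
    std::memcpy(temp, data + kSlotSize * k, kSlotSize);
    temp[kSlotSize] = '\0';
    return std::atoi(temp);
}
```

With a `char data[64]` buffer, calling `write_slot(data, k, a)` for k from 0 to 7 and then `read_slot(data, k)` returns the values that were written. Values needing more than 7 characters (including a minus sign) are rejected rather than silently truncated.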

    STL Only:

    make_testdata(): see the Testdata section at the end.

    Why don't you use streams...?

    #include <sstream>
    #include <iostream>
    #include <algorithm>
    #include <iterator>
    #include <string>
    #include <vector>
    
    int main()
    {
        std::vector<int> data = make_testdata();
    
        std::ostringstream oss;
        std::copy(data.begin(), data.end(), std::ostream_iterator<int>(oss, "\t"));
    
        std::stringstream iss(oss.str());
    
        std::vector<int> clone;
        std::copy(std::istream_iterator<int>(iss), std::istream_iterator<int>(),
                  std::back_inserter(clone));
    
        //verify that clone now contains the original random data:
        //bool ok = std::equal(data.begin(), data.end(), clone.begin());
    
        return 0;
    }
    

    You could do it a lot faster in plain C with atoi/itoa and some tweaks (note that itoa is non-standard), but if you need the speed, I reckon you should be using binary transmission (Boost Spirit Karma and protobuf are good libraries for that).
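
The question asks for something like new_atoi(data, 8*k, 8*(k+1)). No such function exists in the standard library, but a rough sketch of one, in the plain-C style mentioned above, could look like this. The name and signature are taken from the question's wish and are otherwise hypothetical. It reads directly from the given range, so no memcpy and no temporary buffer are needed.

```cpp
#include <cstddef>

// Hypothetical helper: parses an optionally signed decimal integer from
// data[first, last). Parsing stops at the first non-digit (including '\0')
// or at `last`, whichever comes first, so it never reads past the range.
// Like atoi, it does no overflow detection.
int new_atoi(const char* data, std::size_t first, std::size_t last)
{
    std::size_t i = first;
    bool negative = false;
    if (i < last && (data[i] == '-' || data[i] == '+')) {
        negative = (data[i] == '-');
        ++i;
    }

    int value = 0;
    for (; i < last && data[i] >= '0' && data[i] <= '9'; ++i)
        value = value * 10 + (data[i] - '0');

    return negative ? -value : value;
}
```

Because the end of the range bounds the loop, a slot that is completely filled with digits and has no terminator is still parsed correctly, which is exactly the case where atoi on a copied buffer runs away.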

    Boost Karma/Qi:

    #include <boost/spirit/include/qi.hpp>
    #include <boost/spirit/include/karma.hpp>
    #include <iterator>
    #include <string>
    #include <vector>
    
    namespace qi=::boost::spirit::qi;
    namespace karma=::boost::spirit::karma;
    
    static const char delimiter = '\0';
    
    int main()
    {
        std::vector<int> data = make_testdata();
    
        std::string astext;
    //  astext.reserve(3 * sizeof(data[0]) * data.size()); // heuristic pre-alloc
        std::back_insert_iterator<std::string> out(astext);
    
        {
            using namespace karma;
            generate(out, delimit(delimiter) [ *int_ ], data);
        //  generate_delimited(out, *int_, delimiter, data); // equivalent
        //  generate(out, int_ % delimiter, data); // somehow much slower!
        }
    
        std::string::const_iterator begin(astext.begin()), end(astext.end());
        std::vector<int> clone;
        qi::parse(begin, end, qi::int_ % delimiter, clone);
    
        //verify that clone now contains the original random data:
        //bool ok = std::equal(data.begin(), data.end(), clone.begin());
    
        return 0;
    }
    

    If you wanted to do architecture-independent binary serialization instead, you'd use this tiny adaptation, which makes things considerably faster (roughly 3-4x; see the benchmarks below):

    karma::generate(out, *karma::big_dword, data);
    // ...
    qi::parse(begin, end, *qi::big_dword, clone);
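
To illustrate what "architecture independent" means here, the following is a hand-written sketch of a big-endian 32-bit encoding without Boost. It is not how Karma or Qi implement big_dword; the function names are hypothetical, and it assumes int is 32 bits wide with two's complement representation.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Appends each value as 4 bytes, most significant byte first, so the
// byte stream looks the same regardless of the host's endianness.
// Assumption: int is 32 bits wide.
std::string encode_big_endian(const std::vector<int>& values)
{
    std::string out;
    out.reserve(values.size() * 4);
    for (std::size_t i = 0; i < values.size(); ++i) {
        const unsigned int u = static_cast<unsigned int>(values[i]);
        out.push_back(static_cast<char>((u >> 24) & 0xFF));
        out.push_back(static_cast<char>((u >> 16) & 0xFF));
        out.push_back(static_cast<char>((u >> 8) & 0xFF));
        out.push_back(static_cast<char>(u & 0xFF));
    }
    return out;
}

// Reassembles the values; trailing bytes that do not form a full
// 4-byte group are ignored.
std::vector<int> decode_big_endian(const std::string& bytes)
{
    std::vector<int> values;
    values.reserve(bytes.size() / 4);
    for (std::size_t i = 0; i + 4 <= bytes.size(); i += 4) {
        unsigned int u = 0;
        for (std::size_t j = 0; j < 4; ++j)
            u = (u << 8) | static_cast<unsigned char>(bytes[i + j]);
        values.push_back(static_cast<int>(u));
    }
    return values;
}
```

Every integer takes exactly 4 bytes, so no delimiters and no digit parsing are needed, which is a plausible reason for the speed difference against the text formats.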
    

    Boost Serialization

    The best performance can be reached when using Boost Serialization in binary mode:

    #include <sstream>
    #include <boost/archive/binary_oarchive.hpp>
    #include <boost/archive/binary_iarchive.hpp>
    #include <boost/serialization/vector.hpp>
    
    int main()
    {
        std::vector<int> data = make_testdata();
    
        std::stringstream ss;
        {
            boost::archive::binary_oarchive oa(ss);
            oa << data;
        }
    
        std::vector<int> clone;
        {
            boost::archive::binary_iarchive ia(ss);
            ia >> clone;
        }
    
        //verify that clone now contains the original random data:
        //bool ok = std::equal(data.begin(), data.end(), clone.begin());
    
        return 0;
    }
    

    Testdata

    (common to all versions above)

    #include <boost/random.hpp>
    #include <algorithm>
    #include <vector>
    
    // generates a deterministic pseudo-random vector of 2 << 24 (33554432, i.e. 32 Mi) ints
    std::vector<int> make_testdata()
    {
        std::vector<int> testdata;
    
        testdata.resize(2 << 24);
        std::generate(testdata.begin(), testdata.end(), boost::mt19937(0));
    
        return testdata;
    }
    

    Benchmarks

    I benchmarked it by

    • using input data of 2<<24 (33554432) random integers
    • not displaying output (we don't want to measure the scrolling performance of our terminal)
    • the rough timings were
      • STL only version isn't too bad actually at 12.6s
      • Karma/Qi text version ran in 5.1s (down from 18s), thanks to Arlen's hint at generate_delimited :)
      • Karma/Qi binary version (big_dword) in only 1.4s (roughly 3-4x as fast as the text version)
      • Boost Serialization takes the cake with around 0.8s (or, when substituting text archives for the binary ones, around 13s)
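
The answer does not show how the timings were taken. One possible way to get comparable rough numbers is sketched below, using std::clock around the STL-only round trip. The function names are hypothetical, and std::clock measures CPU time, so treat the result as a coarse indication only.

```cpp
#include <algorithm>
#include <ctime>
#include <iterator>
#include <sstream>
#include <vector>

// Round-trips `data` through tab-separated text, as in the STL-only version.
std::vector<int> roundtrip_text(const std::vector<int>& data)
{
    std::ostringstream oss;
    std::copy(data.begin(), data.end(), std::ostream_iterator<int>(oss, "\t"));

    std::istringstream iss(oss.str());
    std::vector<int> clone;
    std::copy(std::istream_iterator<int>(iss), std::istream_iterator<int>(),
              std::back_inserter(clone));
    return clone;
}

// Runs one round trip, stores the result in `clone`, and returns the
// CPU seconds it took. Nothing is printed inside the timed region.
double time_roundtrip(const std::vector<int>& data, std::vector<int>& clone)
{
    const std::clock_t start = std::clock();
    clone = roundtrip_text(data);
    const std::clock_t stop = std::clock();
    return static_cast<double>(stop - start) / CLOCKS_PER_SEC;
}
```

Calling `time_roundtrip(make_testdata(), clone)` and then checking `clone` against the input with std::equal gives both a timing and a correctness check. The same wrapper shape could be put around the Karma/Qi and Boost Serialization versions.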