
Should C++ file read be slower than Ruby or C#?


I'm completely new to C++.

I'm comparing various aspects of C++, C# and Ruby to see whether there is a need to mirror a library. Currently I'm testing a simple read of a file (the numbers below are post update).

I'm compiling the C++ and C# in VS 2017. The C++ is built in release (x64) mode (or at least compiled, then run).

The libraries more or less read a file and split each line into three parts, which make up the members of an object. The objects are then stored in an array member.

For stress testing I tried a large file, 380 MB (7M lines). After the update I'm now getting similar performance with C++ and Ruby.

Purely reading the file and doing nothing else, the performance is as follows:

Ruby: 7s
C#:   2.5s
C++:  500+s (stopped it after a while, something's clearly wrong)
C++(release build x64): 7.5s

The code:

#Ruby
file = File.open "test_file.txt"
while !file.eof 
    line = file.readline
end

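//C#
using System.IO;

string line;
// The StreamReader constructor opens the file; StreamReader has no Open() method.
StreamReader file = new StreamReader("test_file.txt");
while((line = file.ReadLine()) != null){

}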



//C++
#include "stdafx.h"
#include <string>
#include <iostream>
#include <ctime>
#include <fstream>
int main()
{
    std::ios::sync_with_stdio(false);
    std::ifstream file;
    file.open("c:/sandboxCPP/test_file.txt");
    std::string line;

    std::clock_t start;
    double duration;
    start = std::clock();
    while (std::getline(file, line)) {

    }
    duration = (std::clock() - start) / (double)CLOCKS_PER_SEC;
    std::cout << "\nDuration: " << duration;
    while (true) 
    {

    }
    return 0;
}

Edit: The following performed incredibly well: 0.03s.

vector<string> lines;
string tempString = str.str();
boost::split(lines, tempString, boost::is_any_of("\n"));
start = clock();
cout << "\nCount: " << lines.size();
int count = lines.size();
string s;
for (int i = 0; i < count; i++) {
    s =  lines[i];
} 

I added the s = lines[i] assignment on the likelihood that I don't know what boost is doing. It changed the performance.

I tested this with a cout of a random record at the end of the loop.

Thanks


Solution

  • Based on the comments and the originally posted code (it has now been fixed [now deleted]), there was previously a coding error (a missing i++) that stopped the C++ program from outputting anything. Combined with the while(true) loop in the complete code sample, this would produce symptoms consistent with those stated in the question: the user waits 500s, sees no output, and force-terminates the program. The program would finish reading the file without printing anything and then enter the deliberately added infinite loop.

    According to the comments, the revised complete source code runs to completion in ~1.6s for a 1.2 million line file. My advice for improving performance would be as follows:

    1. Make sure you are compiling in release mode (not debug mode). Given the user has specified they are using Visual Studio 2017, I would recommend viewing the official Microsoft documentation (https://msdn.microsoft.com/en-us/library/wx0123s5.aspx) for a thorough explanation.

    2. To make problems easier to diagnose, do not add an infinite loop at the end of your program. Instead, run the executable from PowerShell (or cmd) and confirm that it terminates correctly.

    EDIT: I would also add:

    3. For accurate timings you also need to take the OS disk cache into account. Run each benchmark multiple times to warm up the disk cache, and compare the later runs rather than the first one.
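
The advice above could be combined into something like the following. This is only a minimal sketch, not the asker's code: the helper names count_lines and time_reads are made up for illustration, and it uses std::chrono::steady_clock instead of the std::clock call from the question. Each helper returns normally, so a program built on them terminates without an infinite loop, and the same file is read several times so that the first (cold cache) run can be compared with the later ones.

```cpp
#include <chrono>
#include <cstddef>
#include <fstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: reads the file line by line and returns the line count.
// Returning the count gives the loop an observable result to check.
std::size_t count_lines(const std::string& path) {
    std::ifstream file(path);
    if (!file) {
        throw std::runtime_error("cannot open " + path);
    }
    std::size_t lines = 0;
    std::string line;
    while (std::getline(file, line)) {
        ++lines;
    }
    return lines;
}

// Hypothetical helper: reads the same file `runs` times in a row and
// returns the elapsed wall-clock seconds of each run, in order.
// The first entry includes any cold disk cache cost.
std::vector<double> time_reads(const std::string& path, int runs) {
    std::vector<double> seconds;
    for (int i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        const std::size_t lines = count_lines(path);
        const auto stop = std::chrono::steady_clock::now();
        (void)lines;  // the count is not needed for the timing itself
        seconds.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return seconds;
}
```

A main function would call time_reads("c:/sandboxCPP/test_file.txt", 5), print each duration with std::cout, and then return 0. Run from PowerShell or cmd, the output stays visible after the program exits, so no trailing while (true) is needed.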