Tags: c++, file, optimization, fstream, ofstream

Optimizing .txt files creation speed


I've written the following simple test code that creates 10 000 empty .txt files in a subdirectory.

#include <iostream>
#include <time.h>
#include <cstdio>   // printf
#include <string>
#include <fstream>

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
        int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = ".\\results\\"+string_i+".txt";
        std::ofstream outfile(file_dir);
        i++;
    }
}

int main()
{
    clock_t tStart1 = clock();
    CreateFiles();
    printf("\nHow long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
    std::cin.get();
    return 0;
}

Everything works fine. All 10 000 .txt files are created within ~3.55 seconds. (using my PC)

Question 1: Ignoring the conversion from int to std::string etc., is there anything that I could optimize here for the program to create the files faster? I specifically mean the std::ofstream outfile usage - perhaps using something else would be relevantly faster?

Regardless, ~3.55 seconds is satisfactory compared to the following:

I then modified the function so that it also fills each .txt file with the integer i and some constant text:

void CreateFiles()
{
    int i = 1;
    while (i <= 10000) {
        int filename = i;
        std::string string_i = std::to_string(i);
        std::string file_dir = ".\\results\\"+string_i+".txt";
        std::ofstream outfile(file_dir);

        // Here is the part where I am filling the .txt with some data
        outfile << i << " some " << i << " constant " << i << " text " << i << " . . . " 
        << i << " --more text-- " << i << " --even more-- " << i;
        i++;
    }
}

And now everything (creating the .txt files and filling them with a short line of data) executes in... ~37 seconds. That's a huge difference. And that's only 10 000 files.

Question 2: Is there anything I can optimize here? Perhaps there exists some alternative that would fill the .txt files more quickly. Or perhaps I have forgotten about something very obvious that slows down the entire process?

Or, perhaps I am exaggerating a little bit and ~37 seconds seems normal and optimized?

Thanks for sharing your insights!


Solution

  • File creation speed is hardware dependent: the faster the drive, the faster you can create the files.

    This is evident from the fact that I ran your code on an ARM processor (Snapdragon 636, on a mobile phone, using Termux). Mobile phones have flash storage that is very fast at I/O, so it ran in under 3 seconds most of the time and occasionally took 5 seconds. This variation is expected, since the drive also has to handle reads and writes from other processes. You reported that it took ~37 seconds on your hardware. Hence you can safely conclude that I/O speed depends significantly on the hardware.


    Nonetheless, I tried optimizing your code using two different approaches:

    • Using the C counterpart for I/O

    • Using C++, but writing the whole line in one go

    I ran the benchmark on my phone 50 times; here are the average results:

    • C was fastest, taking 2.73928 seconds on average to write your text to 10000 text files, using fprintf.

    • C++ writing the complete line in one go took 2.7899 seconds. I used sprintf to format the complete line into a char[] and then wrote it to the ofstream with the << operator.

    • Normal C++ (your code) took 2.8752 seconds.

    This behaviour is expected: writing in chunks is faster. Read this answer as to why. C was the fastest, no doubt.

    Note that the difference here is not that significant, but on hardware with slow I/O it becomes significant.
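
    To illustrate what "writing in one go" means without going through sprintf, here is a minimal sketch that builds the whole line as a std::string and hands it to the stream with a single write() call. MakeLine and WriteFileInOneCall are hypothetical helper names, not part of the benchmark below, and I have not timed this variant.

```cpp
#include <fstream>
#include <string>

// Hypothetical helper: builds the complete line in memory first,
// using the same text as the question's code.
std::string MakeLine(int i)
{
    const std::string n = std::to_string(i);
    return n + " some " + n + " constant " + n + " text " + n + " . . . "
         + n + " --more text-- " + n + " --even more-- " + n;
}

// Hypothetical helper: opens the file and writes the prepared line with
// one write() call. Returns false if the file could not be opened or
// the write failed.
bool WriteFileInOneCall(const std::string& path, int i)
{
    std::ofstream out(path);
    if (!out) {
        return false;
    }
    const std::string line = MakeLine(i);
    out.write(line.data(), static_cast<std::streamsize>(line.size()));
    return static_cast<bool>(out);
}
```

    Inside the loop this would be called as WriteFileInOneCall("./results/" + std::to_string(i) + ".txt", i). Whether it beats the sprintf version will depend on the hardware and the standard library, so measure before relying on it.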


    Here is the code I used for the benchmark. You can test it yourself, but make sure to replace the std::system arguments with your own commands (they are different on Windows).

    #include <iostream>
    #include <time.h>
    #include <string>
    #include <fstream>
    #include <stdio.h>
    #include <cstdlib>   // std::system
    
    void CreateFiles()
    {
        int i = 1;
        while (i <= 10000) {
           // int filename = i;
            std::string string_i = std::to_string(i);
            std::string file_dir = "./results/"+string_i+".txt";
            std::ofstream outfile(file_dir);
    
            // Here is the part where I am filling the .txt with some data
            outfile << i << " some " << i << " constant " << i << " text " << i << " . . . " 
            << i << " --more text-- " << i << " --even more-- " << i;
            i++;
        }
    }
    
    void CreateFilesOneGo(){
        int i = 1;
        while(i<=10000){
            std::string string_i = std::to_string(i);
            std::string file_dir = "./results3/" + string_i + ".txt";
            char buffer[256];
            sprintf(buffer,"%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",i,i,i,i,i,i,i);
            std::ofstream outfile(file_dir);
            outfile << buffer;
            i++;
        }
    }
            
    void CreateFilesFast(){
        int i = 1;
        while(i<=10000){
            // int filename = i;
            std::string string_i = std::to_string(i);
            std::string file_dir = "./results2/"+string_i+".txt";
            FILE *f = fopen(file_dir.c_str(), "w");
            if (f != NULL) {  // skip the write if the file could not be opened
                fprintf(f,"%d some %d constant %d text %d . . . %d --more text-- %d --even more-- %d",i,i,i,i,i,i,i);
                fclose(f);
            }
            i++;
        }
    }
    
    int main()
    {
        double normal = 0, one_go = 0, c = 0;
        for (int u=0;u<50;u++){
            std::system("mkdir results results2 results3");
            
            clock_t tStart1 = clock();
            CreateFiles();
            //printf("\nNormal : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
            normal+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;
           
            tStart1 = clock();
            CreateFilesFast();
            //printf("\nIn C : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
            c+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;
            
            tStart1 = clock();
            CreateFilesOneGo();
            //printf("\nOne Go : How long it took to make files: %.2fs\n", (double)(clock() - tStart1)/CLOCKS_PER_SEC);
            one_go+=(double)(clock() - tStart1)/CLOCKS_PER_SEC;
            
            std::system("rm -rf results results2 results3");
            std::cout<<"Completed "<<u+1<<"\n";
        }
        
        std::cout<<"C on average took : "<<c/50<<"\n";
        std::cout<<"Normal on average took : "<<normal/50<<"\n";
        std::cout<<"One Go C++ took : "<<one_go/50<<"\n";
        
        return 0;
    }
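
    If you would rather not adapt the mkdir and rm shell commands for each platform, one possible alternative is std::filesystem. The sketch below assumes a C++17 compiler (older toolchains may also need an extra filesystem library at link time); MakeResultDirs and RemoveResultDirs are hypothetical names, and this is not what I used for the timings above.

```cpp
#include <filesystem>
#include <initializer_list>
#include <system_error>

namespace fs = std::filesystem;

// Hypothetical helper: creates every listed directory.
// A directory that already exists is not treated as an error.
bool MakeResultDirs(std::initializer_list<const char*> dirs)
{
    for (const char* d : dirs) {
        std::error_code ec;
        fs::create_directories(d, ec);
        if (ec) {
            return false;
        }
    }
    return true;
}

// Hypothetical helper: deletes every listed directory together with
// its contents. A directory that does not exist is not an error.
bool RemoveResultDirs(std::initializer_list<const char*> dirs)
{
    for (const char* d : dirs) {
        std::error_code ec;
        fs::remove_all(d, ec);
        if (ec) {
            return false;
        }
    }
    return true;
}
```

    In main, MakeResultDirs({"results", "results2", "results3"}) would stand in for the mkdir call and RemoveResultDirs with the same list for the rm -rf call.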
    

    Also, I used clang 7.0 as the compiler.
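
    One caveat about the timing: the C standard defines clock() as processor time, so on POSIX systems the time a process spends blocked waiting on the drive may not be counted. For an I/O benchmark a wall-clock timer may be more representative. A minimal sketch, assuming C++11 and a hypothetical helper name SecondsElapsed:

```cpp
#include <chrono>

// Hypothetical helper: runs `work` once and returns the elapsed
// wall-clock time in seconds, measured with a monotonic clock.
template <typename F>
double SecondsElapsed(F work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

    With this, normal += SecondsElapsed(CreateFiles); would replace the pair of clock() calls around CreateFiles(), and likewise for the other two functions. I have not re-run the numbers above with this timer.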

    If you have any other approach, let me know and I will test that too. If you find a mistake, do let me know and I will correct it as soon as possible.