c++ performance oop ifstream

Processing and reading a trajectory file efficiently using OOP


I am writing code to read large .xyz files. This type of file is used by molecular dynamics visualizers such as VMD. The file format looks something like this:

#Number of particles
#frame number
#Coordinates

As an example:

5
0
C    1.23    2.33    4.56
C    1.23    2.33    5.56
C    1.23    2.33    6.56
C    1.23    2.33    7.56
C    1.23    2.33    8.56
5
1
C    2.23    2.33    4.56
C    2.23    3.33    5.56
C    2.23    4.33    6.56
C    2.23    5.33    7.56
C    2.23    6.33    8.56

and so on. I was trying to understand this post, https://codereview.stackexchange.com/questions/201743/processing-xyz-data-from-a-large-file, which talks about reading large datasets efficiently by overloading the stream operators. I am trying to write a class that can read such large trajectory files and give me the following outputs: 1) the number of particles, 2) the total number of frames, 3) the set of coordinates at each time step. Based on that post, I have written the following to read the file format described above. So far the code below reads a single frame and then exits.

#include <iostream>
#include <vector>
#include <fstream>

struct Particle{

    long double x,y,z;
    char tab ='\t';
    char newline = '\n';
    char atom ;
    friend std::istream& operator>>(std::istream& in, Particle &xyz) {
        in >> xyz.atom >> xyz.x >> xyz.y >> xyz.z ;
        return in;
    }
    friend std::ostream& operator<<(std::ostream& out, Particle &xyz){
        out << xyz.x << xyz.tab << xyz.y << xyz.tab << xyz.z << xyz.newline;
        return out;
    }
};
class XYZ_frame_read
{

    int curr_frame;
    int num_particles;
    std::vector<Particle> coordinates_t;

    public:

    friend std::istream& operator>>(std::istream& in, XYZ_frame_read &traj ){

                in >> traj.num_particles;
                in >> traj.curr_frame;
                Particle p;
                while(in >> p){
                    traj.coordinates_t.push_back(p);
                }
            return in;
        }
    friend std::ostream& operator<<(std::ostream& out, XYZ_frame_read &traj){

            for(int i = 0; i< traj.num_particles ;i ++){
                out << traj.coordinates_t.at(i) ;
            }
            return out;
        }
};

int main(int argc, char *argv[]){

    std::ifstream in(argv[1]);
    XYZ_frame_read* frames = new XYZ_frame_read[3];
    in >> frames[0];
    std::cout << frames[0];

    return 0;
}

The problem is that I don't understand how to extend this to read the next frames and keep appending them to the coordinates_t vector for each instance of XYZ_frame_read. I think I understand how this works, so obviously a while(!in.eof()) is out of the question, since it'll just read the first frame over and over again. I am a newbie to C++ and am working on a molecular dynamics related project, so any changes/suggestions are welcome! Thank you for the help!

EDIT

I have tried using

size_t i = 0;
while(in >> frames[i]){
    std::cout << frames[i];
    if(i == 3){
        break;
    }
    i++;
}

It doesn't work: nothing is printed, and the loop body doesn't even execute.


Solution

  • while(!in.eof()) is out of the question because eof doesn't work like that.

    Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?
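To see why, here is a minimal sketch that counts the integers in a string both ways. The helper names `count_with_eof_loop` and `count_with_read_loop` are made up for this illustration. `eof()` only becomes true after a read has already run into the end of the input, so when the text ends with a newline the first loop runs its body one extra time on a failed read.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: counts loop iterations using the eof() pattern.
int count_with_eof_loop(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    int count = 0;
    while (!in.eof()) {
        in >> value;   // this read may fail, but the iteration is still counted
        ++count;
    }
    return count;
}

// Hypothetical helper: counts only the reads that actually succeeded.
int count_with_read_loop(const std::string& text) {
    std::istringstream in(text);
    int value = 0;
    int count = 0;
    while (in >> value) {
        ++count;
    }
    return count;
}
```

For the input `"1 2 3\n"` the eof version reports 4 iterations and the read version reports 3. With malformed input such as `"1 x"` the eof version is worse still: the read fails without ever reaching the end of the input, so the loop never terminates.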

    I'm not sure I see the problem. What's wrong with

    size_t i = 0;
    while (in >> frames[i])
        ++i;
    

    (apart from the possibility of array bounds errors).

    EDIT

    This code is incorrect

     friend std::istream& operator>>(std::istream& in, XYZ_frame_read &traj) {
         in >> traj.num_particles;
         in >> traj.curr_frame;
         Particle p;
         while(in >> p){
              traj.coordinates_t.push_back(p);
         }
         return in;
     }
    

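    This says keep reading particles until a read fails. That's incorrect, you know how many particles there are. It is also why your loop never runs: the read that finally fails is the one that runs into the next frame's header (or the end of the file), and it leaves the stream in a failed state, so operator>> returns a failed stream and while (in >> frames[i]) is false even for the first frame. It should say keep reading particles until you have read num_particles of them (or a read fails). I.e. it should say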

     friend std::istream& operator>>(std::istream& in, XYZ_frame_read &traj) {
         in >> traj.num_particles;
         in >> traj.curr_frame;
         Particle p;
         // stop after num_particles particles, or earlier if a read fails
         for (int i = 0; i < traj.num_particles && in >> p; ++i) {
              traj.coordinates_t.push_back(p);
         }
         return in;
     }