Search code examples
c++iteratorstdvectorstdmappartition

Making a class to access and appoint vector types and building a class which gets positions and assigns a region to it


I'm in the process of analysing a dataset which looks like the following

#Latitude   Longitude   Depth [m]   Bathy depth [m] CaCO3 [%] ...
-78 -177    0   693 1
-78 -173    0   573 2
.
.

The plan is to be able to read the data into a vector and then break it down into different groups of "Bathy depth" (Bathymetric depth). It also needs to be simultaneously grouped into different ocean basins. For instance, all of the data points within the North Atlantic, that is also between a BathyDepth of 500-1500m, 1000-2000m, 1500-2500m... should be in its own group (maybe a vector, or another object). The idea is to be able to output these into different text files.

I have attempted this in a somewhat stuttered manner. You can see this below

#include <iostream>
#include <sstream>
#include <fstream>
#include <vector>
#include <string>
#include <GeographicLib/Geodesic.hpp> //Library which allows for distance calculations

using namespace std;
using namespace GeographicLib;

//Define each basin spatially
//North Atlantic
double NAtlat1 = 1,  NAtlong1 = 1, NAtlat2 = 2, NAtlong2=2; //Incorrect               values, to be set later
//South Atlantic
double SAtlat1 = 1,  SAtlong1 = 1, SAtlat2 = 2, SAtlong2=2;
//North Pacific and the all others...

struct Point
{
        //structure Sample code/label--Lat--Long--SedimentDepth[m]--BathymetricDepth[m]--CaCO3[%]...
        string dummy;
        double latitude, longitude, rockDepth, bathyDepth, CaCO3, fCaCO3, bSilica, Quartz, CO3i, CO3c, DCO3;
        string dummy2;
        //Use Overload>> operator
        friend istream& operator>>(istream& inputFile, Point& p);
};

//Overload operator
istream& operator>>(istream& inputFile, Point& p)
{
        string row_text;
        //Read line from input, store in row_text
        getline(inputFile, row_text);
        //Extract line, store in ss row_stream
        istringstream row_stream(row_text);
        //Read-in data into each variable 
        row_stream >> p.dummy >> p.latitude >> p.longitude >> p.rockDepth >> p.bathyDepth >> p.CaCO3 >> p.fCaCO3 >> p.bSilica >> p.Quartz >> p.CO3i >> p.CO3c >> p.DCO3 >> p.dummy2;
        return inputFile;
}

int main ()
{
        //Geodesic class.
        const Geodesic& geod = Geodesic::WGS84();

        //Input file
    ifstream inputFile("Data.csv");
    //Point-type vector to store all data
    vector<Point> database;

    //bathDepth515 = depths between 500m and 1500m
    vector<Point> bathyDepth515;
    vector<Point> bathyDepth1020; //Create the rest too
    Point p;
    if (inputFile)
    {
            while(inputFile >> p)
            {
                    database.push_back(p);
            }

            inputFile.close();
    }
    else
    {
            cout <<"Unable to open file";
    }

    for(int i = 0; i < database.size(); ++i)
    {
            //Group data in database in sets of bathyDepth
            if(database[i].bathyDepth >= 500 && database[i].bathyDepth < 1500)
            {
                    //Find and fill particular bathDepths
                    bathyDepth515.push_back(database[i]);
            }
            if(database[i].bathyDepth >= 1000 && database[i].bathyDepth < 2000)
            {
                    bathyDepth1020.push_back(database[i]);
            }
            //...Further conditional statements based on Bathymetric depth, could easily include a spatial condition too...

    //Calculate distance between point i and all other points within 500-1500m depth window. Do the same with all other windows.
    for(int i = 0; i < bathyDepth515.size(); ++i)
    {
            for(int j = 0; j < bathyDepth515.size(); ++j)
            {
                    double s12;
                    if(i != j)
                    {
                            geod.Inverse(bathyDepth515[i].latitude, bathyDepth515[i].longitude, bathyDepth515[j].latitude, bathyDepth515[j].longitude, s12);
                    }
            }
    }
    return 0;
}

Problem 1: I think it is clear to see some of the methodology is not object oriented. For instance, there may be a better way to assign each data Point to a specific ocean basin, instead of manually putting these in at the beginning of my programme, just as I have done when grouping the data by depth. I started to create a basin class, with methods to retrieve the location, and definitions of each basin' lat/long, but did not get very far in finding an intuitive way of doing this. I'd like for someone to give me an idea of how better to build this. My attempt at building a (very fragile) class is as follows

class Basin
{
    public:
            Basin();
            Basin(double latit1, double longit1, double latit2, double longit2);
            double getLatitude();
            ...
    private:
            double NAt, SAt, NPac, SPac; //All basins
            double latitude1, longitude1, latitude2, longitude2; //         Boundaries defined by two latitude markers, and two longitude markers.
};

class NAt: public Basin{...}
//...Build class definitions...

Problem 2: My second concern is the method in which I have created vectors for the different depth windows. This could become very cumbersome if I have to change the way in which I have split the depths, or add more depths. I wouldn't like to have to change almost all parts of my programme just to accommodate how I decide to slide my depth windows. I'd appreciate someone to give me a few ideas on how to go about this. The only solution I could think of to Problem 2 is as follows

//Vector of vectors to store each rows' entries separately. Not sure if this is needed as my `Point` `Struct` already allows for each of the entries in a row to be accessed. 
    vector<vector<Point> > database;
    //Create database vectors for different bathDepth windows, eventhough their names will be the same, once push_back, it doesn't matter
    for(int i = 1*500; i<=8*500; i+=500)
    {
        vector<Point> bathDepthi;
        //Possibly push these back into database. These can then be accessed and populated using if statements.
    }
    //Create vectors individually, creating vector of vectors would be more elegant as I won't have to create more if one were to decide to change the range, I'd just have to change the for loop.

I don't know how much my attempts at this help, but I thought I might give a better insight into what I'm trying to get at. Apologies for such a long post.

EDIT-DESIGN SWITCH FROM std::vector TO std::map

Based on the input from a posted answer by heke, I have tried the following, but I am not sure if this is what the said user meant. I have chosen to use conditional statements to determine whether a point is within a particular basin. If my attempt is correct, I'm still unsure of how to proceed if this is correct. More specifically, I'm not sure how to store to access the partitioned vectors to, say, write them to separate text files (i.e. .txt files for a different bathymetric depths). Should I store the partition iterator into a vector, if so, of what type? The iterator is declared with auto, which confuses me as to how to declare the type of vector to hold this iterator.

My attempt:

std::map<std::string, std::vector<Point> > seamap;
seamap.insert( std::pair<std::string, std::vector<Point> > ("Nat", vector<Point>{}) );
seamap.insert( std::pair<std::string, std::vector<Point> > ("Sat", vector<Point>{}) ); //Repeat for all ocean basins

Point p;

while (inputFile >> p && !inputFile.eof() )
{
    //Check if Southern Ocean
    if (p.latitude > Slat2)
    {
        //Check if Atlantic, Pacific, Indian...
        if (p.longitude >= NAtlong1 && p.longitude < SAtlong2 && p.latitude > SPLIT)
        {
            seamap["Nat"].push_back(p);
        } // Repeat for different basins
    }
    else
    {
        seamap["South"].push_back(p);
    }
}

//Partition basins by depth
for ( std::map<std::string, std::vector<Point> >::iterator it2 = seamap.begin(); it2 != seamap.end(); it2++ )
{
    for ( int i = 500; i<=4500; i+=500 )
            {
                auto itp = std::partition( it2->second.begin(), it2->second.end(), [&i](const auto &a) {return a.bathyDepth < i;} );
            }
}

Solution

  • Problem 1

    Check this out: Determining if a set of points are inside or outside a square

    You can arrange the point by their basin while reading the data file. You just need a data structure where to but them. std::map might be a good candidate. (use the name of the ocean as the key and vector of points as the value)

    Problem 2

    Your depth window limits seem to grow with 500m steps, and the ranges seem to overlap. Maybe using std::partition for those vectors inside that map would work?

    some pseudo:

    1. partition the vector with <500 as the predicate, using the whole range.
    2. store the iterator returned by step 1 and use it as your lower while still using the end iterator as upper.
    3. now partition with <1000 as the predicate.
    4. same as 2-3. but with < 1500 as predicate and iterator returned by <1000 as the lower.
    5. repeat until all the depths are covered.
    6. do the same for all the sea-vectors stored in that map.

    The iterators that you stored can be used to get a certain range from that particular vector.

    i.e use the iterator returned by the <500 partition as a lower and iterator returned by <1500 partition and you get a range that has depths between 500 and 1500 in that particular vector.

    Now you should have a map full of vectors that you can access by name of the ocean and the range of depth, with intervals of 500m.

    edit: some clarification

    std::map<std::string, std::vector<point>> seamap;
    seamap.insert("Nat", std::vector<point>{}); // do this for every sea.
    //read point data from file
    
    while(FILESTREAM_NOT_EMPTY)
    {
        //construct a point object here.
    
        // add points to correct vector
    
        if (THIS_PARTICULAR_POINT_IS_IN_NAT)
               seamap["Nat"].push_back(point);
        else if (THIS_PARTICULAR_POINT_IS_IN_SAT)
               seamap["Sat"].push_back(point);
        //and so on...
    }
    

    after this do the partition thingy on every vector in that seamap. (helper : How to use range-based for() loop with std::map?)

    for (int i =500; i<4500;i+500)
    {
          auto iter_for_next = std::partition(LOWER_ITERATOR,
                                              UPPER_ITERATOR, 
                                              [&i]( const auto &a){return a < i;});
    
    //here save the iterator for later use in some kind of structure.
    }
    

    Hope this helps!