c++ arrays csv sorting analysis

Sort CSV file by column and compare with another column in C++


I have some CSV files that I can import into C++, split into columns, and print, but I cannot perform the analysis I need. I would like to sort each column (ascending or descending), then find groupings of 1s or 0s in a separate column. This is the code I have so far, but it seems I'm replacing the variable every time a new row is created.

#include "stdafx.h"
#include <iostream>
#include <fstream>
#include <vector>
#include <cstdlib> // for system()

using namespace std;

struct sampleData { // One row of the CSV file: the four values that are stored in the vector.
    float first, second, third, fourth;
};

void printSample(const sampleData& sample) // Function that prints one row of the vector.
{
    cout << sample.first << " " << sample.second << " " << sample.third << " " << sample.fourth << endl;
}

int main()
{
ifstream myFile("file.csv"); //Open file with ifstream constructor.

if (myFile.is_open())
{
    vector<sampleData> sample; //Create vector that stores the variables declared in the struct.
    float first, second, third, fourth; 
    char delim;

    while (myFile >> first >> delim >> second >> delim >> third >> delim >> fourth) { //Places each value of csv file into individual variables in the vector.
        sample.push_back({ first, second, third, fourth });
    }

    cout << "First value: " << " Second value: " << " Third value: " << " Fourth value: " << endl; //Create column headers.
    for (size_t x = 0; x < sample.size(); ++x) // size_t avoids a signed/unsigned comparison with size().
    {
        printSample(sample.at(x));
    }

}


else
{
    cout << "The file did not open."; //Lets me know if the file has not been opened.
}

system("pause");

return 0;
}

Below is an example of what I need. I'd like to sort each column (1-3) and compare it with the fourth column of 1s and 0s to find groupings of at least seven 1s with an average of at least 0.70. Would it be best to create a 2D array or 2D vector, and if so, how would it be sorted and compared?

Thanks for all the help.

> -40.31945 -20.71259   4.024558    1
> -8.428544 -1.173988   13.55221    1
> -9.99227  -1.964128   22.35553    1
> -6.227934 -0.6318588  11.28533    0
> -7.350101 -4.340335   9.932037    1
> -11.32407 -3.242851   15.07184    1
> -15.81499 -5.500328   15.33309    0
> -6.112404 -1.504377   24.17496    1
> -7.5483   -3.147136   17.5016     1
> -9.895069 -6.141642   17.70264    1
> -6.691729 -5.821645   41.11068    1
> -9.520897 -4.83869    12.83501    0
> -6.09901  -1.291806   22.62663    1
> -2.136172 -0.7562032  34.48225    1
> -5.813394 -2.087043   26.70455    0
> -2.359689 -0.04058313 68.30959    0
> -4.093154 -2.890539   32.40205    0
> -7.326787 -8.31641    23.47626    0
> -5.842336 -4.699064   32.14418    0
> -1.26901  -1.150853   54.72232    1
> -4.532993 -1.921023   27.54052    0
> -13.04364 -12.8271    17.78159    1
> -22.29973 -18.63197   10.62449    1
> -13.097   -11.09199   9.261793    0
> -6.73371  -4.044      24.63213    1
> -8.487038 -5.855842   20.65492    1
> -1.271804 -0.1592398  73.54436    0
> -5.903441 -2.511718   2.906148    0
> -6.569601 -3.63947    14.92872    0
> -2.671139 -1.596091   61.78936    1
> -0.67129  -0.1758051  35.63146    0
> -10.33999 -10.25158   19.83222    0
> -5.900752 -4.774312   22.25315    0
> -3.473342 -2.116564   60.31918    0
> -5.51118  -8.684725   45.30108    1
> -4.393883 -3.597137   21.0572     0
> -3.671957 -3.355143   51.05236    1
> -7.700621 -7.257176   29.59876    1
> -6.959113 -5.834087   21.52065    1
> -6.978306 -6.291922   26.17615    0
> -3.525233 -0.2435265  39.66356    0
> -8.017325 -7.190228   16.78984    1
> -9.686805 -6.356866   24.96812    1
> -5.841892 -4.090017   12.90826    1
> -4.101501 -0.8392091  29.49425    1
> -0.50966  -0.6248183  72.55316    0
> -2.747329 -3.107922   70.82893    1
> -3.682684 -5.461088   7.237332    0
> -1.726765 -1.030436   51.13756    0
> -5.065511 -5.105534   48.8038     1
> -3.490172 -0.8473139  54.89489    1
> -14.56848 -13.29985   8.508147    1
> -5.511615 -2.257046   26.53605    1
> -0.80373  -1.259443   54.58532    1
> -11.76727 -10.51294   19.43544    0
> -4.924498 -5.660692   64.22583    1
> -1.662102 -1.329681   68.50871    0
> -2.225776 -1.191363   46.14959    1
> -11.97834 -1.471152   18.86225    0
> -9.986734 -8.210676   15.11784    1
> -0.78368  -0.2543859  64.04224    1
> -11.41681 -13.24663   9.016961    1
> -10.73357 -13.46118   31.8038     1
> -2.443766 -0.841536   35.3982     1
> -3.112007 -1.327887   32.61596    1
> -1.647414 -0.9874625  65.37144    0
> -3.771582 -2.685039   42.65498    0
> -5.503803 -6.65314    15.60404    1
> -6.844056 -10.59976   22.71807    1
> -3.977231 -6.444871   47.65485    1
> -0.43918  -1.813655   35.90933    1
> -4.520459 -3.337119   17.47536    1
> -3.102405 -2.276846   15.49771    1
> -3.173711 -4.548148   54.85541    1
> -4.157713 -2.368944   36.82358    1
> -6.671762 -6.863191   33.18528    1
> -5.806525 -8.300102   38.04575    1
> -9.137906 -10.43044   20.62558    1
> -4.830114 -5.035967   80.04454    1
> -6.717423 -7.807728   18.62613    1
> -1.654782 -2.814744   69.35754    1
> -5.718936 -5.041555   19.44518    1
> -1.139612 -1.246455   31.46728    1
> -5.193422 -4.141603   49.06763    0
> -0.72360  -1.519114   68.06107    1
> -3.45456  -2.324488   24.8586     1
> -3.946017 -1.809939   26.39728    1
> -1.373865 -1.385224   59.31034    0
> -12.91463 -16.81217   21.9325     1
> -7.101114 -4.463167   24.6039     1
> -11.19178 -7.923832   11.70692    1
> -6.337176 -3.290151   46.2829     1
> -6.034304 -6.688771   12.98928    1
> -10.72616 -16.16286   27.24244    1
> -10.01076 -11.90333   16.67032    1
> -2.85405  -1.064295   18.82794    1
> -3.582814 -3.041154   34.58895    0
> -0.88143  -2.513154   72.57123    0
> -2.936312 -2.92483    32.65664    0
> -2.859565 -7.337652   31.87842    1
> -4.467122 -6.427214   56.81916    0
> -6.340817 -6.706052   9.87694     1
> -1.40155  -2.738037   35.32452    1
> -10.92032 -11.05833   30.82691    1
> -7.330603 -6.257256   22.16484    1
> -2.714168 -2.258151   36.30459    0
> -2.793682 -2.935043   56.51117    1
> -6.706202 -11.04426   11.10245    0
> -6.113976 -7.36745    11.36128    1
> -9.845764 -10.35044   37.52305    0
> -7.786937 -10.70406   21.68431    1
> -0.54450  -3.818708   64.34981    1
> -1.402748 -4.612042   52.94871    0
> -1.771809 -3.918717   41.45876    1
> -4.142132 -7.088901   45.44987    1
> -1.640578 -4.787658   40.82234    1
> -1.050637 -2.535334   42.87785    1
> -0.32151  -3.315413   40.40543    1

Solution

  • You need to provide comparator functions that compare individual rows during the sorting. Hopefully you’ve got a modern version of C++ (C++11 or later) and can use lambdas:

    // Sort by first column (std::sort requires #include <algorithm>)
    std::sort( samples.begin(), samples.end(),
      []( const sampleData& a, const sampleData& b )
      {
        return a.first < b.first;
      }
    );
    

    Once you have sorted by a specific column you can iterate through the sequence and count consecutive fourth column 1s.

    edit
    The lambda is a way of creating a function object on-the-fly. The above is equivalent* to:

    struct unnamed_function
    {
      bool operator () ( const sampleData& a, const sampleData& b ) const
      {
        return a.first < b.first;
      }
    };
    
    ...
    
    std::sort( samples.begin(), samples.end(), unnamed_function() );
    

    The [] is the “capture list” that introduces the lambda.

    Read more about lambdas. (Sorry I don’t have a better FAQ up for the world yet...)

    * Roughly. It is actually a little more complex than this behind the scenes.
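
To make the two steps above concrete, here is a minimal sketch of sorting by a chosen column and then scanning the sorted rows for runs of consecutive 1s in the fourth column. The helper names sortByColumn and findRunsOfOnes are hypothetical (they are not part of the question's code), and the sketch only covers the "at least N consecutive 1s" reading of the problem. It does not attempt the 0.70 average criterion, which would need a clearer definition of what a grouping is.

```cpp
#include <algorithm>
#include <cstddef>
#include <utility>
#include <vector>

// Same layout as the struct in the question.
struct sampleData {
    float first, second, third, fourth;
};

// Hypothetical helper: sorts the rows by the column selected through a
// pointer-to-member, ascending by default, descending when ascending == false.
void sortByColumn(std::vector<sampleData>& rows,
                  float sampleData::*column,
                  bool ascending = true)
{
    std::sort(rows.begin(), rows.end(),
        [column, ascending](const sampleData& a, const sampleData& b) {
            return ascending ? a.*column < b.*column
                             : a.*column > b.*column;
        });
}

// Hypothetical helper: returns [begin, end) index pairs for every run of
// consecutive rows whose fourth column equals 1 and whose length is at
// least minLength. Assumes the flag column holds exactly 0 or 1.
std::vector<std::pair<std::size_t, std::size_t>>
findRunsOfOnes(const std::vector<sampleData>& rows, std::size_t minLength)
{
    std::vector<std::pair<std::size_t, std::size_t>> runs;
    std::size_t start = 0;
    bool inRun = false;

    // Iterate one step past the end so a run that reaches the last row is closed.
    for (std::size_t i = 0; i <= rows.size(); ++i) {
        const bool isOne = i < rows.size() && rows[i].fourth == 1.0f;
        if (isOne && !inRun) {
            start = i;          // a new run begins here
            inRun = true;
        } else if (!isOne && inRun) {
            if (i - start >= minLength) {
                runs.push_back({start, i});
            }
            inRun = false;      // the run ended at the previous row
        }
    }
    return runs;
}
```

With the question's vector, a call such as sortByColumn(sample, &sampleData::third, false) followed by findRunsOfOnes(sample, 7) would sort by the third column in descending order and report each stretch of seven or more consecutive 1s. Because the rows are kept together in one struct, a single std::vector<sampleData> is enough and no 2D array is needed.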