I'm trying to find the indexes of certain header values in a CSV file so I can use them to extract the data at those positions in the rest of the file. I'm adding the header values to a map<std::string, int>
so I can retain the indexes.
I had working code until I noticed that if a header is the last value in the row it doesn't match. The last header string is empty inside my nested loop but not in the outer loop.
const int columnCount = 2;
std::string columns[columnCount] = { "column1", "column2" };
map<std::string, int> columnMap;
std::vector<std::string> cols(columns, columns + columnCount);
std::vector<std::string> cells;
boost::tokenizer<boost::escaped_list_separator<char> > tok(header_row);
cells.assign(tok.begin(), tok.end());
std::vector<std::string>::iterator iter_cells;
std::vector<std::string>::iterator iter_cols;
for (iter_cells = cells.begin(); iter_cells != cells.end(); ++iter_cells) {
    std::string cell = *iter_cells;
    for (iter_cols = cols.begin(); iter_cols != cols.end(); ++iter_cols) {
        std::string col = *iter_cols;
        cout << cell << "=" << col;
        if (col.compare(cell) == 0) {
            cout << " MATCH" << endl;
            columnMap.insert(std::make_pair(*iter_cols, iter_cells - cells.begin()));
            break;
        }
        cout << endl;
    }
}
Where tok(header_row)
is the equivalent of tok("column0,column1,column2")
I get this output:
column0=column1
column0=column2
column1=column1 MATCH
=column1
=column2
Whereas if it's tok("column0,column1,column2,column3")
I get:
column0=column1
column0=column2
column1=column1 MATCH
column2=column1
column2=column2 MATCH
=column1
=column2
When I cout << cell
in the outer loop the value is shown correctly.
Why do I lose the value of cell
in the inner loop?
EDIT
The code on GitHub (with the test files) is compiled with:
gcc parse_csv.cpp -o parse_csv -lboost_filesystem -lmysqlpp
and executed with
./parse_csv /home/dave/SO_Q/
I get this output:
Process File: /home/dave/SO_Q/test_2.csv
metTime
metTime=metTime MATCH
Ta
=metTime
=Ta
=Ua
=Th
Process File: /home/dave/SO_Q/test_1.csv
DATE_TIME_UTC
DATE_TIME_UTC=metTime
DATE_TIME_UTC=Ta
DATE_TIME_UTC=Ua
DATE_TIME_UTC=Th
Ta
Ta=metTime
Ta=Ta MATCH
metTime
=metTime
=TaTime
=UaTime
=ThTime
The problem was with the input for the header line. It contained a line break at the end, so the last header value didn't match the items in the array. Removing the line break fixed the problem.
I was working on a Windows PC and then transferring the file to a CentOS machine to run the code. The difference in line endings between the two platforms (CRLF on Windows, LF on Linux) left a trailing carriage return ('\r') on the last cell.
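Using cout << cell
as a debug statement showed the string, because the trailing '\r' is invisible on its own. Using something like cout << cell << " CELL"
didn't show the string, because the '\r' moves the cursor back to the start of the line and the text printed after it overwrites the cell value.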
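One way to catch this kind of problem sooner is to print cells with their control characters spelled out. The following is only a sketch: visible() is a hypothetical helper name, not part of the original code or of any library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical debugging helper (not from the original code): returns a copy
// of s in which control characters are written as escape sequences, so a
// trailing carriage return shows up as \r instead of moving the cursor.
std::string visible(const std::string& s) {
    std::string out;
    for (std::string::size_type i = 0; i < s.size(); ++i) {
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c == '\r') {
            out += "\\r";
        } else if (c == '\n') {
            out += "\\n";
        } else if (c == '\t') {
            out += "\\t";
        } else if (c < 0x20 || c == 0x7f) {
            // Any other control character is shown as a hex escape.
            char buf[8];
            std::snprintf(buf, sizeof buf, "\\x%02x", static_cast<unsigned>(c));
            out += buf;
        } else {
            out += static_cast<char>(c);
        }
    }
    return out;
}
```

In the nested loop, something like cout << "[" << visible(cell) << "]=" << col would then print the stray carriage return as a literal \r inside the brackets rather than letting it overwrite the line.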
I have now added this to my code to handle the difference in line endings:
// remove a possible Windows line ending so the last item matches
// (the empty() check avoids indexing into an empty header row)
if (!header_row.empty() && header_row[header_row.size() - 1] == '\r') {
    header_row.erase(header_row.size() - 1);
}
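For reference, here is a minimal, self-contained sketch of the same idea applied to the header lookup. It makes some assumptions: strip_trailing_cr() and index_columns() are hypothetical names, and a plain comma split with std::getline stands in for boost::tokenizer, so quoted or escaped fields are not handled.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: drop one trailing '\r' left over from a CRLF line ending.
void strip_trailing_cr(std::string& line) {
    if (!line.empty() && line[line.size() - 1] == '\r') {
        line.erase(line.size() - 1);
    }
}

// Hypothetical helper: map each wanted header name to its position in the row.
// Splits on plain commas only (a simplification of the boost tokenizer).
std::map<std::string, int> index_columns(std::string header_row,
                                         const std::vector<std::string>& wanted) {
    strip_trailing_cr(header_row);
    std::map<std::string, int> column_map;
    std::istringstream in(header_row);
    std::string cell;
    for (int index = 0; std::getline(in, cell, ','); ++index) {
        for (std::size_t i = 0; i < wanted.size(); ++i) {
            if (cell == wanted[i]) {
                column_map.insert(std::make_pair(cell, index));
                break;
            }
        }
    }
    return column_map;
}
```

Called with the header row "column0,column1,column2\r" and the wanted names column1 and column2, this maps column1 to index 1 and column2 to index 2, because the carriage return is removed before the row is split.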