
C++ Read double quotation marks from a file


I am reading a CSV file with C++, doing some calculations, and writing the results to another CSV file. Everything works fine, except when the program reads a line like this:

<a href="http://www.google.com" target="_blank">google</a>

When I print that string with cout to see what the program has read, it shows:

<a href=""http://www.google.com"" target=""_blank"">google</a>

Basically, it doubles every double quotation mark. How can I solve this?

Edits:

Here's my code:

#include <fstream>
#include <iostream>
#include <sstream>
#include <string>
using namespace std;

int main()
{
    ifstream read;
    ofstream write;
    string line;
    string cell;
    int col = 0;
    string temp;
    string links;
    read.open("Book1.csv");
    write.open("output.csv");
    if (read.is_open())
    {
        cout << "opened" <<endl ;
        getline(read, line);
        while(getline(read,temp))
        {
            stringstream line(temp);
            while (getline(line, cell, ','))
            {
                if (col > 9)
                {
                    links.pop_back();
                    write << links<<endl;
                    col = 0;
                    links = "";
                    break;
                }
                else
                {
                    if (cell != "")
                    {
                        if (col == 0)
                        {
                            write << cell<<',';
                        }
                        else if (col == 1)
                        {
                            write << cell<<',';
                        }
                        else
                        {
                            cell.erase(0, 1);
                            cell.pop_back();
                            links += cell;
                            links += '/';
                        }
                        cout << cell << endl;
                    }
                    col += 1;
                }
            }
        }       
    }
    else 
    {
        cout << "failed" << endl;
    }       
    read.close();
    write.close();  
}

Solution

  • This is perfectly normal. Quotes inside a field of your csv file are escaped with a second quote so that the file remains valid csv.

    Consider this csv data:

    123,"monitor 27"", Samsung",456
    

    Since the second field contains a comma, it needs to be quoted. And because there are quotes inside the field, those need to be escaped with another quote.

    So it is not the reading that adds the extra quotes; they are already in your csv file (a csv viewer shows only one quote because it parses the field first).

    If you are writing this string to another csv file, you can (and need to) leave the doubled quotes in place. Just make sure the whole field is surrounded by quotes too.
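
If you ever need to write a raw (unescaped) value as a csv field, the rule above can be sketched roughly like this. The function name `quoteCsvField` is hypothetical, and the sketch always quotes the field, which is valid but not always required:

```cpp
#include <string>

// Hypothetical helper: surrounds a raw value with quotes and
// doubles every quote inside it, so it can be written as a csv field.
std::string quoteCsvField(const std::string& value)
{
    std::string out = "\"";          // opening quote
    for (char c : value)
    {
        if (c == '"')
            out += '"';              // escape an embedded quote by doubling it
        out += c;
    }
    out += '"';                      // closing quote
    return out;
}
```

For example, passing the raw text `monitor 27", Samsung` should give back `"monitor 27"", Samsung"`, which is the second field of the sample row.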


    Update (after posting the code):

    First, I'll assume that in the file, the second string you posted is also surrounded with quotes, like this:

    "<a href=""http://www.google.com"" target=""_blank"">google</a>"
    

    Otherwise you would have invalid csv data.

    To parse csv, we cannot just split on every comma, because there could be a comma inside a field.

    Let's say we have the following fields:

    123
    monitor 27", Samsung
    456
    

    To write those to a valid csv row, the second field has to be surrounded with quotes because it contains a comma. Any quotes inside a quoted field need to be escaped with another quote. So we get this:

    123,"monitor 27"", Samsung",456
    

    Without the second quote after 27" the csv would be invalid and unparsable.

    To scan a csv row correctly, you need to check every character. Here's some pseudo code that also makes clear why there have to be 2 quotes (assuming there are no multiline fields):

    read a line
    
    bool bInsideQuotes = false
    
    loop over chars
      if character == '"'
        bInsideQuotes = !bInsideQuotes
      if character == ',' and !bInsideQuotes
        found a field separator
    

    That way you skip any comma inside a quoted field. It is now also easy to see why quotes inside a field need to be escaped with an extra quote: bInsideQuotes becomes false at the first quote of 27"", and the second quote makes it true again, because we are still inside the field.
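
The pseudo code above could be turned into C++ along these lines. This is only a sketch under the same assumption (no multiline fields); the name `splitCsvLine` is made up for illustration, and the fields are returned raw, with their outer quotes and doubled quotes still in place:

```cpp
#include <string>
#include <vector>

// Hypothetical helper: splits one csv line on the commas that are
// outside quotes. Fields are returned exactly as they appear in the file.
std::vector<std::string> splitCsvLine(const std::string& line)
{
    std::vector<std::string> fields;
    std::string current;
    bool insideQuotes = false;

    for (char c : line)
    {
        if (c == '"')
            insideQuotes = !insideQuotes;   // every quote toggles the state

        if (c == ',' && !insideQuotes)
        {
            fields.push_back(current);      // found a field separator
            current.clear();
        }
        else
        {
            current += c;                   // keep the character, quotes included
        }
    }
    fields.push_back(current);              // last field has no trailing comma
    return fields;
}
```

Calling it on `123,"monitor 27"", Samsung",456` should yield three fields, with the comma after `27""` kept inside the second one.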

    Now, to write the original string back, you don't have to change a thing. Just write it to the second file exactly as you read it from the first, and your csv will remain valid.

    To use the string's actual value, remove the two outer quotes and replace every pair of consecutive quotes with a single quote.
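
That last step might look something like the following. Again a sketch rather than a full csv parser: `unquoteCsvField` is a hypothetical name, and it assumes the field is either fully surrounded by quotes or not quoted at all:

```cpp
#include <string>

// Hypothetical helper: removes the two outer quotes of a quoted csv field
// and turns every doubled quote inside it back into a single quote.
std::string unquoteCsvField(const std::string& field)
{
    // A field without surrounding quotes is returned unchanged.
    if (field.size() < 2 || field.front() != '"' || field.back() != '"')
        return field;

    std::string out;
    // Walk only the characters between the outer quotes.
    for (std::string::size_type i = 1; i + 1 < field.size(); ++i)
    {
        out += field[i];
        // If this quote is followed by another inner quote, skip the second one.
        if (field[i] == '"' && i + 2 < field.size() && field[i + 1] == '"')
            ++i;
    }
    return out;
}
```

Applied to the quoted link field from the question, this should return the original `<a href="http://www.google.com" target="_blank">google</a>` with single quotes.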