
To validate CSV files according to RFC 4180, is the rule "The last field in the record must not be followed by a comma." wrong?


RFC 4180 states on page 2 that:

Within the header and each record, there may be one or more fields, separated by commas. Each line should contain the same number of fields throughout the file. Spaces are considered part of a field and should not be ignored. The last field in the record must not be followed by a comma.

So, per this standard, this would be invalid:

cat,dog,cow,

However, in theory it should represent a record of "cat", "dog", "cow" and "". So if adding a comma creates a new "last" element, the rule can never actually be violated. In fact, to respect "Each line should contain the same number of fields throughout the file." we would need the trailing comma in this case:

aaa,bbb,ccc,ddd
cat,dog,cow,

And indeed, some programs that export CSV do this for padding (e.g., Google Sheets).

In conclusion, is the following the only right way to respect the standard?

aaa,bbb,ccc,ddd
cat,dog,cow,""

Or is the rule just wrong or redundant? Am I understanding this the wrong way?
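
As a quick illustration of the two spellings above, here is a minimal sketch using Python's standard `csv` module. It is only one parser, not a reference implementation of RFC 4180, and the helper name `parse_line` is made up for this example. It shows that this particular parser treats a trailing comma and an explicitly quoted empty field as the same four-field record.

```python
import csv
import io


def parse_line(line):
    """Parse a single CSV record with Python's default dialect (hypothetical helper)."""
    return next(csv.reader(io.StringIO(line, newline="")))


bare = parse_line("cat,dog,cow,")      # trailing comma
quoted = parse_line('cat,dog,cow,""')  # explicit empty last field

print(bare)            # ['cat', 'dog', 'cow', '']
print(quoted)          # ['cat', 'dog', 'cow', '']
print(bare == quoted)  # True
```

Other CSV libraries may behave differently, so this says something about common practice rather than about what the RFC requires.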


Solution

  • The rule is not wrong at all, but it must be read very literally: the last field must not be followed by a comma.

    If the last field is empty, it is the last-but-one field that is followed by the comma, which is perfectly fine.

    So this is OK:

    a,b,c,d
    x,y,z,
    u,v,,
    w,,,
    

    but this is wrong, since the last record has a comma after its fourth and last field:

    a,b,c,d
    x,y,z,
    d,e,f,g,
    

    EDIT (from the discussion):

    a,b,c,d,
    e,f,g,h,
    i,j,k,l,
    m,n,o,p,
    

    is also forbidden according to the rule in question.
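
The reading above can be sketched as a small check in Python. This is an illustration under stated assumptions, not a full RFC 4180 validator: the function name `trailing_comma_records` is hypothetical, and the intended column count (`expected_fields`) has to be supplied by the caller, because the file alone cannot tell a trailing comma apart from an empty last field.

```python
import csv
import io


def trailing_comma_records(text, expected_fields):
    """Return 1-based numbers of records with a comma after the last intended field.

    expected_fields is an assumption supplied by the caller: the number of
    columns the file is meant to have.
    """
    offenders = []
    reader = csv.reader(io.StringIO(text, newline=""))
    for number, fields in enumerate(reader, start=1):
        # One extra, empty field beyond the expected count means the last
        # intended field was followed by a comma.
        if len(fields) == expected_fields + 1 and fields[-1] == "":
            offenders.append(number)
    return offenders


ok = "a,b,c,d\nx,y,z,\nu,v,,\nw,,,\n"
bad = "a,b,c,d\nx,y,z,\nd,e,f,g,\n"
all_trailing = "a,b,c,d,\ne,f,g,h,\ni,j,k,l,\nm,n,o,p,\n"

print(trailing_comma_records(ok, 4))            # []
print(trailing_comma_records(bad, 4))           # [3]
print(trailing_comma_records(all_trailing, 4))  # [1, 2, 3, 4]
```

Note that Python's `csv` module does not distinguish `g,` from `g,""`, so a record that ends in an explicitly quoted empty fifth field would be flagged as well. A stricter check would have to inspect the raw text.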