python, list, csv, tabs, space

Why are tabs written as \t in a CSV file using Python?


Let's say I have a list of lists that contains a tab character:

mylist = [['line 1', '<a href="//<% serverNames[0].getHostname() %>:'],
          ['line 2', '     <% master.getConfiguration()>']]

When I save the list to a CSV file, the tab in the code at line 2 is written as \t.

line | code
-----------------------------------------------------
   1 | <a href="//<% serverNames[0].getHostname() %>:
   2 | \t   <% master.getConfiguration()>

I need the tab to be kept as it is because I want to compare the code with other lists, so I don't want to replace it with other characters such as spaces.

The code I have written:

import csv

# newline='' (Python 3) lets the csv module control line endings itself
with open('codelist.csv', 'w', newline='') as file:
    header = ['line', 'code']
    writer = csv.writer(file)
    writer.writerow(header)
    for row in mylist:
        writer.writerow(row)

How can I solve this kind of problem?


Solution

  • I can't reproduce the exact error in either Python 2 or Python 3, but I have a guess about what might be going on.

    According to the documentation for csv.writer, located here,

    All other non-string data are stringified with str() before being written.

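    Note moreover that a string containing an actual tab character is displayed exactly as you describe when the interpreter echoes the result of str. Strictly speaking, the escape comes from the repr that the prompt prints; str itself returns a str unchanged:

     >>> str(chr(9))  # chr(9) is the tab character
     '\t'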
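
    To separate what str returns from what the prompt displays, here is a minimal sketch. The variable names are illustrative only and the sample string is adapted from the question:

```python
s = "\t<% master.getConfiguration()>"  # starts with one real tab character

converted = str(s)  # str() on a str returns the same text
shown = repr(s)     # repr() is what the interactive prompt echoes

print(converted == s)  # the text itself is unchanged
print("\t" in shown)   # the repr contains no real tab...
print("\\t" in shown)  # ...only a backslash followed by 't'
```

    So an escape that appears in the written file would have to come from something that produces a repr-style string, not from str applied to a plain string.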
    

    Of course, what you have is string data, but the documentation above doesn't really say what "other" means. Here's what I found in the implementation of writerow in _csv.c, located here:

        if (PyUnicode_Check(field)) {
            append_ok = join_append(self, field, quoted);
            Py_DECREF(field);
        }
        else if (field == Py_None) {
            append_ok = join_append(self, NULL, quoted);
            Py_DECREF(field);
        }
        else {
            PyObject *str;
    
            str = PyObject_Str(field);
            Py_DECREF(field);
            if (str == NULL) {
                Py_DECREF(iter);
                return NULL;
            }
            append_ok = join_append(self, str, quoted);
            Py_DECREF(str);
        }
    

    So I suspect what's going on here is that your list somehow contains data that isn't recognized as a unicode string, which consequently fails the PyUnicode_Check test, gets sent through str (PyObject_Str in the C code), and comes out with the escape sequence embedded. That would happen, for instance, if a field were a bytes object or a nested list, since str of those produces a repr-style string with \t spelled out.
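
    The branch above can be exercised from Python. This is only a sketch, assuming Python 3; write_row is a hypothetical helper, and the bytes and nested-list fields are guesses at how non-string data might sneak in, not something confirmed by the question:

```python
import csv
import io


def write_row(row):
    """Write one row with csv.writer and return the raw text produced."""
    buffer = io.StringIO(newline="")
    csv.writer(buffer).writerow(row)
    return buffer.getvalue()


tab_code = "\t<% master.getConfiguration()>"

as_text = write_row(["line 2", tab_code])            # field is a str
as_bytes = write_row(["line 2", tab_code.encode()])  # field is bytes, goes through str()
as_list = write_row(["line 2", [tab_code]])          # field is a nested list

# repr() makes the difference between a real tab and a literal \t visible.
print(repr(as_text))
print(repr(as_bytes))
print(repr(as_list))
```

    Only the str field keeps its real tab. The other two reach the file with a backslash followed by t, which matches the symptom in the question.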

    So you might want to check how that data is getting into your lists.
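
    One way to do that check is to list the type of every field and whether it holds a real tab. describe_fields and the sample rows below are hypothetical; substitute your own list:

```python
def describe_fields(rows):
    """Return (row, column, type name, has real tab, repr) for every field."""
    report = []
    for r, row in enumerate(rows):
        for c, field in enumerate(row):
            has_tab = isinstance(field, str) and "\t" in field
            report.append((r, c, type(field).__name__, has_tab, repr(field)))
    return report


# Hypothetical sample: the first row holds a real tab, the second holds an
# already-escaped string (a backslash followed by 't').
sample = [["line 1", "\t<% a %>"], ["line 2", r"\t<% b %>"]]

for entry in describe_fields(sample):
    print(entry)
```

    A field that is not a str, or a str whose repr shows a doubled backslash, was already escaped before it reached the csv module.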

    Alternatively, maybe the source I'm looking at there doesn't correspond to the version of Python you're using, and you're using a version that, say, just runs everything through str.
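
    Whichever of these is the cause, it may help to confirm what actually ended up in the file by counting raw bytes instead of trusting how a viewer renders them. A sketch, assuming Python 3 and using a temporary file in place of codelist.csv:

```python
import csv
import os
import tempfile

rows = [["line 1", "no tab here"],
        ["line 2", "\t<% master.getConfiguration()>"]]

# Write to a temporary file so the sketch does not touch codelist.csv.
fd, path = tempfile.mkstemp(suffix=".csv")
with os.fdopen(fd, "w", newline="") as f:
    csv.writer(f).writerows(rows)

with open(path, "rb") as f:
    raw = f.read()
os.remove(path)

real_tabs = raw.count(b"\t")      # byte 0x09
escaped_tabs = raw.count(b"\\t")  # backslash followed by 't'
print(real_tabs, escaped_tabs)
```

    If the count of real tabs is nonzero and the escaped count is zero, the file is fine and the \t is only how the data is being displayed.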