
How to control quoting on non-numerical entries in a csv file?


I am using Python 3's csv module and cannot get quoting to behave as I expect. I pass quoting=csv.QUOTE_NONNUMERIC to the writer, but every entry is still quoted. Any idea why?

Here's my code. Essentially, I am reading in a csv file and want to remove all duplicate lines that have the same text string:

    import sys
    import csv

    class Row:
        def __init__(self, row):
            self.text, self.a, self.b = row
            self.elements = row


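    with open(sys.argv[2], 'w', newline='') as output:

        writer = csv.writer(output, delimiter=';', quotechar='"',
                            quoting=csv.QUOTE_NONNUMERIC)

        # newline='' is recommended by the csv docs for input files as well;
        # 'infile' avoids shadowing the built-in input()
        with open(sys.argv[1], newline='') as infile:

            reader = csv.reader(infile, delimiter=';')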

            header = next(reader)

            Row.labels = header        
            assert Row.labels[1] == 'Label1'

            writer.writerow(header)
            texts = set()

            for row in reader:

                row_object = Row(row)

                if row_object.text not in texts:
                    writer.writerow(row_object.elements)
                    texts.add(row_object.text)

When I look at the generated file, the content looks like this:

    "Label1";"Label2";"Label3"
    "AAA";"123";"456"
    ...

But I want this:

    "Label1";"Label2";"Label3"
    "AAA";123;456
    ...

Solution

  • OK ... I figured it out myself. The answer, I am afraid, was rather simple, and obvious in retrospect. Since each row is obtained from a csv.reader(), its elements are strings by default. As a result, they get quoted by the csv.writer() that subsequently writes them out.

    To be written without quotes, the numeric fields first need to be converted to int:

        row_object.elements[1] = int(row_object.a)
        row_object.elements[2] = int(row_object.b)
    

    This explanation can be verified by inserting a type check before and after the conversion:

        print('Type: {}'.format(type(row_object.elements[1])))
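
Putting the pieces together, here is a minimal sketch of the deduplication with the conversion applied to every field rather than to one index at a time. The helper names convert_numeric and dedupe are hypothetical, and the sample rows are invented for illustration (they extend the example from the question). It only handles integers; float values would need a similar attempt with float().

```python
import csv
import io


def convert_numeric(fields):
    """Return a copy of fields with integer-looking strings converted to int."""
    converted = []
    for field in fields:
        try:
            converted.append(int(field))
        except ValueError:
            converted.append(field)  # not an integer: keep it as a string
    return converted


def dedupe(source, target):
    """Copy rows from source to target, keeping the first row per text value."""
    reader = csv.reader(source, delimiter=';')
    writer = csv.writer(target, delimiter=';', quotechar='"',
                        quoting=csv.QUOTE_NONNUMERIC)

    writer.writerow(next(reader))  # header fields are strings, so they stay quoted

    seen = set()
    for row in reader:
        text = row[0]
        if text not in seen:
            seen.add(text)
            # Convert before writing so QUOTE_NONNUMERIC sees real ints
            writer.writerow(convert_numeric(row))


# Hypothetical sample input standing in for the file named in sys.argv[1]
sample = (
    '"Label1";"Label2";"Label3"\n'
    '"AAA";"123";"456"\n'
    '"AAA";"789";"12"\n'
    '"BBB";"7";"8"\n'
)

source = io.StringIO(sample)
target = io.StringIO()
dedupe(source, target)
result = target.getvalue()
print(result)
```

With real files, the same dedupe function could be called with the two file objects opened as in the question, in place of the io.StringIO objects used here.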