I am using Python3's csv module and am wondering why I cannot control quoting correctly. I am using the option quoting = csv.QUOTE_NONNUMERIC
but am still seeing all entries quoted. Any idea as to why that is?
Here's my code. Essentially, I am reading in a csv file and want to remove all duplicate lines that have the same text string:
import sys
import csv
class Row:
def __init__(self, row):
self.text, self.a, self.b = row
self.elements = row
with open(sys.argv[2], 'w', newline='') as output:
writer = csv.writer(output, delimiter=';', quotechar='"',
quoting=csv.QUOTE_NONNUMERIC)
with open(sys.argv[1]) as input:
reader = csv.reader(input, delimiter=';')
header = next(reader)
Row.labels = header
assert Row.labels[1] == 'Label1'
writer.writerow(header)
texts = set()
for row in reader:
row_object = Row(row)
if row_object.text not in texts:
writer.writerow(row_object.elements)
texts.add(row_object.text)
When I look at the generated file, the content looks like this:
"Label1";"Label2";"Label3"
"AAA";"123";"456"
...
But I want this:
"Label1";"Label2";"Label3"
"AAA";123;456
...
OK ... I figured it out myself. The answer, I am afraid, was rather simple - and obvious in retrospect. Since the content of each line is obtained from a csv.reader()
its elements are strings by default. As a result, the get quoted by the subsequently employed csv.writer()
.
To be treated as an int
, they first need to be cast to an int
:
row_object.elements[1]= int(row_object.a)
This explanation can be proven by inserting a type check before and after this cast:
print('Type: {}'.format(type(row_object.elements[1])))