I am reading a large CSV file line by line and want to count the number of delimiters in each line.
However, if a delimiter is part of a data value, it should not be counted.
A few records from the data set:
com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"
com.mtt.rder, News Maharashtra, Time Internet Limited"
com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"
In all 3 lines, the number of actual delimiting commas (the ones that split the data into columns) is only 2,
but for these three records the code snippet below outputs 4, 2 and 3.
Code Snippet:
file1 = open('file_name.csv', 'r')
while True:
    line = file1.readline()
    if not line:
        break
    print(line.count(','))
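For reference, here is a minimal, file-free reproduction of the same naive count. It uses the three sample records as in-memory strings (the list name `sample_lines` is just for illustration) and shows that `str.count(',')` also counts the commas inside the quoted values:

```python
# The three sample records from the question, embedded as strings so the
# example runs without needing 'file_name.csv'.
sample_lines = [
    'com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"\n',
    'com.mtt.rder, News Maharashtra, Time Internet Limited"\n',
    'com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"\n',
]

# str.count(',') counts every comma, including those inside "..." values.
naive_counts = [line.count(',') for line in sample_lines]
print(naive_counts)  # [4, 2, 3]
```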
One simple way is to use a regex to remove everything between each pair of double quotes ("), so that the commas inside quoted values aren't counted.
import re

with open('input.csv', 'r') as file1:
    for line in file1:
        # Remove each "..." quoted section (non-greedy match), so commas
        # inside quoted values are not counted.
        line = re.sub(r'".*?"', '', line)
        print(line.count(','))
Output:
2
2
2
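A sturdier alternative, swapping the regex for Python's standard `csv` module, is to let a CSV parser split each row and then count fields: a row with n fields was separated by n - 1 delimiters. This is a sketch that assumes `"` is the quote character and that a space may follow each comma. `skipinitialspace=True` matters here, because without it a field such as ` "DIT, ..."` (space before the opening quote) is not recognised as quoted and its inner commas would split it.

```python
import csv
import io

# Sample records from the question; a real file object works the same way.
sample_text = (
    'com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"\n'
    'com.mtt.rder, News Maharashtra, Time Internet Limited"\n'
    'com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"\n'
)

# skipinitialspace=True skips the space after each comma, so a following
# '"' starts a quoted field and the commas inside it are not delimiters.
reader = csv.reader(io.StringIO(sample_text), skipinitialspace=True)

# A row with n fields was separated by n - 1 delimiters.
counts = [len(row) - 1 for row in reader]
print(counts)  # [2, 2, 2]
```

For a real file, open it with `with open('input.csv', newline='') as f:` and pass `f` to `csv.reader` instead of the `StringIO` object. Unlike counting per `readline()`, the reader also copes with quoted values that span several lines. In the default (non-strict) mode, the stray `"` at the end of the second record is kept as ordinary text, since it does not appear at the start of a field.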