Tags: python, delimiter, readline

Count the number of delimiters in a line, ignoring delimiters that are part of a data value


I am reading a large CSV file line by line, and I want to count the number of delimiters in each line.

However, a delimiter that is part of a data value should not be counted.

A few records from the data set:

com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"
com.mtt.rder, News Maharashtra, Time Internet Limited"
com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"

In all 3 lines, only 2 commas actually separate the data into columns,

but the code snippet below outputs:

  • 4 commas for line 1
  • 2 for line 2
  • 3 for line 3

Code Snippet:

# Naive count: every comma is counted, including those inside quoted values
with open('file_name.csv', 'r') as file1:
    for line in file1:
        print(line.count(','))

Solution

  • One simple way is to use a regex to remove everything between a pair of double quotes, so that the commas inside quoted values aren't counted.

    import re

    # Strip every double-quoted span, then count the commas that remain
    with open('input.csv', 'r') as file1:
        for line in file1:
            line = re.sub(r'".*?"', '', line)
            print(line.count(','))

    Output:

    2
    2
    2
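
  • Another option, sketched here as an alternative to the regex, is to let Python's standard csv module parse each line and count the fields it returns. The helper name count_delimiters is hypothetical, and skipinitialspace=True is an assumption based on the sample data, where a space follows each comma; without it, a quote that comes after a space would not be treated as the start of a quoted value.

```python
import csv


def count_delimiters(line, delimiter=","):
    """Count the delimiters that separate fields, ignoring quoted ones.

    Hypothetical helper: parses a single line with csv.reader and
    returns (number of fields - 1).
    """
    # skipinitialspace=True is assumed because the sample data has a
    # space after each comma, before the opening quote.
    reader = csv.reader([line], delimiter=delimiter, skipinitialspace=True)
    fields = next(reader, [])
    # A blank line yields no fields, so clamp the result at zero.
    return max(len(fields) - 1, 0)


# Sample records taken from the question
sample = [
    'com.abc.xyz, ple Sara, "DIT, Government of Maharashtra, India"',
    'com.grner.mahya, Mh Swth, "Public Health Department, Maharashtra"',
]

for record in sample:
    print(count_delimiters(record))
```

    Parsing one line at a time assumes that no quoted value contains a line break. If that can happen, passing the open file object to csv.reader directly lets it join the physical lines of a record.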