python csv assert

How to check if content of CSV file follows a specific format in Python?


I am writing a Python program that takes a CSV file as input (the file's location is passed on the command line). Before doing any preprocessing, it should check that the content of the file is in a specific format and, if it is not, raise an exception telling the user to choose a correct file.

The content should be something like this:

Sr.no .  Codes .  v1 .     v2 .     v3 .     v4 .   ... v300
1 .      code1 .  val1 .   val2 .   val3 .   NA .   ... NA
2 .      code2 .  val4 .   NA .     NA .     NA .   ... NA
3 .      code3 .  val5 .   val6 .   NA .     NA .   ... NA
4 .      code4 .  val7 .   val8 .   val9 .   NA .   ... NA
.
.

Basically it should be a CSV file whose first two columns are Sr.no and Codes, followed by 300 value columns (v1 to v300). In each row the actual values come first, and the remaining columns up to v300 are filled with 'NA'.

If the user uploads something like this:

Sr.no .  Codes .  v1 .     v2 .      v3 .    . . . . . v300
1 .      code1 .  NA .     val1 .    NA .    . . . . . NA 
2 .      code2 .  val2 .   val3 .    NA .    . . . . . NA

It should raise an exception because, in the row with Sr.no = 1, there is a value in column v2 even though column v1 is NA.

I want to know how I can assert that the content of the file is in this format using Python (a sample code snippet would be helpful). I would also appreciate sources where I can learn how to validate file content for generic formats, not just this one.

So far I have got up to here, and I need to complete the assert_format function:

import sys
import csv

def assert_format(file_name):

    csv_file = open(file_name)
    reader = csv.reader(csv_file)

    #code to check format

    return True

file_name = sys.argv[1]

if assert_format(file_name):
    print("format is correct")
else:
    print("choose correct file")

Thanks in advance!


Solution

  • See if this fits your requirement:

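    import sys
    import csv

    def assert_format(file_name):
        # On Python 3 the csv module expects a text-mode file opened with newline=''.
        with open(file_name, newline='') as csvfile:
            # Assumes comma-separated data; pass delimiter=... if the file uses another separator.
            reader = csv.reader(csvfile)
            next(reader, None)  # skip the header row (Sr.no, Codes, v1 ... v300)
            for row in reader:
                seen_na = False
                # The first two columns are Sr.no and Codes, so only check the value columns.
                for cell in row[2:]:
                    if cell.strip() == 'NA':
                        seen_na = True
                    elif seen_na:
                        # A real value after an 'NA' breaks the "values first, then NA" rule.
                        return False
        return True

    file_name = sys.argv[1]

    if assert_format(file_name):
        print("format is correct")
    else:
        print("choose correct file")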
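
Since the question asks for an exception rather than a True/False result, here is a minimal sketch of a stricter variant that also checks the column count and reports the offending line. It assumes the file is comma-separated and has a header row; `FormatError`, `check_rows` and `check_file` are hypothetical names introduced for illustration, not part of the csv module. Only `csv.reader` and its `line_num` attribute come from the standard library.

```python
import csv
import io

N_VALUE_COLUMNS = 300  # v1 ... v300, as described in the question


class FormatError(ValueError):
    """Hypothetical exception raised when the file breaks the expected layout."""


def check_rows(lines, n_values=N_VALUE_COLUMNS):
    """Raise FormatError on the first problem; `lines` is any iterable of CSV text lines."""
    reader = csv.reader(lines)  # assumption: comma-separated data
    header = next(reader, None)
    if header is None:
        raise FormatError("file is empty")
    if len(header) != n_values + 2:  # Sr.no + Codes + value columns
        raise FormatError(f"expected {n_values + 2} columns, found {len(header)}")
    for row in reader:
        if len(row) != len(header):
            raise FormatError(
                f"line {reader.line_num}: expected {len(header)} cells, found {len(row)}"
            )
        seen_na = False
        for name, cell in zip(header[2:], row[2:]):
            if cell.strip() == "NA":
                seen_na = True
            elif seen_na:
                raise FormatError(
                    f"line {reader.line_num}: value in column {name} after an NA"
                )


def check_file(path, n_values=N_VALUE_COLUMNS):
    """Open `path` the way the csv module expects and validate its rows."""
    with open(path, newline="") as handle:
        check_rows(handle, n_values)


# Small in-memory demo with 3 value columns instead of 300.
good = "Sr.no,Codes,v1,v2,v3\n1,code1,val1,val2,NA\n2,code2,val4,NA,NA\n"
bad = "Sr.no,Codes,v1,v2,v3\n1,code1,NA,val1,NA\n"

check_rows(io.StringIO(good), n_values=3)  # passes silently
try:
    check_rows(io.StringIO(bad), n_values=3)
except FormatError as exc:
    print("choose correct file:", exc)
```

In the real program you would call `check_file(sys.argv[1])` and let the exception propagate, or catch it to print the message. Keeping the row logic in `check_rows` makes it possible to test it on in-memory strings without touching the disk.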