How to ignore the first line of data when processing CSV data?


I am using Python to print the minimum value from a column of CSV data, but the top row contains the column numbers, and I don't want Python to take that row into account. How can I make sure Python ignores the first line?

This is the code so far:

import csv

with open('all16.csv', 'rb') as inf:
    incsv = csv.reader(inf)
    column = 1
    datatype = float
    data = (datatype(row[column]) for row in incsv)
    least_value = min(data)

print least_value

Could you also explain what you are doing, not just give the code? I am very very new to Python and would like to make sure I understand everything.


Solution

  • You could use an instance of the csv module's Sniffer class, which can deduce the format of a CSV file and detect whether a header row is present. Combine it with the built-in next() function, which advances the reader by one row, to skip the first row only when necessary:

    import csv
    
    with open('all16.csv', 'r', newline='') as file:
        has_header = csv.Sniffer().has_header(file.read(1024))
        file.seek(0)  # Rewind.
        reader = csv.reader(file)
        if has_header:
            next(reader)  # Skip header row.
        column = 1
        datatype = float
        data = (datatype(row[column]) for row in reader)
        least_value = min(data)
    
    print(least_value)
    

    Since datatype and column are hardcoded in your example, it would be slightly faster to process the row like this:

        data = (float(row[1]) for row in reader)
    

    Note: the code above is for Python 3.x. For Python 2.x use the following line to open the file instead of what is shown:

    with open('all16.csv', 'rb') as file:
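
If you already know the file has a header, a simpler variant is to skip the first row unconditionally, since Sniffer.has_header() is a heuristic and can guess wrong. The following is a minimal sketch of that idea, not the only way to do it. The helper name min_of_column, its parameters, and the SAMPLE data are made up for illustration (the real contents of all16.csv are not shown in the question), and io.StringIO stands in for the open file:

```python
import csv
import io

# Hypothetical sample data standing in for all16.csv; the real file's
# contents are not shown in the question.
SAMPLE = "id,value\n1,3.5\n2,-1.25\n3,7.0\n"


def min_of_column(lines, column=1, skip_header=True):
    """Return the smallest float in `column`, optionally skipping the first row."""
    reader = csv.reader(lines)
    if skip_header:
        # next() pulls one row off the reader and we discard it. The second
        # argument (None) is returned instead of raising StopIteration
        # if the input has no rows at all.
        next(reader, None)
    # Each remaining row is a list of strings, so convert the chosen cell
    # to float before comparing.
    return min(float(row[column]) for row in reader)


least_value = min_of_column(io.StringIO(SAMPLE))
print(least_value)
```

With a real file you would pass the file object from open('all16.csv', 'r', newline='') instead of the StringIO object. The reader is an iterator, so once next() has consumed the header row, the generator expression only ever sees the data rows.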