
Parsing a CSV file and reading a specific column


I have a CSV file that is the output of a hydrological model. I need to read a specific column of this file (for example, the values of the Rain column), but because of its complex structure, pandas.read_csv cannot parse it. I would be grateful if somebody could help me.

Please download the file from here


Solution

  • The file is not in a standard CSV format. The program that produces it might offer a better way to export the data, but if it doesn't and this file is your only option, you can parse it manually like this:

    rain_values = []

    with open('AnnAGNPS_TBL_Gaging_Station_Data_Hyd.csv', 'r') as f:
        # skip the preamble lines until the header row, which starts with 'Day'
        while not next(f).startswith('Day'):
            pass

        # read the data rows that follow the header
        for line in f:
            try:
                # index 6 (0-based) is the Rain column in this file's layout;
                # values are kept as strings, wrap them in float() for numbers
                rain_values.append(line.split(',')[6].strip())
            except IndexError:
                # a row with fewer than 7 fields (e.g. a blank line)
                # marks the end of the data section
                break

    print(rain_values)
    

    If you want a pandas DataFrame instead, you can collect only the rows of the data section yourself and build the frame from them:

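    import pandas as pd

    with open('AnnAGNPS_TBL_Gaging_Station_Data_Hyd.csv', 'r') as f:
        # skip the preamble lines until the header row, which starts with 'Day'
        while not next(f).startswith('Day'):
            pass
        rows = []
        for line in f:
            # a blank line marks the end of the data section
            if not line.strip():
                break
            # strip whitespace (including the trailing newline) from each field
            rows.append([field.strip() for field in line.split(',')])

    df = pd.DataFrame(rows)
    # columns are numbered from 0; column 6 holds Rain, as in the loop above
    rain = pd.to_numeric(df[6], errors='coerce')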
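Another option is to cut the data section out of the text and hand it to pandas.read_csv, which then takes care of column names and numeric types. The sketch below assumes that the data section is a header row starting with 'Day', followed by rows with the same number of comma-separated fields, and that it ends at the first blank line (or at the end of the file). The function name parse_data_section and the sample string are made up for illustration; they do not come from the original file.

```python
import io

import pandas as pd


def parse_data_section(text, header_prefix="Day"):
    """Hypothetical helper: return the table that starts at the first line
    beginning with header_prefix and ends at the next blank line (or EOF).

    Assumes the header row and the data rows have the same number of
    comma-separated fields.
    """
    lines = text.splitlines(keepends=True)
    # locate the header row of the data table
    start = next((i for i, line in enumerate(lines)
                  if line.startswith(header_prefix)), None)
    if start is None:
        raise ValueError(f"no line starts with {header_prefix!r}")
    # the data section ends at the first blank line after the header
    end = next((i for i in range(start + 1, len(lines))
                if not lines[i].strip()), len(lines))
    # let pandas parse only the isolated section; skipinitialspace drops
    # spaces that follow each comma
    df = pd.read_csv(io.StringIO("".join(lines[start:end])),
                     skipinitialspace=True)
    # remove any remaining whitespace around the column names
    df.columns = df.columns.str.strip()
    return df


if __name__ == "__main__":
    # made-up sample: a preamble, a data table, then a trailing section
    sample = (
        "Model output header line\n"
        "Some other metadata\n"
        "Day,Month,Year,Rain\n"
        "1,1,2000,0.5\n"
        "2,1,2000,1.25\n"
        "\n"
        "Summary,section,follows\n"
    )
    print(parse_data_section(sample)["Rain"].tolist())  # [0.5, 1.25]
```

For the real file, something like `with open('AnnAGNPS_TBL_Gaging_Station_Data_Hyd.csv') as f: rain = parse_data_section(f.read())['Rain']` should work if the assumptions above hold; if the header spells the column differently (for example with units attached), adjust the column name accordingly.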