
Python: How to exclude specific parts of a row when reading from CSV file


I'm very new to Python and am trying to read a CSV file:

1980,Mark,Male,Student,L,90,56,78,44,88
1982,Cindy,Female,Student,S,45,76,22,42,90
1984,Kevin,Male,Student,L,67,83,52,55,59
1986,Michael,Male,Student,M,94,63,73,60,43
1988,Anna,Female,Student,S,66,50,59,57,33
1990,Jessica,Female,Student,S,72,34,29,69,27
1992,John,Male,Student,L,80,67,90,89,68
1994,Tom,Male,Student,M,23,60,89,78,39
1996,Nick,Male,Student,S,56,98,84,44,50
1998,Oscar,Male,Student,M,64,61,74,59,63
2000,Andy,Male,Student,M,11,50,93,69,90

I'd like to save only specific attributes of this data into a dictionary or a list of lists. For example, I'd like to keep only the year, the name, and the five numbers in each row. I'm not sure how to exclude just the three columns in the middle.

This is the code I have now:

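def read_data(filename):
    data = {}
    # `with` closes the file automatically; open the file that was passed in
    with open(filename, "rt") as f:
        for line in f:
            row = line.rstrip().split(',')
            # Key each row by year and keep columns 5 onwards (the five numbers)
            data[row[0]] = row[5:]
    return data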

I only know how to keep contiguous chunks of columns together with slicing, not how to pick out specific columns one by one.
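For the slicing approach mentioned above, a minimal sketch: two slices can be concatenated so that only the middle three columns are dropped. It uses the standard `csv` module and inlines two rows of the sample data through `io.StringIO` so it runs without `myfile.csv`; the helper name `keep_columns` is made up for illustration.

```python
import csv
import io

# Two rows from the question's file, inlined so the sketch runs on its own
SAMPLE = """1980,Mark,Male,Student,L,90,56,78,44,88
1982,Cindy,Female,Student,S,45,76,22,42,90
"""

def keep_columns(row):
    # Hypothetical helper: glue two slices together, columns 0-1 (year, name)
    # and columns 5-9 (the five numbers); columns 2-4 are skipped
    return row[:2] + row[5:]

rows = [keep_columns(r) for r in csv.reader(io.StringIO(SAMPLE))]
print(rows)
# [['1980', 'Mark', '90', '56', '78', '44', '88'], ['1982', 'Cindy', '45', '76', '22', '42', '90']]
```

For a single row, `del row[2:5]` has the same effect in place.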


Solution

  • You could do this with a simple list comprehension:

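    def read_data(filename):
        col_nums = [0, 1, 5, 6, 7, 8, 9]  # year, name, and the five numbers
        data = {}
        # Open the file that was passed in; `with` closes it when done
        with open(filename, "rt") as f:
            for line in f:
                row = line.rstrip().split(',')
                # Pick out only the wanted columns, in order
                data[row[0]] = [row[i] for i in col_nums]
        return data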

    You could also consider using Pandas to help you read and wrangle the data:

    import pandas as pd
    # The file has no header row, so pass the column names with `names=`
    cols = ['year', 'name', 'gender', 'kind', 'size', 'num1', 'num2', 'num3', 'num4', 'num5']
    df = pd.read_csv("myfile.csv", names=cols)
    data = df[['year', 'name', 'num1', 'num2', 'num3', 'num4', 'num5']]
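A variant of the list-comprehension answer, sketched under the assumption that the file has no header row: it reads rows with `csv.reader` (the question imported `csv` but split lines by hand) and uses `operator.itemgetter` to pull out the same column positions as `col_nums`. Unlike the answer's version, this hypothetical `read_data` takes an open file object rather than a filename, so the sample can be fed from `io.StringIO`.

```python
import csv
import io
from operator import itemgetter

# Two rows from the question's file, inlined so the sketch runs on its own
SAMPLE = """1980,Mark,Male,Student,L,90,56,78,44,88
1982,Cindy,Female,Student,S,45,76,22,42,90
"""

# Same positions as col_nums in the answer: year, name, and the five numbers
pick = itemgetter(0, 1, 5, 6, 7, 8, 9)

def read_data(f):
    """Hypothetical variant: return {year: [year, name, n1..n5]} from an open file."""
    data = {}
    for row in csv.reader(f):
        if not row:  # csv.reader yields [] for blank lines; skip them
            continue
        data[row[0]] = list(pick(row))  # itemgetter returns a tuple
    return data

data = read_data(io.StringIO(SAMPLE))
print(data["1982"])  # ['1982', 'Cindy', '45', '76', '22', '42', '90']
```

To read the real file, pass `open("myfile.csv", newline="")`, which is how the `csv` module documentation recommends opening files for it. Unlike a plain `split(',')`, `csv.reader` also copes with quoted fields that contain commas.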