Search code examples
pandascsvpython-importexport-to-csvpython-3.8

Read CSV and create new CSV with only specified columns and subset or rows based on specific values


I have a CSV file with 5 columns:

Student Id, Course Id, Student Name, Course Name, GPA
12,112 , John,Math, 3.7
11,111 , Mohammed,Astronomy, 3.89
10,100 , Peter,Java Programming, 3.5
9,99 , Lisa,Cooking, 4.0

Using Python 3.8 I need to read that file and create a new CSV file with fewer columns eliminating all leading and trailing spaces:

(Student Name, Course Name and GPA),

but only where the Name field contains certain values (eg: 'John', 'Lisa')

And if there are nor records with corresponding Names, then DO NOT CREATE an output file and print a message (eg.: 'No Student name(s) found in the database')

I'm using the following code to create new CSV file with required columns, but I'm not sure how to select only subset of records for my new (output) CSV file.

import csv

with open('My_Source_file.csv') as infile, open('My_Ouput_file.csv', 'w', newline='') as outfile:
    csv.writer(outfile).writerows((row[3], row[4], row[5]) for row in csv.reader(infile))

Solution

  • Try this:

    import pandas as pd
    
    dta = pd.read_csv('test.csv')
    # removing the white space from column names:
    dta.columns = dta.columns.str.strip() 
    
    # removing the tailing white space from all records in all columns:
    
    for col in dta.columns:
        # checking if column is string:
        if dta[col].dtype == object:
            dta[col] = dta[col].str.strip()
    
    selection_list = ['John', 'Lisa']
    dta = dta[dta['Student Name'].isin(selection_list)]
        
    if len(dta) != 0:
        # pass your selected column as a list like this:
        selected_columns = ["Student Id","Course Id","Student Name"]
        dta[selected_columns].to_csv('My_Ouput_file.csv', index=False)
    
    else:
        print('No Student name(s) found in the database')
    

    output:

       Student Id Course Id   Student Name 
    0  12         112         John         
    3  9          99          Lisa         
    

    This script gets the names (or your variables of interest) in selection_list and selects rows where selected column (Student Name) has them.