I have a CSV file with 5 columns:
Student Id, Course Id, Student Name, Course Name, GPA
12,112 , John,Math, 3.7
11,111 , Mohammed,Astronomy, 3.89
10,100 , Peter,Java Programming, 3.5
9,99 , Lisa,Cooking, 4.0
Using Python 3.8 I need to read that file and create a new CSV file with fewer columns eliminating all leading and trailing spaces:
(Student Name, Course Name and GPA),
but only where the Name field contains certain values (eg: 'John', 'Lisa')
And if there are nor records with corresponding Names, then DO NOT CREATE an output file and print a message (eg.: 'No Student name(s) found in the database')
I'm using the following code to create new CSV file with required columns, but I'm not sure how to select only subset of records for my new (output) CSV file.
import csv
with open('My_Source_file.csv') as infile, open('My_Ouput_file.csv', 'w', newline='') as outfile:
csv.writer(outfile).writerows((row[3], row[4], row[5]) for row in csv.reader(infile))
Try this:
import pandas as pd
dta = pd.read_csv('test.csv')
# removing the white space from column names:
dta.columns = dta.columns.str.strip()
# removing the tailing white space from all records in all columns:
for col in dta.columns:
# checking if column is string:
if dta[col].dtype == object:
dta[col] = dta[col].str.strip()
selection_list = ['John', 'Lisa']
dta = dta[dta['Student Name'].isin(selection_list)]
if len(dta) != 0:
# pass your selected column as a list like this:
selected_columns = ["Student Id","Course Id","Student Name"]
dta[selected_columns].to_csv('My_Ouput_file.csv', index=False)
else:
print('No Student name(s) found in the database')
output:
Student Id Course Id Student Name
0 12 112 John
3 9 99 Lisa
This script gets the names (or your variables of interest) in selection_list
and selects rows where selected column (Student Name
) has them.