What I want to do is that I have a big .csv file. I want to break down this big csv file into many small files based on the common records in BB column that also contain 1 in the HH column, and all uncommon records that contain 0 in HH column.
As a result, all files will contain common records in BB column that contain 1 in the HH column, and all uncommon records that has no records in BB column and contain 0 in the HH column. The file name should be based on the common record of column 2 (BB). Please take a look below for the scenario.
bigFile.csv :
AA BB CC DD EE FF GG HH
12 53 115 10 3 3 186 1
12 53 01e 23 3 2 1
12 53 0ce 65 1 3 1
12 53 173 73 4 2 1
12 59 115 0 3 3 186 1
12 59 125 0 3 3 186 1
12 61 01e 23 3 2 1
12 61 b6f 0 1 1 1
12 61 b1b 0 6 5 960 1
12 68b 95 3 5 334 0
12 31a 31 2 2 0
12 221 0 4 5 0
12 12b 25 5 4 215 0
12 a10 36 5 1 0
My expected results files woud be as follows:
53.csv :
AA BB CC DD EE FF GG HH
12 53 115 10 3 3 186 1
12 53 01e 23 3 2 1
12 53 0ce 65 1 3 1
12 53 173 73 4 2 1
12 68b 95 3 5 334 0
12 31a 31 2 2 0
12 221 0 4 5 0
12 12b 25 5 4 215 0
12 a10 36 5 1 0
59.csv :
AA BB CC DD EE FF GG HH
12 59 115 0 3 3 186 1
12 59 125 0 3 3 186 1
12 68b 95 3 5 334 0
12 31a 31 2 2 0
12 221 0 4 5 0
12 12b 25 5 4 215 0
12 a10 36 5 1 0
61.csv :
AA BB CC DD EE FF GG HH
12 61 01e 23 3 2 1
12 61 b6f 0 1 1 1
12 61 b1b 0 6 5 960 1
12 68b 95 3 5 334 0
12 31a 31 2 2 0
12 221 0 4 5 0
12 12b 25 5 4 215 0
12 a10 36 5 1 0
The following script will produce your expected outcome. It filters the file based on the contents of the first row:
import csv
with open('123.csv', 'r') as f_input, open('output.csv', 'w', newline='') as f_output:
csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)
header = next(csv_input)
first_row = next(csv_input)
csv_output.writerow(header)
csv_output.writerow(first_row)
for row in csv_input:
if row[0] == first_row[0]:
if row[1] == first_row[1] or row[7] == '0':
csv_output.writerow(row)
To do this operation on all CSV files in the same folder, the following could be used:
import glob
import csv
import os
file_list = glob.glob('*.csv')
output_folder = 'temp' # make sure this folder exists
for file in file_list:
file_name = os.path.join(output_folder, os.path.basename(file))
with open(file, 'r') as f_input, open(file_name, 'w', newline='') as f_output:
csv_input = csv.reader(f_input)
csv_output = csv.writer(f_output)
header = next(csv_input)
first_row = next(csv_input)
csv_output.writerow(header)
csv_output.writerow(first_row)
for row in csv_input:
if row[0] == first_row[0]:
if row[1] == first_row[1] or row[7] == '0':
csv_output.writerow(row)
This will write all of the output files to a temp
folder.