I have roughly 20 CSV files (all with headers) that I would like to merge all of them into 1 CSV file.
Looking online, one way I found was to use the terminal command:
cat *.csv > file.csv
This worked just fine, but the problem is, as all the CSV files come with headers, those also get placed into the CSV file.
Is there a terminal command or Python script on which I can merge all those CSV files into one and keep only one header?
My vote goes to the Awk solution, but since this question explicitly asks about Python, here is a solution for that.
import csv
import sys
writer = csv.writer(sys.stdout)
firstfile = True
for file in sys.argv[1:]:
with open(file, 'r') as rawfile:
reader = csv.reader(rawfile)
for idx, row in enumerate(reader):
# enumerate() is zero-based by default; 0 is first line
if idx == 0 and not firstfile:
continue
writer.writerow(row)
firstfile = False
Usage: python script.py first.csv second.csv etc.csv >final.csv
This simple script doesn't really benefit from any Python features, but if you need to count the number of fields in non-trivial CSV files (i.e. with quoted fields which might contain a comma which isn't a separator) that's hard in Awk, and trivial in Python (because the csv
library already knows exactly how to handle that).