Search code examples
pythoncsv

Merge multiple CSV files into one without duplicate headers


I have roughly 20 CSV files (all with headers) that I would like to merge all of them into 1 CSV file.

Looking online, one way I found was to use the terminal command:

cat *.csv > file.csv

This worked just fine, but the problem is, as all the CSV files come with headers, those also get placed into the CSV file.

Is there a terminal command or Python script on which I can merge all those CSV files into one and keep only one header?


Solution

  • My vote goes to the Awk solution, but since this question explicitly asks about Python, here is a solution for that.

    import csv
    import sys
    
    
    writer = csv.writer(sys.stdout)
    
    firstfile = True
    for file in sys.argv[1:]:
        with open(file, 'r') as rawfile:
            reader = csv.reader(rawfile)
            for idx, row in enumerate(reader):
                # enumerate() is zero-based by default; 0 is first line
                if idx == 0 and not firstfile:
                    continue
                writer.writerow(row)
        firstfile = False
    

    Usage: python script.py first.csv second.csv etc.csv >final.csv

    This simple script doesn't really benefit from any Python features, but if you need to count the number of fields in non-trivial CSV files (i.e. with quoted fields which might contain a comma which isn't a separator) that's hard in Awk, and trivial in Python (because the csv library already knows exactly how to handle that).