I have multiple CSV files with same number of columns BUT different column orders in each , I wanted to merge them removing duplicates, all of the other solutions here dont consider column order hence merging output is incorrect, Hence how to do it in either windows commandline(e.g logparser) or bash?
Also python script to achieve this would also do.
The following script works properly if:
You only have to fill files
and final_headers
import csv
files = ['c1.csv', 'c2.csv', 'c3.csv']
final_headers = ['col1', 'col2', 'col3']
merged_rows = set()
for f in files:
with open(f, 'rb') as csv_in:
csvreader = csv.reader(csv_in, delimiter=',')
headers = dict((h, i) for i, h in enumerate(csvreader.next()))
for row in csvreader:
merged_rows.add(tuple(row[headers[x]] for x in final_headers))
with open('output.csv', 'wb') as csv_out:
csvwriter = csv.writer(csv_out, delimiter=',')
csvwriter.writerows(merged_rows)