I have two CSV files, each containing different columns that I want to merge into one database. I set up a database with all the columns from both files, but when I use load data infile to import both files instead of merging (IE data file 1 fills columns 1-6, data file 2 fills 7-10) I get a database with twice as many rows (one row for each record in each CSV) and NULLS filling in the data not represented in the source CSV.
I know I can fix this by merging the CSVs somehow, importing with overwrite enabled, or combining the data in the database - what's the most efficient way for me to do this?
The best way to do this is with a script. CSV import scripts are usually written in a scripting language such as python, ruby, or php.
You just need the importer for the second CSV to perform updates on the records created in the first CSV, so the script will really only be 5-10 lines. If you provide a sample record from each CSV, I'd be happy to write one for you.
Edit: Here's a python script to combine the files, adding a semicolon between lines from file1 and lines from file2. This essentially does what Linux's paste
command would do.
lines1 = open('file1.txt').readlines()
lines2 = open('file2.txt').readlines()
outfile = open('outfile.txt', 'w')
if len(lines1) != len(lines2):
raise Exception("Files need to be the same length, but file1 is %s lines long and file2 is %s lines long" % (len(lines1), len(lines2)));
for i in range(len(lines1)):
combined = lines1[i].strip() + ";" + lines2[i].strip() + "\n"
outfile.write(combined)
You can run it by saving it as combine.py
and typing python combine.py
. The folder you place it in should contain file1.txt
, file2.txt
, and outfile.txt
.