A client needs to migrate a large volume of data, and I think this question is generic enough for SO.
Legacy system
Student profiles contain fields such as name and email, as well as the university name. The university name is stored as a string, so the same string is repeated for every student at that university, which is wasteful and slow.
Our new form
A more efficient solution is a table called university that stores each university name only once, keyed by university_id. Student records reference it through a foreign key, and the HTML dropdown just POSTs the university_id to the server. This makes GROUP BY queries on university much faster, for example. New form data going into the database works fine.
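A minimal sketch of the normalised layout described above, using Python's built-in sqlite3 module as a stand-in for the client's actual database. The column types, the UNIQUE constraint, the students_new table name, and the sample rows are assumptions for illustration. It shows the integer foreign key being stored directly (as the dropdown would POST it) and a GROUP BY on that key.

```python
import sqlite3

# Hypothetical normalised schema: each university name is stored once,
# and students reference it by integer id.
SCHEMA = """
CREATE TABLE university (
    university_id   INTEGER PRIMARY KEY,
    university_name TEXT NOT NULL UNIQUE
);
CREATE TABLE students_new (
    first_name    TEXT,
    last_name     TEXT,
    email         TEXT,
    university_id INTEGER REFERENCES university(university_id)
);
"""

def build_demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.executemany(
        "INSERT INTO university (university_name) VALUES (?)",
        [("Oxford",), ("Cambridge",)],
    )
    # Map names to the ids the database assigned.
    ids = dict(conn.execute("SELECT university_name, university_id FROM university"))
    # New form submissions carry the integer id, so it is stored directly.
    conn.executemany(
        "INSERT INTO students_new (first_name, last_name, email, university_id) "
        "VALUES (?, ?, ?, ?)",
        [
            ("Ada", "Lovelace", "ada@example.com", ids["Oxford"]),
            ("Alan", "Turing", "alan@example.com", ids["Cambridge"]),
            ("Grace", "Hopper", "grace@example.com", ids["Oxford"]),
        ],
    )
    return conn

def students_per_university(conn: sqlite3.Connection):
    # Group on the integer key; the name is joined in only for display.
    return conn.execute(
        """
        SELECT u.university_name, COUNT(*)
        FROM students_new s
        JOIN university u ON u.university_id = s.university_id
        GROUP BY u.university_id, u.university_name
        ORDER BY u.university_id
        """
    ).fetchall()
```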
The problem
How can we write a query that will INSERT all the other columns (first_name, last_name, email, ...) but, rather than inserting the university string, look up its university_id in the university table and INSERT that integer instead of the original string? (Scenario: the data is in a CSV file that we will manipulate into INSERT INTO syntax.)
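If the rows are inserted one at a time from the CSV, as described here, each INSERT can look the id up itself with a scalar subquery in its VALUES list. A minimal sketch using Python's built-in csv and sqlite3 modules; the CSV column names, the sample data, and the schema are hypothetical. The UNIQUE constraint on university_name matters: it guarantees the subquery returns at most one id, and a name with no match yields NULL.

```python
import csv
import io
import sqlite3

# Hypothetical CSV layout; the real file's columns may differ.
CSV_TEXT = """first_name,last_name,email,university
Ada,Lovelace,ada@example.com,Oxford
Alan,Turing,alan@example.com,Cambridge
Grace,Hopper,grace@example.com,Unknown U
"""

# The scalar subquery replaces the university string with its id;
# it evaluates to NULL when the name is not in the university table.
INSERT_SQL = """
INSERT INTO students_new (first_name, last_name, email, university_id)
VALUES (?, ?, ?,
        (SELECT university_id FROM university WHERE university_name = ?))
"""

def load_csv(conn: sqlite3.Connection, csv_text: str) -> None:
    rows = csv.DictReader(io.StringIO(csv_text))
    # Parameters are bound rather than spliced into SQL text.
    conn.executemany(
        INSERT_SQL,
        ((r["first_name"], r["last_name"], r["email"], r["university"]) for r in rows),
    )

def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE university (
            university_id   INTEGER PRIMARY KEY,
            university_name TEXT NOT NULL UNIQUE);
        CREATE TABLE students_new (
            first_name TEXT, last_name TEXT, email TEXT,
            university_id INTEGER REFERENCES university(university_id));
        INSERT INTO university (university_name) VALUES ('Oxford'), ('Cambridge');
    """)
    load_csv(conn, CSV_TEXT)
    return conn.execute(
        "SELECT email, university_id FROM students_new ORDER BY rowid"
    ).fetchall()
```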
Many thanks.
Use INSERT INTO ... SELECT with a LEFT JOIN. A LEFT JOIN is chosen so that a student record is not discarded when its university_name is NULL or has no matching row in university; in that case university_id is inserted as NULL.
INSERT INTO students_new(first_name, last_name, email, university_id)
SELECT s.first_name, s.last_name, s.email, u.university_id
FROM students_old s
LEFT JOIN university u ON s.university_name = u.university_name
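Replace the table and column names with the real ones. The query above assumes that the new student table, which holds the foreign key to university, is students_new, while the old one (from before normalisation) is students_old. It also assumes the university table already contains a row for every university name that appears in students_old.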
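A minimal end-to-end sketch of the migration, run against an in-memory SQLite database via Python's sqlite3 module (the client's DBMS may differ). Step 1, filling university from the distinct legacy names, is an assumption added here so the LEFT JOIN has something to match; the NOT EXISTS guard skips names that new form data has already created. Step 2 is the INSERT INTO ... SELECT from the answer. The sample rows are hypothetical.

```python
import sqlite3

def migrate(conn: sqlite3.Connection) -> None:
    # Step 1 (assumed): one university row per distinct, non-NULL legacy name
    # that is not already in the table.
    conn.execute("""
        INSERT INTO university (university_name)
        SELECT DISTINCT s.university_name
        FROM students_old s
        WHERE s.university_name IS NOT NULL
          AND NOT EXISTS (SELECT 1 FROM university u
                          WHERE u.university_name = s.university_name)
    """)
    # Step 2: the answer's INSERT ... SELECT; the LEFT JOIN keeps students
    # whose university_name is NULL or unmatched.
    conn.execute("""
        INSERT INTO students_new (first_name, last_name, email, university_id)
        SELECT s.first_name, s.last_name, s.email, u.university_id
        FROM students_old s
        LEFT JOIN university u ON s.university_name = u.university_name
    """)
    conn.commit()

def demo() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE students_old (
            first_name TEXT, last_name TEXT, email TEXT, university_name TEXT);
        CREATE TABLE university (
            university_id   INTEGER PRIMARY KEY,
            university_name TEXT NOT NULL UNIQUE);
        CREATE TABLE students_new (
            first_name TEXT, last_name TEXT, email TEXT,
            university_id INTEGER REFERENCES university(university_id));
    """)
    conn.executemany(
        "INSERT INTO students_old VALUES (?, ?, ?, ?)",
        [
            ("Ada", "Lovelace", "ada@example.com", "Oxford"),
            ("Alan", "Turing", "alan@example.com", "Cambridge"),
            ("Grace", "Hopper", "grace@example.com", "Oxford"),
            ("Linus", "T", "linus@example.com", None),
        ],
    )
    migrate(conn)
    return conn
```

With this data, university ends up with two rows, and all four students reach students_new, the last one with a NULL university_id. Note that the join compares strings exactly, so legacy values differing only in case or whitespace would become separate universities unless they are cleaned first.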