One of our sites has around 10,000 nodes. Each node has a simple CCK text/integer field. The integer changes daily, so it needs to be updated every day. The integer ranges from 1 to 20,000,000. The CCK field is shared across all content types, so it has its own table in the database. We don't use revisions. I chose to have the import read a CSV file because the table is very simple, with 3 fields, all integers; I didn't need all the flexibility of doing a PHP-array-type import.
I created a cron job to execute a PHP script every day, which holds something similar to:
LOAD DATA LOCAL INFILE 'file.csv'
REPLACE INTO TABLE content_field_mycckfield
FIELDS TERMINATED BY ','
LINES TERMINATED BY '\n'
(vid, nid, field_mycckfield_value);
At the end of the script, it counts how many records were imported and reports successes and errors. The file lives below public, and all that jazz.
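For context, the wrapper around that statement boils down to something like this (the DSN, credentials, and file path are placeholders, and I happen to use PDO; I'm showing the general shape rather than the exact code):

<?php
// Rough shape of the daily cron script; connection details and the path are placeholders.
$pdo = new PDO('mysql:host=localhost;dbname=drupal', 'dbuser', 'dbpass', array(
  PDO::MYSQL_ATTR_LOCAL_INFILE => true,   // required for LOAD DATA LOCAL INFILE
  PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
));

$sql = "LOAD DATA LOCAL INFILE '/path/to/file.csv'
        REPLACE INTO TABLE content_field_mycckfield
        FIELDS TERMINATED BY ','
        LINES TERMINATED BY '\\n'
        (vid, nid, field_mycckfield_value)";

try {
  $rows = $pdo->exec($sql);               // rows touched by the import
  echo date('c') . " import finished, $rows rows affected\n";
} catch (PDOException $e) {
  // Log the failure and exit non-zero so cron can flag it.
  error_log('CSV import failed: ' . $e->getMessage());
  exit(1);
}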
Are there any other steps I'm missing? Anything I should be aware of or cautious about?
Should I have it optimize or defragment this table after every run? Or after every (x) runs?
Should I have it first import into a temp_ table to normalize the data, and then copy/move that into content_field_mycckfield?
10,000 records is big but not massive in MySQL terms, and the table is simple enough that I don't think you need any optimisation. If the data in the table is reliable and your .csv is always well formed, then there's not a lot to go wrong.
The separate issue is whether your import process is throwing errors. If there is even the remotest chance that the .csv could contain incorrect column references, missing commas, etc., then your idea of testing everything in a temp table first is certainly a good one.
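If you do go the temp table route, a minimal sketch of what I mean might look like the following (the staging table name, connection details, and the range check are just examples, assuming the same three columns as your statement):

<?php
// Sketch of the "stage first, then swap in" approach; names here are illustrative.
$pdo = new PDO('mysql:host=localhost;dbname=drupal', 'dbuser', 'dbpass', array(
  PDO::MYSQL_ATTR_LOCAL_INFILE => true,
  PDO::ATTR_ERRMODE            => PDO::ERRMODE_EXCEPTION,
));

// 1. Load the raw CSV into a throwaway staging table with the same structure.
$pdo->exec("DROP TABLE IF EXISTS temp_mycckfield_import");
$pdo->exec("CREATE TABLE temp_mycckfield_import LIKE content_field_mycckfield");
$pdo->exec("LOAD DATA LOCAL INFILE '/path/to/file.csv'
            INTO TABLE temp_mycckfield_import
            FIELDS TERMINATED BY ','
            LINES TERMINATED BY '\\n'
            (vid, nid, field_mycckfield_value)");

// 2. Sanity-check the staged rows before touching the live table.
$bad = $pdo->query("SELECT COUNT(*) FROM temp_mycckfield_import
                    WHERE vid IS NULL OR nid IS NULL
                       OR field_mycckfield_value NOT BETWEEN 1 AND 20000000")
           ->fetchColumn();
if ($bad > 0) {
  error_log("CSV import aborted: $bad suspect rows in staging table");
  exit(1);
}

// 3. Only then overwrite the real data in a single statement.
$rows = $pdo->exec("REPLACE INTO content_field_mycckfield (vid, nid, field_mycckfield_value)
                    SELECT vid, nid, field_mycckfield_value FROM temp_mycckfield_import");
echo date('c') . " staged import applied, $rows rows affected\n";

That way a malformed file aborts before your live table is ever touched, and the REPLACE into the real table only runs against data that has already passed your checks.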
The only other things I can think of are (in order of neuroticism):
Hope any of that helps!