I'm going to use pymysql
, tempfile
, and subprocess
to write a script which transports my MySQL database to MongoDB, but I don't make it right so far.
In the follow code:
for table_name in table_list:
cur.execute("select * from {0};".format(table_name))
with tempfile.TemporaryFile(mode='w+') as fp:
writer = csv.writer(fp)
writer.writerow([i[0] for i in cur.description])
for record in cur.fetchall():
writer.writerow(record)
fp.seek(0)
subprocess.Popen(['mongoimport', '-d', db_mongo, '-c', table_name, '--type', 'csv', '--file', *** I don't know what should be specified here ***, '--headerline'])
I tried to iterate table names of any mysql database, select all data from the table, write out the results into a temporary file, and then use mongoimport
to pour the data in the file into mongodb database. However, one, I don't know how to point out the file name within subprocess.Popen
function. Two, are there better ways to make what I want to do? I assume there are, but couldn't come up with the one which is better than this in my mind...
For your information, I originally used actual (not temporary) file name, but it failed if there are too large tables, since it attempts to create another file derived from another table before finishing the previous table's subprocess.Popen
operation (go into next iteration I mean), and it also forces me to delete the file once the whole process finished, so I prefer using temporary file to actual one, but not completely sure whether I'm even correct...
Thanks.
tempfile.TemporaryFile
doesn't have a name that you can get. In particular:
Under Unix, the directory entry for the file is removed immediately after the file is created.
On other platforms, there may or may not be a directory entry for the file, but you still have no way of getting it from the API.
This is exactly what tempfile.NamedTemporaryFile
is for:
This function operates exactly as
TemporaryFile()
does, except that the file is guaranteed to have a visible name in the file system (on Unix, the directory entry is not unlinked). That name can be retrieved from thename
attribute of the file object.
Note that even NamedTemporaryFile
will unlink the name as soon as you close it, unless you pass delete=False
. So, you have two choices:
delete=False
and rely on the OS to clean up the file at some point in the future. (On OS X, with the default per-user temp system, I believe the folder will get scanned and cleaned either at next logout or at next login, which man not be for a long time.)subprocess.check_call
or the like instead of just firing-and-forgetting a Popen
, but if you want to get sneakier, you can. For example, you can do the waiting-and-deleting in a background thread. Or, if this script is part of a larger suite of scripts, you can have it pass the temp path upward (via stdout, a file, whatever) and have the last script clean up all the files. Or you could just find a way to wait for the file to be opened instead of waiting for the whole process to be finished. Or patch mongoimport
to add a --unlink-input-file
flag. Or…