Inserting lots of data in a mysql database. Steps being followed are : https://affiliate.itunes.apple.com/resources/documentation/epfimporter/
The problem is everything seems to be going well apart from I can't see the data. I tried select count(*)
on the table as well and it returns 0 rows.
I tried on command line for the same.
Size of DB is increasing.
I am using Sequel Pro and when I press refresh it shows the row count. And with time as my script keeps inserting more data, the row count and size is increasing as well. But after the loader completes row count shows 0 but size is still a big number.
The ingester code snippet. Full code can be found in link given above.
def _populateTable(self, tableName, resumeNum=0, isIncremental=False, skipKeyViolators=False):
"""
Populate tableName with data fetched by the parser, first advancing to resumePos.
For Full imports, if skipKeyViolators is True, any insertions which would violate the primary key constraint
will be skipped and won't log errors.
"""
#REPLACE is a MySQL extension which inserts if the key is new, or deletes and inserts if the key is a duplicate
commandString = ("REPLACE" if isIncremental else "INSERT")
ignoreString = ("IGNORE" if (skipKeyViolators and not isIncremental) else "")
exStrTemplate = """%s %s INTO %s %s VALUES %s"""
colNamesStr = "(%s)" % (", ".join(self.parser.columnNames))
self.parser.seekToRecord(resumeNum) #advance to resumeNum
conn = self.connect()
while (True):
#By default, we concatenate 200 inserts into a single INSERT statement.
#a large batch size per insert improves performance, until you start hitting max_packet_size issues.
#If you increase MySQL server's max_packet_size, you may get increased performance by increasing maxNum
records = self.parser.nextRecords(maxNum=300)
if (not records):
break
escapedRecords = self._escapeRecords(records) #This will sanitize the records
stringList = ["(%s)" % (", ".join(aRecord)) for aRecord in escapedRecords]
cur = conn.cursor()
colVals = unicode(", ".join(stringList), 'utf-8')
exStr = exStrTemplate % (commandString, ignoreString, tableName, colNamesStr, colVals)
#unquote NULLs
exStr = exStr.replace("'NULL'", "NULL")
exStr = exStr.replace("'null'", "NULL")
try:
cur.execute(exStr)
except MySQLdb.Warning, e:
LOGGER.warning(str(e))
except MySQLdb.IntegrityError, e:
#This is likely a primary key constraint violation; should only be hit if skipKeyViolators is False
LOGGER.error("Error %d: %s", e.args[0], e.args[1])
self.lastRecordIngested = self.parser.latestRecordNum
recCheck = self._checkProgress()
if recCheck:
LOGGER.info("...at record %i...", recCheck)
conn.close()
I even waited for script to complete but no good came out.
MySQl version 5.6.23
Any help is greatly appreciated.
It sounds like
autocommit=0
andSTART TRANSACTION
andCOMMIT
.Building multi-row INSERTs
is a good idea. 200 is good. But commit (or autocommit
) each one.
REPLACE
is not as good as INSERT .. ON DUPLICATE KEY UPDATE
; but the latter has a different syntax.
Using character set conversion routines (unicode()
) scares me. Often that leads to a mess. If everything is set correctly, no explicit conversions are needed.