Search code examples
pythonmysqlinsertmysql-pythonexecutemany

Inserting into MySQL using MySQLdb - python


https://drive.google.com/open?id=1aQkJYojDNMjNjJYlggxbkTq-KmzALDDb

I have this file (citations.dmp) and im tryin' to insert the data separed by | into a mysql database using the following code:

import MySQLdb

file = open('citations.dmp', 'r').readlines()
list = []
for x in file:
    a = str(x.replace('\t', ''))
    a = str(a).split('|')
    a.pop(len(a) - 1)
    list.append(a)

db = MySQLdb.connect(
    host='127.0.0.1',
    user='root',
    passwd='',
    db='tururu'

)

c = db.cursor()


print('Inserting...')

query = """ INSERT INTO `citations` (`cit_id`,`cit_key`,`pubmed_id`,`medline_id`,`url`,`text`,`taxid_list`)
            VALUES (%s,%s,%s,%s,%s,%s,%s)           
        """

c.executemany(query, list)
db.commit()

db.close()

the table has the following format:

CREATE TABLE `citations` (
                  `cit_id` VARCHAR(200) NULL,
                  `cit_key` VARCHAR(200) NULL,
                  `pubmed_id` VARCHAR(200) NULL,
                  `medline_id` VARCHAR(200) NULL,
                  `url` LONGTEXT NULL,
                  `text` LONGTEXT NULL,
                  `taxid_list` LONGTEXT NULL);

For some reason i recieve the following error everytime:

Traceback (most recent call last):
  File "C:/Users/lucas/PycharmProjects/bruno/tst.py", line 27, in <module>
    c.executemany(query, list)
  File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 281, in executemany
    self._get_db().encoding)
  File "C:\ProgramData\Anaconda3\lib\site-packages\MySQLdb\cursors.py", line 306, in _do_execute_many
    v = values % escape(arg, conn)

TypeError: not all arguments converted during string formatting

Can you help me? i'm in the last 3 days trying to fix it.


Solution

  • You have a problem on this line in your import file:

      5384    |       Associate Editor IJSEM (2001) (Trichlorobacter thiogenes)       |       0       |       0       |                     |       Associate Editor, IJSEM \"Validation List no. 78 (footnote ||).\" Int. J. Syst. Evol. Micr      obiol. (2001) 51:1-2. (Note: type strain information) |       115783  |
    

    it splits into 9 fields. Your format string for the insert query expects only 7. You need to add some validation in the file parsing to make sure the sub-lists that you push into the big import list (list variable in your code) always have exactly 7 elements, and deal with the exceptions accordingly - if the line has too few columns, either ignore it or populate with defaults, if it has too many, decide which ones are correct.

    I believe in this particular case the problem is caused by the incorrect assumption that the delimiter | can never appear in the string, as it does in footnote ||. You can solve it by adding some regex logic to filter out such abnormalities.