Search code examples
web2py

Bulk insert or update in web2py


I know there are methods like bulk_insert and insert_or_update in web2py. The former inserts multiple records at one go while latter handles duplicate primary/unique key insert conditions(ON DUPLICATE KEY UPDATE). I want to do a bulk insert while also ensuring duplicate key condition. Is there anything like bulk_insert_or_update in web2py? If not, how could I achieve it?

I have written following query which I want to use something like bulk_insert_or_update

db.mcexlinker.insert_or_update(
db(db.mcex.example.like(‘%’+db.mc.element+’%’)).select(db.mc.id,db.mcex.id)
)

Update

My table definitions are as given below:

CREATE TABLE `mc` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `element` varchar(30) NOT NULL,
  `locale` int(11) NOT NULL,
  `synind` int(11) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `fk_mc_locale` (`locale`),
  KEY `fk_mc_synind` (`synind`),
  CONSTRAINT `fk_mc_synind` FOREIGN KEY (`synind`) REFERENCES `mc` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_mc_locale` FOREIGN KEY (`locale`) REFERENCES `locale` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB AUTO_INCREMENT=84 DEFAULT CHARSET=latin1;

CREATE TABLE `mcex` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `example` varchar(200) NOT NULL,
  `ranking` int(11) DEFAULT NULL,
  `enteredby` int(11) DEFAULT NULL,
  `verificationstatus` int(11) DEFAULT '0' COMMENT '0-unverified,1-verified',
  PRIMARY KEY (`id`),
  UNIQUE KEY `example_UNIQUE` (`example`)
) ENGINE=InnoDB AUTO_INCREMENT=6233 DEFAULT CHARSET=latin1;

CREATE TABLE `mcexlinker` (
  `id` int(11) NOT NULL,
  `mcid` int(11) NOT NULL,
  `exampleid` int(11) NOT NULL,
  `linkstatus` int(11) NOT NULL COMMENT '0-new,1-verified',
  PRIMARY KEY (`id`),
  KEY `fk_mclinker_element` (`mcid`),
  KEY `fk_mcexlinker_example` (`exampleid`),
  CONSTRAINT `fk_mcexlinker_element` FOREIGN KEY (`mcid`) REFERENCES `mc` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION,
  CONSTRAINT `fk_mcexlinker_example` FOREIGN KEY (`exampleid`) REFERENCES `mcex` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION
) ENGINE=InnoDB DEFAULT CHARSET=latin1;

Tables mc and mcex are linked through mcexlinker table to have many-to-many relationship. Table mc and mcex are going to be populated independently. At certain point a scheduled job will run and link new entries in these tables by checking examples(mcex table-example column) which look like elements(mc table-element column) and taking their ids and inserting them in mcexlinker table. Please let me know if you need more information.


Solution

  • Unless you are on Google App Engine, there is no advantage to using bulk_insert over simply looping and doing individual inserts. So, you might as well just use update_or_insert in a loop or list comprehension.

    Also, ‘%’+db.mc.element+’%’ will not work, as dc.mc.element is a Field object, not a string.

    UPDATE:

    Based on your updated question, you could use this solution, but if you don't want it to be database dependent, you could do some looping in Python as follows:

    for example in db(db.mcex).select(db.mcex.id, db.mcex.example):
        matches = db(db.mc.element.contains(example.example)).select(db.mc.id)
        for match in matches:
            db.mcexlinker.update_or_insert(mcid=match.id, exampleid=example.id)
    

    You can also do some testing to see if it is more efficient to do the matching in Python:

    elements = db(db.mc).select(db.mc.id, db.mc.element)
    for example in db(db.mcex).select(db.mcex.id, db.mcex.example):
        for element in elements:
            if example.example in element.element:
                db.mcexlinker.update_or_insert(mcid=element.id, exampleid=example.id)