Tags: postgresql, transactions, prisma

Is using a big transaction for bulk uploads (of e.g. products) wrong?


I am currently writing the upload process for products in an online shop. During this process, many different and only loosely connected table entries need to be created (e.g. products, categories, tags). Since I only want the data written to the database if every entry succeeds, I am using a transaction. However, I have learned that transactions should be atomic, which seems to be the opposite of what I am doing. For illustrative purposes, here is a snippet from my transaction. As you can see, I even had to increase maxWait and timeout for it to complete.

await prisma.$transaction(
    async (tx) => {
        await createParentCategoriesForMassImport(tx, productData);
        await createCategoriesForMassImport(tx, productData);
        await createAttributesForMassImport(tx, productData);
        await createAttributeValuesForMassImport(tx, productData, data);
        await createProductsForMassImport(tx, productData, overwrite);
        await createTagsForMassImport(tx, productData);
        await connectCrossSellingProductsForMassImport(tx, productData);
        await createProductLinksForMassImport(tx, productData, data);
        await createProductAttributesForMassImport(tx, productData, data);
    },
    {
        maxWait: 5000, // default: 2000
        timeout: 50000, // default: 5000
    }
);

Most of these steps depend on one another. For example, the products need to be created before I can properly connect the tags and cross-selling options.

Is this bad practice? And if so, what are my other options? I don't want to run individual queries, because if, for example, the tags fail to create, I'm left with a half-broken database containing unfinished product details.

I tried to look for advice online, but I couldn't find anything that fits my problem.


Solution

  • A database transaction is what you need if you want the guarantee that either all or none of your modifications are executed (atomicity). So yes, a single database transaction is what you want.

    However, each update produces "dead rows" in PostgreSQL, and these can only be cleaned up once your transaction is done. So a lot of updates can lead to "bloat" (a table that grows bigger than is warranted by the amount of data in it). If that effect is substantial, you might consider either splitting up the transaction into several parts (and VACUUMing the table in between) or reorganizing the table (VACUUM (FULL)) after the transaction is done.
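If the bloat does turn out to matter, the split-and-VACUUM approach could be driven by a small batching helper. This is only a sketch: `chunk` is an ordinary array helper, and `importBatch`, the batch size of 500, and the table name `products` are placeholders for your own code.

```typescript
// Split an array into batches of a given size, so each batch can be
// imported in its own transaction instead of one huge transaction.
function chunk<T>(items: T[], size: number): T[][] {
    const batches: T[][] = [];
    for (let i = 0; i < items.length; i += size) {
        batches.push(items.slice(i, i + size));
    }
    return batches;
}

// Sketch of the driving loop (assumes a Prisma client named `prisma`
// and a hypothetical per-batch import function `importBatch`):
//
// for (const batch of chunk(productData, 500)) {
//     await prisma.$transaction(async (tx) => importBatch(tx, batch));
//     // VACUUM cannot run inside a transaction block, so it is issued
//     // between batches to clean up dead rows as you go.
//     await prisma.$executeRawUnsafe('VACUUM products');
// }
```

Note the trade-off: with separate per-batch transactions you lose the all-or-nothing guarantee across the whole import, so you would need a way to resume or roll back partially completed imports yourself.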