Search code examples
postgresqlsamplingprisma

How to sample a PostgreSQL database using Prisma?


Suppose I have several millions of statements in my PostgreSQL database and I want to get only 10000 of them. But not the first 10000, rather, a random selection of 10000 (it would be best if I could also choose the logic, e.g. select every 4th statement).

How could I do this using Prisma, or — if it's not possible using Prisma — using a good old PostgreSQL request?

For now, I'm using this code to limit the number of results I'm getting:

const statements = await this.prisma.statement.findMany({
      where: {
        OR: conditions,
      },
      orderBy: {
        createdAt: 'asc',
      },
      take: 10000,
    });

This will use the conditions I have, then order them in ascending order, and "take" or limit the first 10000 results.

What could I use in place of the "take" or what request I could make directly in PostgreSQL to randomly sample my DB for records?


Solution

  • Prisma doesn't natively support fetching random data as of now.

    There is a Feature Request that discusses the exact same scenario as you need.

    The alternative could be to use queryRaw for raw database access and use PostgreSQL's random function as described in the above mentioned Feature Request.