python amazon-web-services amazon-redshift alchemy

How to efficiently get a random sample of rows in Redshift using SQLAlchemy


I have a data set of size 200M. How can I efficiently get a random sample of 100 rows using SQLAlchemy, or by any other means?


Solution

  • SELECT * 
    FROM sales
    ORDER BY RANDOM()
    LIMIT 10;
    

    With ORDER BY RANDOM(), every row has an equal chance of being selected. Use LIMIT to choose how many rows to return (for example, LIMIT 100 for a 100-row sample).

    Reference: https://docs.aws.amazon.com/redshift/latest/dg/r_RANDOM.html
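The same query can be issued from Python. The following is a minimal sketch, not a Redshift-specific implementation: it runs ORDER BY RANDOM() LIMIT n through a generic DB-API connection, and uses an in-memory SQLite database purely as a stand-in so the example is runnable without a cluster. The helper name sample_rows and the columns of the sales table (id, amount) are assumptions made for illustration.

```python
import sqlite3


def sample_rows(conn, table, n):
    """Return n rows picked at random using ORDER BY RANDOM() LIMIT n.

    `conn` is any DB-API connection; `table` and `n` are supplied by the caller.
    """
    # Table names cannot be passed as bound parameters, so check the name
    # before interpolating it into the SQL text.
    if not table.isidentifier():
        raise ValueError(f"unsafe table name: {table!r}")
    cur = conn.cursor()
    # int(n) guarantees that only a number is interpolated into LIMIT.
    cur.execute(f"SELECT * FROM {table} ORDER BY RANDOM() LIMIT {int(n)}")
    return cur.fetchall()


# Stand-in data: SQLite instead of Redshift, with a hypothetical sales table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, amount REAL)")
conn.executemany(
    "INSERT INTO sales (id, amount) VALUES (?, ?)",
    [(i, i * 1.5) for i in range(1000)],
)

sample = sample_rows(conn, "sales", 100)
print(len(sample))  # 100
```

With SQLAlchemy, the equivalent statement is usually built by ordering a select by func.random() and applying .limit(100). Note that ORDER BY RANDOM() has to generate a random value for every row in the table before it can pick the top n, so on a very large table the query still reads the whole table.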