
Terabyte scale database in Greenplum


I am currently testing Greenplum with a small amount of data, around 1 GB.

Greenplum is described as "petabyte-scale", so I was wondering whether a data volume of one to ten terabytes justifies moving to MPP processing instead of a regular PostgreSQL database. All network interfaces on my master and slaves run at 10 Mb/s.

The best-practice guides don't cover these considerations. My concern is that a relatively "small" database may perform poorly because of network overhead. Has anyone already implemented a database at this scale?


Solution

  • The workloads for PostgreSQL and Greenplum are different. PostgreSQL is great for OLTP, queries with index lookups, referential integrity, etc. You typically know the query patterns in an OLTP database too. It can certainly take on some data warehouse or analytical needs but it scales by buying a bigger machine with more RAM and more cores with faster disks.

    Greenplum, on the other hand, is designed for data warehousing and analytics. You design the database without knowing how the users will query the data. This means sequential reads, no indexes, full table scans, etc. It can do some OLTP work but it isn't designed for it. You scale Greenplum by adding more nodes to your cluster. This gives you more CPU, RAM, and disk throughput.

    What is your use case? That is the biggest determinant in picking Greenplum vs PostgreSQL.
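To put the network concern from the question into numbers, here is a rough back-of-envelope sketch in Python. It is not a Greenplum benchmark: the function names, the node count of 4, and the 200 MB/s per-node disk throughput are all hypothetical values chosen for illustration, and it ignores compression, protocol overhead, and parallel links. It only compares how long it takes to push a given volume through one network link with how long it takes to scan the same volume from disk in parallel.

```python
# Back-of-envelope estimate only; all hardware figures below are assumptions.

TB = 10**12        # bytes in a (decimal) terabyte
MBIT = 10**6       # bits per second in 1 Mb/s
GBIT = 10**9       # bits per second in 1 Gb/s


def transfer_seconds(data_bytes, link_bits_per_second):
    """Time to move data_bytes over a single link, ignoring overhead."""
    return data_bytes * 8 / link_bits_per_second


def scan_seconds(data_bytes, nodes, disk_bytes_per_second):
    """Time for `nodes` machines to scan equal shares of data_bytes in parallel."""
    return data_bytes / (nodes * disk_bytes_per_second)


# Hypothetical cluster: 4 nodes, each reading 200 MB/s sequentially.
ASSUMED_NODES = 4
ASSUMED_DISK_BYTES_PER_SECOND = 200 * 10**6

for label, link in (("10 Mb/s", 10 * MBIT), ("10 Gb/s", 10 * GBIT)):
    hours = transfer_seconds(1 * TB, link) / 3600
    print(f"Moving 1 TB over {label}: {hours:.1f} hours")

scan_hours = scan_seconds(1 * TB, ASSUMED_NODES, ASSUMED_DISK_BYTES_PER_SECOND) / 3600
print(f"Scanning 1 TB on the assumed cluster: {scan_hours:.2f} hours")
```

Under these assumptions, moving 1 TB over a single 10 Mb/s link takes about 800,000 seconds (roughly nine days), versus 800 seconds at 10 Gb/s. If the interfaces really are 10 Mb/s, any query that has to move a large share of the data between nodes would likely be dominated by the network rather than by CPU or disk.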