Search code examples
oraclebigdataexadata

Oracle 12c - Table with size larger than 5 terabytes


In our database(Oracle 12c, Exadata) we plan to store sales data. Input text files containing sales data comes daily basis(~1000 files every day each containing ~20000 rows). Text files are read and transfered to db asap. Acccording to our calculations it will grow up to 5 terabytes in one year.

Data format: [transaction date][category][sales_number][buyer_id][other columns]

sales data comes in 10 different categories with same fields. Data logically can be stored in just one single table or can be divided into 10 tables (with respect to categories).

What is the best practice for storing such kind of big data in oracle? What kind of partitioning and indexing strategy should be applied?

Constraints: Data should be available for analysis in 2-3 days to marketing department. Queries based on [sales_number] or [category],[buyer_id] or [buyer_id] columns.


Solution

  • If the number of categories is known and fix then you can use a subpartition for each category. One approach could be this one:

    CREATE TABLE SALES_DATA
    (
      TRANSACTION_DATE TIMESTAMP(0) NOT NULL,
      CATEGORY NUMBER NOT NULL,
      SALES_NUMBER NUMBER, 
      BUYER_ID NUMBER,
      [OTHER COLUMNS]
    )
    PARTITION BY RANGE (TRANSACTION_DATE) INTERVAL (INTERVAL '1' DAY)
    SUBPARTITION BY LIST (CATEGORY)
    SUBPARTITION TEMPLATE 
        (
            SUBPARTITION CAT_1 VALUES (1), 
            SUBPARTITION CAT_2 VALUES (2),
            SUBPARTITION CAT_3_AND_4 VALUES (3,4), 
            SUBPARTITION CAT_5 VALUES (5),
            ...
            SUBPARTITION CAT_10 VALUES (10),
            SUBPARTITION CAT_OTHERS VALUES (DEFAULT)        
        )
        (
            PARTITION P_INITIAL VALUES LESS THAN (TIMESTAMP '2018-01-01 00:00:00')  
        );
    

    Local indexes would be needed on sales_number and buyer_id. You can put every (sub)partition into a separated tablespace if required.