Search code examples
oracleoracle-autonomous-db

How can I compare files in two object store buckets using an Oracle Autonomous Database?


I have two buckets in different regions in OCI object store. I want to compare files in the two buckets using Oracle Autonomous Database, to detect missing files and copy them over and synchronize the two buckets. I use the two buckets in different regions as replicating some configuration metadata between the services running in the two regions.

Eg:

  • Bucket1 = 'https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/mybucket/folder1/'
  • Bucket2 = 'https://objectstorage.us-phoenix-1.oraclecloud.com/n/mynamespace/b/mybucket/folder1/'

Solution

  • Oracle Autonomous Database provides the DBMS_CLOUD package for accessing object store files.

    DBMS_CLOUD package has a function LIST_OBJECTS, which provides a list of files in an object store location (bucket or subfolder). This function provides the object name , size in bytes and checksum, which can be used to compare files in the 2 buckets with the help of Oracle SQL FULL OUTER JOIN and WITH Clause.

    SQL Script:

    define bucket1_uri='https://objectstorage.us-phoenix-1.oraclecloud.com/n/mynamespace/b/mybucket/o/folder1/'
    define bucket2_uri='https://objectstorage.us-ashburn-1.oraclecloud.com/n/mynamespace/b/mybucket/o/folder2/'
    define credname1='OCI$RESOURCE_PRINCIPAL'
    define credname2='OCI$RESOURCE_PRINCIPAL'
    
    set linesize 120
    set pages 1000
    col object_name_A format a30
    col object_name_B format a30
    col object_checksum_A format a10
    col object_checksum_B format a10
    
    
    -- Use SQL FULL OUTER JOIN and compare the checksum exclude matching rows
    WITH bucket1 AS 
       (SELECT object_name, bytes, checksum FROM DBMS_CLOUD.LIST_OBJECTS('&credname1', '&bucket1_uri')
        ORDER BY object_name, bytes, checksum),
         bucket2 AS
       (SELECT object_name, bytes, checksum FROM DBMS_CLOUD.LIST_OBJECTS('&credname2', '&bucket2_uri')
        ORDER BY object_name, bytes, checksum)
    SELECT rownum id, diff.* FROM
    (SELECT a.object_name object_name_A, a.bytes object_size_A, a.checksum object_checksum_A,
            b.object_name object_name_B, b.bytes object_size_B, b.checksum object_checksum_B
      FROM bucket1 a FULL OUTER JOIN bucket2 b
        ON a.object_name = b.object_name) diff
    WHERE diff.object_name_A IS NULL OR
          diff.object_name_B IS NULL OR
          (diff.object_name_A = diff.object_name_B AND
           diff.object_checksum_A != diff.object_checksum_B)
    /
    

    Explanation:

    In the above script:

    1. The assumption is to provide 2 Object store URLs and a credential name. One can use Resource principal for OCI Object store in Autonomous Database to avoid specifying passwords.
    2. Using Oracle SQL WITH Clause, there are 2 inline views to provide the object_name, bytes and checksum columns from DBMS_CLOUD.LIST_OBJECTS in the two buckets.
    3. The script uses file checksum to compare files in the two buckets, which is provided by cloud object store providers as a hash of the file content.
    4. Using SQL Full Outer Join, we can get objects existing in bucket1 and not in bucket2, or vice-versa, as well as objects that have a same name in both buckets but different file checksum.
    5. Finally, the SQL above does not show objects that have a same name and same size in both buckets.

    Use Cases:

    The DBMS_CLOUD package works for multiple object stores such as Oracle OCI, AWS S3, Azure BLOB, Google Cloud, Wasabi, so the above script can be used to compare buckets in multiple scenarios such as -

    • Compare buckets in same region in OCI
    • Compare buckets in different regions in OCI
    • Compare buckets in different accounts in OCI object store
    • Compare buckets in different cloud providers - OCI bucket vs S3 bucket

    References: