Search code examples
pythonpostgresqlazure-functionspgadminpg-dump

Copy remote postgres database to second remote server


I currently have a prod and test database that live on 2 servers azure postgres servers. I want to do a nightly backup of the prod database onto test, such that every morning the two are identical. My datatables have contraints and keys, so I can't just copy over the data itself but also the schemas, so a simple pandas df.to_sql won't cover it.

My current plan is to run a nightly Azure Functions python script that does the copying over. I tried sqlalchemy but had significant issues copying over metadata correctly. Now I am trying to use postgres' pg_dump and pg_restore/psql commands via a subprocess with the following code:

def backup_database(location, database, password, username, backup_file):
    # Use pg_dump command to create a backup of the specified database
    cmd = [
        'pg_dump',
        '-Fc',
        '-f', backup_file,
        '-h', location,
        '-d', database,
        '-U', username,
        '-p', '5432',
        '-W',
    ]
    subprocess.run(cmd, check=True, input=password.encode())

def clear_database(engine, metadata):
    # Drop all tables in the database
    metadata.drop_all(bind=engine, checkfirst=False)

def restore_database(location, database, password, username, backup_file):
    # Use pg_restore command to restore the backup onto the database
    # cmd = ['pg_restore', '-Fc', '-d', engine.url.database, backup_file]
    cmd = [
        'pg_restore',
        '-Fc',
        '-C',
        '-f', backup_file,
        '-h', location,
        #'-d', database,
        '-U', username,
        '-p', '5432',
        '-W',
    ]

    try:
        subprocess.run(cmd, check=True, capture_output=True, text=True)
        print("Backup restored onto the test server.")
    except subprocess.CalledProcessError as e:
        print("Error occurred while restoring the backup:")
        print(e.stdout)  # Print the output from the command
        print(e.stderr)  # Print the error message, if available


# Define backup file path
backup_file = '/pathtofile/backup_file.dump'  # Update with the desired backup file path
backup_file2 = 'backup_file.dump'  # Update with the desired backup file path

# Backup the production database
backup_database(input_host, input_database, input_password, input_user, backup_file)
print("Backup of the production database created.")

# Create metadata object for test server
output_metadata = MetaData(bind=output_engine)

clear_database(output_engine, output_metadata)
print("Test server cleared.")

restore_database(output_host, output_datebase, output_password, output_user, backup_file2)
print("Backup restored onto the test server.")

This code appears to be creating a dump file, but it is not successfully restoring to the test database. If I get this code to work, how do I specify file paths within Azure Functions, is this a suitable solution to run from Azure Functions? If not, how to get sqlalchemy to successfully clear test data/metadata, then copy over data from prod every night?


Solution

  • I have referred MSDOC Psycopg and PostgreSQL.

    import  psycopg2
    src_conn_string = "SourceConnectionString"
    dst_conn_string = "DStConnectionString"
    try:
    src_conn = psycopg2.connect(src_conn_string)
    src_cursor = src_conn.cursor()
    print("Connected to source database.")
    try:
    dst_conn = psycopg2.connect(dst_conn_string)
    dst_cursor = dst_conn.cursor()
    print("Connected to destination database.")
    try:
    src_cursor.execute(
    "SELECT table_name FROM information_schema.tables WHERE table_schema='public' AND table_type='BASE TABLE'"
    )
    tables = src_cursor.fetchall()
    for  table  in  tables:
    src_cursor.execute("SELECT * FROM {0}".format(table[0]))
    
    rows = src_cursor.fetchall()
    
    for  row  in  rows:
    dst_cursor.execute("INSERT INTO {0} VALUES {1}".format(table[0],  row))
    print("Data transferred successfully.")
    except  psycopg2.Error  as  e:
    print("Error transferring data: ",  e)
    finally:
    dst_conn.commit()
    dst_cursor.close()
    dst_conn.close()
    print("Destination database connection closed.")
    except  psycopg2.Error  as  e:
    print("Error connecting to destination database: ",  e)
    finally:
    src_cursor.close()
    src_conn.close()
    print("Source database connection closed.")
    except  psycopg2.Error  as  e:
    print("Error connecting to source database: ",  e)
    
    

    Output:

    enter image description here

    In Azure:

    Source:

    enter image description here

    Destination:

    enter image description here