Tags: sql-server, greenplum, bulk-load, alteryx

Alteryx - bulk copy from SQL Server to Greenplum - need tips to increase performance


Need advice here: using Alteryx Designer, I'm pulling a large dataset (10M rows) from SQL Server and need to move it into a Greenplum DB.

I have tried both connecting with Input Data (SQL Server) and Output Data (GP), and also Connect In-DB (SQL Server) with Write Data In-DB (GP).

Either approach takes so long to complete that I have to cancel the process (to give an idea, over the weekend it ran for 18 hours and advanced no further than 1%).

Any good advice or tricks to speed up this sort of massive bulk data load would be very highly appreciated!

I can make modifications to SQL Server and Alteryx to increase performance, but not to Greenplum.

Thanks in advance.

Regards, Erick


Solution

  • I'll break down the approaches that you're taking.

    • You won't be able to use the In-DB tools, as the two databases are different, so you can't push the processing down to the database.

    • Using the standard Alteryx tools, you are bringing the whole table onto your machine and then pushing it out again. There are multiple ways this could be done, depending on where your bottleneck is.

    • Looking first at the extract from SQL Server: 10M rows isn't that much, so you could split the process and write the extract out as a .yxdb first (see the first sketch after this list). If that step fails or takes several hours, then you will need to look at the connection to the SQL Server or the resources available on the SQL Server.

    • Then, for the push into Greenplum: there is no PostgreSQL bulk loader in Alteryx at present, so you can either just try to write the whole table, or you can write segments of the table into temp tables in Greenplum and then execute a command to combine those tables (see the second sketch after this list).
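
As a rough illustration of splitting the extract, the queries below partition the source table by an assumed integer key. The table name (dbo.SourceTable) and key column (id) are hypothetical placeholders; each range can feed its own Input Data tool, or its own pass that writes a separate .yxdb file:

    -- Split the 10M-row extract into ranges on an assumed integer key (id).
    -- Table and column names are placeholders for your actual source table.
    SELECT *
    FROM dbo.SourceTable
    WHERE id >= 1 AND id < 2500000;

    SELECT *
    FROM dbo.SourceTable
    WHERE id >= 2500000 AND id < 5000000;

    -- ...and so on for the remaining ranges, up to the maximum id.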
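
And as a sketch of the final combine step in Greenplum, assuming the segments were written to ordinary named staging tables (true session-scoped temp tables would not survive across separate Alteryx connections), a standard PostgreSQL-style statement like the following would merge them. All table names here are hypothetical:

    -- Combine the staged segments into the target table.
    INSERT INTO final_table
    SELECT * FROM stage_1
    UNION ALL SELECT * FROM stage_2
    UNION ALL SELECT * FROM stage_3
    UNION ALL SELECT * FROM stage_4;

    -- Drop the staging tables once the insert has committed.
    DROP TABLE stage_1, stage_2, stage_3, stage_4;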