python sorting amazon-redshift pandas-to-sql

Sorting and loading data from Pandas to Redshift using to_sql

I've built some tools that create front-end list boxes for users that reference dynamic Redshift tables. New items in the table, they appear automatically in the list.

I want to put the list in alphabetical order in the database so the dynamic list boxes will show the data in that order.

After downloading the list from an API, I attempt to sort the list alphabetically in a Pandas dataframe before uploading. This works perfectly:

df.sort_values(['name'], inplace=True, ascending=True, kind='heapsort')

But then when I try to upload to Redshift in that order, it loses the order while it uploads. The data appears in chunks of alphabetically ordered segments.

db_conn = create_engine('<redshift connection>')

obj.to_sql('table_name', db_conn, index = False, if_exists = 'replace')

Because of the way the third party tool (Alteryx) works, I need to have this data in alphabetical order in the database.

How can I modify to_sql to properly upload the data in order?

Solution

While ingesting data into redshift, data gets distributed between slices on each node in your redshift cluster.
My suggestion would be to create a sort key on a column which you need to be sorted. Once you have sort key on that column, you can run Vacuum command to get your data sorted.
Sorry! I cannot be of much help on Python/Pandas

If I’ve made a bad assumption please comment and I’ll refocus my answer.