I've built some tools that create front-end list boxes for users that reference dynamic Redshift tables. New items in the table, they appear automatically in the list.
I want to put the list in alphabetical order in the database so the dynamic list boxes will show the data in that order.
After downloading the list from an API, I attempt to sort the list alphabetically in a Pandas dataframe before uploading. This works perfectly:
df.sort_values(['name'], inplace=True, ascending=True, kind='heapsort')
But then when I try to upload to Redshift in that order, it loses the order while it uploads. The data appears in chunks of alphabetically ordered segments.
db_conn = create_engine('<redshift connection>')
obj.to_sql('table_name', db_conn, index = False, if_exists = 'replace')
Because of the way the third party tool (Alteryx) works, I need to have this data in alphabetical order in the database.
How can I modify to_sql
to properly upload the data in order?
While ingesting data into redshift, data gets distributed between slices on each node in your redshift cluster.
My suggestion would be to create a sort key on a column which you need to be sorted. Once you have sort key on that column, you can run Vacuum command to get your data sorted.
Sorry! I cannot be of much help on Python/Pandas
If I’ve made a bad assumption please comment and I’ll refocus my answer.