Tags: amazon-web-services, amazon-s3, amazon-redshift, parquet, spark-redshift

Redshift external catalog error when copying Parquet from S3


I am trying to copy Google Analytics data into Redshift in Parquet format. When I limit the columns to a few select fields, I am able to copy the data, but on including a few specific columns I get an error:

    ERROR: External Catalog Error.
    Detail:
    -----------------------------------------------
    error:    External Catalog Error.
    code:     16000
    context:  Unsupported column type found for column: 6. Remove the column from the projection to continue.
    query:    18669834
    location: s3_request_builder.cpp:2070
    process:  padbmaster [pid=23607]
    -----------------------------------------------

I know the issue is most probably with the data, but I am not sure how to debug it, as this error is not helpful in any way. I have tried changing the data types of the columns to SUPER, but without success. I am not using Redshift Spectrum here.
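
For reference, the COPY is roughly of this shape (a minimal sketch; the table name, bucket path, and IAM role ARN below are placeholders, not my real values):

    COPY ga_sessions
    FROM 's3://my-bucket/ga/parquet/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS PARQUET;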


Solution

  • I found the solution. The error message says Unsupported column type found for column: 6. Redshift column ordinality starts from 0, but I was counting columns from 1 instead of 0 (my mistake). So the issue was with column 6 (which I had been reading as column 7), a string/varchar column in my case. I created a table with just this column and tried loading data into that column alone. Then I got:

    redshift_connector.error.ProgrammingError: {'S': 'ERROR', 'C': 'XX000', 'M': 'Spectrum Scan Error', 'D': '\n  -----------------------------------------------\n  error:  Spectrum Scan Error\n  code:      15001\n  context:   The length of the data column display_name is longer than the length defined in the table. Table: 256, Data: 1020
    

    Recreating the table with VARCHAR(MAX) for those columns solved the issue; a rough sketch of the reproduce-and-fix steps is below.
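
    A minimal sketch of the steps, assuming a single offending column (display_name comes from the error above; the table name, bucket path, and IAM role ARN are placeholders):

        -- One-column table sized as in the original schema
        -- (256 was too small for my data):
        CREATE TABLE debug_display_name (display_name VARCHAR(256));

        -- Loading only this column's data reproduced the
        -- Spectrum Scan Error (Table: 256, Data: 1020):
        COPY debug_display_name
        FROM 's3://my-bucket/ga/display_name_only/'
        IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
        FORMAT AS PARQUET;

        -- Fix: recreate the column wide enough for the data.
        -- VARCHAR(MAX) is Redshift shorthand for VARCHAR(65535).
        DROP TABLE debug_display_name;
        CREATE TABLE debug_display_name (display_name VARCHAR(MAX));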