I'm triggering a Lambda to send data to Redshift through Firehose. When the Lambda is triggered twice within a short period of time, say 1 minute, the data from the two invocations is run together. This breaks the load into Redshift, which fails with the error "Extra column(s) found".
E.g. 1st set of data: 1,2,3,4; 2nd set of data: 5,6,7,8. Data received by Redshift: 1,2,3,45,6,7,8
After this happens, no data is loaded into Redshift, even if the Lambda is then triggered only once.
Why is this happening? How can I avoid this?
Thanks
This is likely due to omitting the end-of-record character from your data-injecting code. Unless changed, the end-of-record character is a newline (\n), and it indicates that this is all the data for the record. You need to have a \n at the end of each record in your data stream.
This isn't a problem when the data comes in further apart in time, because Firehose only waits a fixed amount of time before sending the data it has to Redshift. In that case end-of-file is reached and end-of-record is assumed.
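As a rough illustration of terminating each record with a newline, here is a minimal Python sketch. It assumes the Lambda sends comma-separated rows with boto3's Firehose `put_record` call; the helper names `to_firehose_record` and `send_row`, and the stream name, are hypothetical and not taken from the question.

```python
def to_firehose_record(values, delimiter=","):
    """Serialize one row and terminate it with a newline (end-of-record)."""
    line = delimiter.join(str(v) for v in values)
    # Without the trailing "\n", two buffered records run together
    # as "1,2,3,45,6,7,8" in the delivered file.
    return {"Data": (line + "\n").encode("utf-8")}


def send_row(firehose_client, stream_name, values):
    """Send a single row to the delivery stream (hypothetical helper)."""
    # firehose_client is expected to be boto3.client("firehose")
    return firehose_client.put_record(
        DeliveryStreamName=stream_name,
        Record=to_firehose_record(values),
    )


# Inside the Lambda handler, usage would look roughly like:
#   import boto3
#   client = boto3.client("firehose")
#   send_row(client, "my-delivery-stream", [1, 2, 3, 4])  # stream name is an assumption
```

With the newline in place, the two sets from the question would be buffered as two separate lines, 1,2,3,4 and 5,6,7,8, rather than one merged row.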