I am trying to copy data into Snowflake from an AWS Lambda function. I have a DataFrame that contains no duplicates, which I verify by checking it like so:
df.duplicated().any()
and confirming that it returns False.
I then double-check by filtering on a value that should be unique in the DataFrame:
df[df["myColumn"] == "uniqueValue"]
and I get 1 result.
I then run the following:
from snowflake.connector.pandas_tools import write_pandas

write_pandas(
    conn=con,
    df=df,
    table_name=table_name,
    database=database,
    schema=schema,
    chunk_size=chunk_size,
    quote_identifiers=False,
)
When the data lands in the Snowflake table and I query it, there are 5 copies of each row.
I also verified that this function runs only once per execution.
Why am I getting 5 duplicates?
EDIT: OK, so I realized it's not related to this package. The issue is that after 1 minute the Lambda is triggered again, then again 1 minute later, and so on, until it has been triggered 5 times.
I have no idea why it's being triggered multiple times, because all of the executions eventually succeed, but there are 5 of them running before the first one actually completes.
UPDATE
Verified that it's not a memory issue and not a timeout issue.
What I have noticed is that the next Lambda execution seems to be triggered when an API call is made to retrieve some external data. I'm not sure why that would play a role, but it seems to be related.
Also, the count isn't fixed at 5; it just re-triggers every minute until the first Lambda execution finishes. I can see that the logs stop when the API call starts, and it's at that same point in the logs that I see the next Lambda execution start.
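One way to confirm that the overlapping runs are separate invocations carrying the same payload, rather than a single run looping, is to log the request ID and a fingerprint of the event at the top of the handler. This is a minimal sketch assuming a standard Python handler; `payload_fingerprint` and the log format are hypothetical, while `context.aws_request_id` is the attribute the Lambda runtime provides on the context object.

```python
import hashlib
import json
import logging

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)


def payload_fingerprint(event):
    """Return a short, stable hash of the event (hypothetical helper)."""
    # sort_keys makes the hash independent of key order in the payload.
    canonical = json.dumps(event, sort_keys=True, default=str)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


def handler(event, context):
    # aws_request_id differs per invocation; the fingerprint is identical
    # whenever the same payload is delivered again.
    fingerprint = payload_fingerprint(event)
    logger.info("request_id=%s payload=%s", context.aws_request_id, fingerprint)
    # ... fetch the external data and call write_pandas here ...
    return {"request_id": context.aws_request_id, "payload": fingerprint}
```

Log lines with different request IDs but the same payload hash, spaced about a minute apart, would point at the caller re-sending the request rather than at the function itself.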
I'm not sure whether this is a Jenkins-specific issue, but what I found is that I was invoking the function synchronously, and if the Lambda had not responded after 1 minute, it was triggered again. Running with the invoke-async CLI option instead of invoke led to the duplication stopping.
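For anyone invoking from Python rather than the CLI, the same distinction exists in boto3. A likely explanation for the one-minute interval (an assumption, not something verified in this Jenkins setup) is that the client's default read timeout is 60 seconds and a timed-out synchronous call gets retried automatically. The sketch below shows two possible approaches: an asynchronous invoke with `InvocationType="Event"`, and a synchronous client with a longer read timeout and retries turned off. The function names and the 900-second value are illustrative, not taken from the original setup.

```python
import json


def invoke_async(client, function_name, payload):
    """Queue the event and return immediately; Lambda answers with HTTP 202."""
    response = client.invoke(
        FunctionName=function_name,
        InvocationType="Event",  # asynchronous; the default is "RequestResponse"
        Payload=json.dumps(payload).encode("utf-8"),
    )
    return response["StatusCode"]


def make_patient_sync_client(read_timeout_s=900):
    """Build a client that waits longer for a synchronous response and does not retry."""
    # Imported here so the rest of the module loads without boto3 installed.
    import boto3
    from botocore.config import Config

    config = Config(
        read_timeout=read_timeout_s,  # seconds to wait for the response
        retries={"max_attempts": 0},  # do not re-send the request on timeout
    )
    return boto3.client("lambda", config=config)
```

With a real client this would be called as `invoke_async(boto3.client("lambda"), "my-function", {...})`, where `my-function` is a placeholder. Asynchronous invocations have their own retry behavior when the function raises an error, so making the Snowflake load idempotent is probably still worthwhile.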