I have a huge number of events stored in S3 as small JSON files, and I need to ingest them into Snowflake using Snowpipe. Is there any performance concern around the number of requests being sent to Snowflake? Should I merge these tiny files into bigger JSON files before letting Snowflake ingest them?
I know Snowflake can automatically detect new files on S3 and refresh its external tables, but should I let small files constantly trigger that process, or is that a bad idea?
Snowflake has some documentation that should answer this pretty well here. In short: ideally your files are big (the loading guidelines suggest roughly 100-250 MB compressed), but not so big and/or complicated that they take more than a minute to process.
I have some Snowpipes that handle a lot of small files without much trouble, but you'll probably get at least slightly better performance out of larger files. In general, Snowflake is optimized for larger batches.
Another note: Snowpipe pricing includes a per-file overhead charge on top of the compute, so you might also see some cost savings by merging tiny files together.
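If you do decide to merge, a small compaction job in front of the stage is usually enough. Here's a rough sketch using boto3; the bucket name, prefixes, and the ~100 MB target are placeholders, and it assumes one JSON document per source file:

```python
# Minimal sketch: compact small JSON event files into larger newline-delimited
# JSON (NDJSON) batches before Snowpipe picks them up. Names and sizes below
# are assumptions for illustration, not anything Snowflake-specific.
import json
import uuid
import boto3

s3 = boto3.client("s3")
SOURCE_BUCKET = "my-events-bucket"      # hypothetical source bucket
SOURCE_PREFIX = "raw/"                  # hypothetical prefix holding tiny JSON files
TARGET_PREFIX = "merged/"               # prefix the Snowflake stage/pipe watches
TARGET_BYTES = 100 * 1024 * 1024        # aim for roughly 100 MB per merged file


def merge_small_files():
    """Read small JSON objects and re-upload them as larger NDJSON files."""
    batch, batch_size = [], 0
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=SOURCE_BUCKET, Prefix=SOURCE_PREFIX):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=SOURCE_BUCKET, Key=obj["Key"])["Body"].read()
            # One JSON document per source file; store it as a single NDJSON line.
            line = json.dumps(json.loads(body)) + "\n"
            batch.append(line)
            batch_size += len(line)
            if batch_size >= TARGET_BYTES:
                _flush(batch)
                batch, batch_size = [], 0
    if batch:
        _flush(batch)


def _flush(lines):
    """Upload one merged NDJSON file under a unique key."""
    key = f"{TARGET_PREFIX}events-{uuid.uuid4()}.ndjson"
    s3.put_object(Bucket=SOURCE_BUCKET, Key=key, Body="".join(lines).encode("utf-8"))


if __name__ == "__main__":
    merge_small_files()
```

Newline-delimited JSON keeps each event on its own line, which Snowflake's JSON file format loads as one row per document without extra options.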