amazon-web-services, aws-glue, snowflake-cloud-data-platform

Converting JSON to Parquet and Categorizing Objects Into Folders


I have no experience with Snowflake, so please bear with me. Currently, we collect gyroscope and accelerometer data as JSON from an iWatch using AWS Kinesis and store it in an S3 bucket (let's call it bucket A). We then use AWS Glue to convert those JSON files into Parquet files, split the data by sensor, and store it in two folders (accelerometer and gyroscope) in a new bucket (let's call it bucket B). Is it possible to have Snowflake do exactly what AWS Glue is doing, and also store the converted and transformed data in Snowflake itself (removing bucket B)? Thanks


Solution

  • To build towards a complete answer:

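    • Yes, Snowflake stores the data itself, in its own managed storage.
    • Yes, Snowflake converts the data on load into its own compressed columnar format, which is similar in spirit to Parquet. Unlike Parquet files, however, that data can only be accessed through Snowflake.
    • Yes, Snowflake tables would replace bucket B.
    • Yes, Snowpipe (continuous loading from S3) together with Snowflake Tasks (scheduled SQL transformations) could replace AWS Glue.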

    Take a look at https://docs.snowflake.com/en/user-guide/data-load-s3.html
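
    As a rough illustration of the points above, here is a minimal Python sketch that only generates the Snowflake SQL such a pipeline might use: a stage over bucket A, a Snowpipe into a raw VARIANT table, and one stream plus task per sensor in place of the two folders. Every object name (sensor_stage, sensor_raw, my_s3_integration, my_wh) and every JSON field (sensor, ts, x, y, z) is an assumption, not something taken from the question, and auto-ingest additionally needs S3 event notifications configured as described in the linked docs.

```python
"""Sketch: generate Snowflake SQL for a Snowpipe + Tasks pipeline.

All object names and JSON field names here are hypothetical.
"""
from typing import List

# Trusted constants; they are interpolated directly into SQL identifiers.
SENSORS = ("accelerometer", "gyroscope")


def ingest_statements(bucket_url: str, integration: str) -> List[str]:
    """SQL that lands raw JSON from bucket A in a single VARIANT column."""
    return [
        # External stage pointing at bucket A (assumes a storage integration exists).
        f"CREATE STAGE sensor_stage URL = '{bucket_url}' "
        f"STORAGE_INTEGRATION = {integration} "
        "FILE_FORMAT = (TYPE = 'JSON')",
        # Landing table: one JSON document per row.
        "CREATE TABLE sensor_raw (payload VARIANT)",
        # Snowpipe: loads new files as S3 event notifications arrive.
        "CREATE PIPE sensor_pipe AUTO_INGEST = TRUE AS "
        "COPY INTO sensor_raw FROM @sensor_stage",
    ]


def sensor_statements(sensor: str, warehouse: str) -> List[str]:
    """SQL that routes one sensor's rows into its own typed table."""
    stream = f"sensor_raw_{sensor}_stream"
    task = f"load_{sensor}"
    return [
        f"CREATE TABLE {sensor} "
        "(recorded_at TIMESTAMP_NTZ, x FLOAT, y FLOAT, z FLOAT)",
        # Each sensor gets its own stream so the tasks consume independently.
        f"CREATE STREAM {stream} ON TABLE sensor_raw",
        f"CREATE TASK {task} WAREHOUSE = {warehouse} SCHEDULE = '5 MINUTE' "
        f"WHEN SYSTEM$STREAM_HAS_DATA('{stream.upper()}') AS "
        f"INSERT INTO {sensor} "
        "SELECT payload:ts::TIMESTAMP_NTZ, payload:x::FLOAT, "
        "payload:y::FLOAT, payload:z::FLOAT "
        f"FROM {stream} WHERE payload:sensor::STRING = '{sensor}'",
        # Tasks are created suspended and must be resumed explicitly.
        f"ALTER TASK {task} RESUME",
    ]


def build_pipeline(bucket_url: str, integration: str, warehouse: str) -> List[str]:
    """Full ordered list of statements: ingest first, then one set per sensor."""
    statements = ingest_statements(bucket_url, integration)
    for sensor in SENSORS:
        statements.extend(sensor_statements(sensor, warehouse))
    return statements


if __name__ == "__main__":
    for sql in build_pipeline("s3://bucket-a/", "my_s3_integration", "my_wh"):
        print(sql + ";\n")
```

    The printed statements could be pasted into a Snowflake worksheet or run through any Snowflake client. Using two tables (or, alternatively, two views over sensor_raw) is one way to mirror the accelerometer and gyroscope folders; it is a design choice for this sketch, not the only option.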