Tags: node.js, amazon-web-services, aws-lambda, parquet

How to get parquet file schema in Node JS AWS Lambda?


Is there any way to read a Parquet file's schema from Node.js? If so, how?

I saw that there is a library, parquetjs, but from its documentation it looks like it can only read and write the file's contents.


Solution

  • After some investigation, I found that parquetjs-lite can do this. It does not read the whole file: it reads only the footer and extracts the schema from it.

    It works with a cursor, and as far as I can tell it makes separate S3 requests: one to find the object's size and ranged `getObject` calls to fetch only the bytes it needs.
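The footer-only read described above can be sketched without any dependencies. This is a minimal illustration of the Parquet file layout, not parquetjs-lite's actual code: a Parquet file ends with the Thrift-encoded metadata footer, then a 4-byte little-endian footer length, then the magic bytes `PAR1`. A reader therefore needs only the file size plus two small range reads. The names `readParquetFooter`, `rangeHeader`, and the `readRange(start, length)` callback are hypothetical; decoding the returned footer bytes into a schema (Thrift compact protocol) is the part a library such as parquetjs-lite handles.

```javascript
// Parquet layout (end of file): <footer: Thrift FileMetaData> <4-byte LE footer length> "PAR1"
const MAGIC = 'PAR1';
const TRAILER_LEN = 8; // 4-byte footer length + 4-byte magic

/**
 * Hypothetical helper: fetch only the Parquet footer bytes, never the whole file.
 * `readRange(start, length)` is an assumed async callback that resolves to a Buffer;
 * on S3 it could be backed by a ranged GetObject request.
 */
async function readParquetFooter(fileSize, readRange) {
  // Smallest possible file: leading "PAR1" (4 bytes) + trailer (8 bytes)
  if (fileSize < 4 + TRAILER_LEN) {
    throw new Error('file too small to be Parquet');
  }

  // Read 1: the last 8 bytes (footer length + magic)
  const trailer = await readRange(fileSize - TRAILER_LEN, TRAILER_LEN);
  if (trailer.toString('ascii', 4, 8) !== MAGIC) {
    throw new Error('not a Parquet file (missing PAR1 trailer)');
  }

  const footerLen = trailer.readUInt32LE(0);
  const footerStart = fileSize - TRAILER_LEN - footerLen;
  // The footer must begin after the leading "PAR1" magic
  if (footerStart < 4) {
    throw new Error('corrupt footer length');
  }

  // Read 2: exactly the footer bytes (these still need Thrift decoding to get the schema)
  const footer = await readRange(footerStart, footerLen);
  return { footerStart, footerLen, footer };
}

// Build an HTTP Range header value for an inclusive byte range, e.g. for S3 GetObject
function rangeHeader(start, length) {
  return `bytes=${start}-${start + length - 1}`;
}
```

In a Lambda function, `fileSize` could come from an S3 HeadObject call (`ContentLength`) and `readRange` from a GetObject request whose `Range` is built with `rangeHeader(start, length)`. That is why the schema can be read with a few small requests regardless of how large the object is.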