Tags: google-cloud-platform, dataset, google-cloud-storage, chatbot, archive

Using the Google Natural Questions (NQ) Dataset


I want to use Google's Natural Questions (NQ) dataset for the chatbot I'm building. I downloaded the data from Google Cloud with gsutil -m cp -R gs://natural_questions/v1.0 <path to your data directory>, but I can't figure out how to use it (i.e., unarchive it, load it into a database, or convert it to a .csv file). The files appear to be in a .gstmp archive format.

The files are named nq-train-00.jsonl.gz_.gstmp, nq-train-01.jsonl.gz_.gstmp, and so on.

I can't seem to unarchive these files. Can anybody help me with this? Thank you!

This is the link to the dataset: https://ai.google.com/research/NaturalQuestions


Solution

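  • The .gstmp files are temporary files that gsutil writes while a download is still in progress, so a file that still ends in _.gstmp has not finished downloading (per the Google Cloud Platform GitHub repository, release 4.14). Re-run the gsutil command and let it complete. The finished files are named nq-train-00.jsonl.gz, nq-train-01.jsonl.gz, and so on: ordinary gzip-compressed JSON Lines files, not a special archive format.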
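Once the downloads are complete, each shard can be read line by line with Python's standard library, with no separate unarchiving step. The sketch below is one possible approach, not an official NQ loader. It first lists any leftover _.gstmp files, then streams a .jsonl.gz shard into either a CSV file or a SQLite table. The function names (find_incomplete_downloads, iter_examples, jsonl_gz_to_csv, jsonl_gz_to_sqlite) are hypothetical. The default field names example_id and question_text are assumptions about the NQ record layout, so check them against your own files before relying on them.

```python
import csv
import gzip
import json
import sqlite3
from pathlib import Path

# Suffix gsutil gives a file while its download is still in progress.
TMP_SUFFIX = "_.gstmp"

# ASSUMPTION: top-level keys of an NQ record; verify against your own files.
DEFAULT_FIELDS = ("example_id", "question_text")


def find_incomplete_downloads(data_dir):
    """Return the sorted names of files under data_dir that still end in _.gstmp."""
    return sorted(p.name for p in Path(data_dir).rglob("*" + TMP_SUFFIX))


def iter_examples(jsonl_gz_path):
    """Stream one parsed JSON object per non-empty line of a .jsonl.gz file."""
    # "rt" decompresses and decodes on the fly, so a large shard is never
    # held in memory all at once.
    with gzip.open(jsonl_gz_path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield json.loads(line)


def jsonl_gz_to_csv(jsonl_gz_path, csv_path, fields=DEFAULT_FIELDS):
    """Write the chosen top-level fields of every example to a CSV file.

    Missing keys become empty cells. Returns the number of rows written.
    """
    count = 0
    with open(csv_path, "w", newline="", encoding="utf-8") as out:
        writer = csv.DictWriter(out, fieldnames=list(fields))
        writer.writeheader()
        for example in iter_examples(jsonl_gz_path):
            writer.writerow({k: example.get(k, "") for k in fields})
            count += 1
    return count


def jsonl_gz_to_sqlite(jsonl_gz_path, db_path, fields=DEFAULT_FIELDS, table="examples"):
    """Insert the chosen top-level fields of every example into a SQLite table.

    Table and column names are interpolated into the SQL text, so they must be
    trusted identifiers. Only scalar values (str, int, float, None) can be
    stored directly; nested fields such as lists would need json.dumps first.
    Missing keys are stored as NULL.
    """
    cols = ", ".join(fields)
    placeholders = ", ".join("?" for _ in fields)
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(f"CREATE TABLE IF NOT EXISTS {table} ({cols})")
        rows = (tuple(ex.get(k) for k in fields) for ex in iter_examples(jsonl_gz_path))
        conn.executemany(f"INSERT INTO {table} ({cols}) VALUES ({placeholders})", rows)
        conn.commit()
    finally:
        conn.close()
```

For example, if find_incomplete_downloads("nq_data") returns a non-empty list, the gsutil copy has not finished yet. Once it returns an empty list, jsonl_gz_to_csv("nq_data/v1.0/train/nq-train-00.jsonl.gz", "nq-train-00.csv") would convert one shard. The directory and shard path here are illustrative, so adjust them to wherever your files actually landed. Streaming line by line is deliberate: the NQ shards are large, and loading a whole shard with a single json.load would be both slower and incorrect, because each line is a separate JSON document.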