I want to use Google's Natural Questions (NQ) as the dataset for a chatbot I'm building. I downloaded the data from Google Cloud with gsutil -m cp -R gs://natural_questions/v1.0 <path to your data directory>, but I can't figure out how to use it (that is, how to unarchive it, load it into a database, or convert it to a .csv file). The files appear to be in a .gstmp archive format.
The files are named nq-train-00.jsonl.gz_.gstmp, nq-train-01.jsonl.gz_.gstmp, and so on.
I can't seem to unarchive these files. Can anybody help me out with this? Thank you!
This is the link to the dataset: https://ai.google.com/research/NaturalQuestions
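.gstmp is not an archive format. According to the release notes for version 4.14 in the Google Cloud Platform GitHub repository, .gstmp files are temporary files that exist while a download is still in progress or has not completed. A file such as nq-train-00.jsonl.gz_.gstmp is therefore an unfinished download of nq-train-00.jsonl.gz, not something to unarchive.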
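Once the download has actually finished and the files carry their final .jsonl.gz names, each one should be a gzip-compressed file with one JSON object per line. Below is a minimal Python sketch for checking a directory for leftover .gstmp files and for reading a completed file. The function names find_incomplete and iter_jsonl_gz are made up for illustration, and the one-object-per-line layout is an assumption based on the .jsonl.gz extension.

```python
import gzip
import json
from pathlib import Path


def find_incomplete(data_dir):
    """Return files under data_dir that still carry the temporary .gstmp suffix."""
    return sorted(Path(data_dir).rglob("*.gstmp"))


def iter_jsonl_gz(path):
    """Yield one parsed JSON object per non-empty line of a gzip-compressed JSONL file.

    Assumes the file is a complete download; a truncated .gstmp file
    will generally fail partway through decompression.
    """
    with gzip.open(path, "rt", encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:
                yield json.loads(line)
```

If find_incomplete("<path to your data directory>") returns anything, the download has not finished for those files. For a completed file, next(iter_jsonl_gz(path)).keys() shows the fields of the first record, which is a reasonable starting point before deciding how to load the data into a database or write it to .csv.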