google-cloud-datalab

Files written to disk get deleted after a while


I am running a notebook on Google Cloud Datalab. It generates some intermediate output files, and they show up while the notebook is running.

However, when I open the notebook again several hours later, only the files in the Datalab git repository (mostly notebook files) are still there and everything else has been deleted. The notebook kernel also seems to have been restarted.

Is there a reason for this, and how can I avoid it?


Solution

  • Google Cloud Datalab runs in the App Engine Managed VM environment. Managed VM instances use ephemeral disks, which do not preserve your data across restarts.

    If intermediate output files need to be preserved for future use or for compliance reasons, persist them in Google Cloud Storage or another durable store.

    Storage inside the VM instances is only suitable for temporary, disposable data.

    A related question explores the use of persistent disks: Using persistent disks with google Datalab
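
As a rough illustration of the Cloud Storage suggestion above, the sketch below copies every file under an output directory into a bucket at the end of a notebook run. It assumes the `google-cloud-storage` Python package is installed and that the notebook has credentials for the bucket. The function names, the `outputs` directory and the bucket name `my-datalab-outputs` are hypothetical placeholders, not anything Datalab defines.

```python
import os


def upload_directory(local_dir, bucket, prefix=""):
    """Upload every file under local_dir to bucket, keeping relative paths.

    `bucket` is expected to behave like google.cloud.storage.Bucket:
    bucket.blob(name) returns an object with upload_from_filename(path).
    Returns the list of object names that were written.
    """
    uploaded = []
    clean_prefix = prefix.strip("/")
    for root, _dirs, files in os.walk(local_dir):
        for filename in sorted(files):
            local_path = os.path.join(root, filename)
            rel_path = os.path.relpath(local_path, local_dir)
            # Cloud Storage object names use forward slashes.
            parts = rel_path.split(os.sep)
            if clean_prefix:
                parts = [clean_prefix] + parts
            object_name = "/".join(parts)
            bucket.blob(object_name).upload_from_filename(local_path)
            uploaded.append(object_name)
    return uploaded


def backup_outputs(local_dir, bucket_name, prefix=""):
    """Hypothetical wrapper that builds a real client (needs credentials)."""
    # Imported lazily so the helper above works without the package.
    from google.cloud import storage

    client = storage.Client()
    return upload_directory(local_dir, client.bucket(bucket_name), prefix)
```

In a notebook cell this could be called as `backup_outputs("outputs", "my-datalab-outputs", prefix="run-01")` after the intermediate files have been written, so that a copy survives an instance restart.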