google-cloud-platform google-compute-engine

Google Cloud VM Files Deleted after session disconnect


Some of my GCP instances behave in a way similar to what is described in this question: Google Cloud VM Files Deleted after Restart

At times the session gets disconnected after a short period of inactivity. On reconnecting, the machine looks freshly installed (this is not after a restart, as in the question linked above). All the files are gone. As you can see in the attachment, the profile directory is created from scratch when the session reconnects. None of the software I installed is there either; everything is lost, including what was installed as root. Fortunately, I have been logging all my commands and file setups manually on my client, so nothing is permanently lost, but I would like to understand what is happening and resolve it for good.

This has now happened a few times.

One point to note: if I exit cleanly, that is, if I log out or exit from the SSH session properly, the machine is exactly as I left it when I reconnect. The issue occurs only when the session disconnects by itself. Even then it does not happen every time: there have been cases where the session disconnected and I was able to reconnect with everything intact.

The issue does not occur on all my VMs.

Regarding the suggestions in the question linked above:

  • I am not connected to Cloud Shell. I am connecting to the machine over SSH using the Chrome extension.
  • I have not manually mounted any disks (as far as I know).
  • I have checked the logs from gcloud compute instances get-serial-port-output --zone us-east4-c INSTANCE_NAME, but I could not make much of them. Is there anything specific I should look for?
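As a rough starting point for reading the serial output, the sketch below scans saved log text for a few messages the Linux kernel commonly prints when it runs out of memory, panics, or boots. The function name `scan_serial_log` and the marker list are illustrative assumptions, not an official or exhaustive set, and a match only suggests where to look more closely.

```python
import re

# Assumed, non-exhaustive markers based on common Linux kernel messages.
PATTERNS = {
    "oom": re.compile(r"invoked oom-killer|Out of memory: Kill", re.IGNORECASE),
    "panic": re.compile(r"Kernel panic", re.IGNORECASE),
    "boot": re.compile(r"Linux version \d"),  # printed at the start of each boot
}


def scan_serial_log(text):
    """Return (line_number, label, line) for each line matching a marker."""
    hits = []
    for number, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((number, label, line.strip()))
                break  # one label per line is enough
    return hits
```

For example, after redirecting the command's output to a file (here hypothetically named serial.log), `scan_serial_log(open("serial.log", errors="replace").read())` lists the matching lines. A "boot" hit that appears shortly after "oom" or "panic" hits would be consistent with the machine having gone down under memory pressure.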

Any help is appreciated.


Here are the links to the logs, as suggested by @W_B.

The log below is from the 8th, when the machine was restarted and the files were deleted:

https://pastebin.com/NN5dvQMK

It happened again today. I did not run the command immediately, so the log below was captured some time afterwards:

https://pastebin.com/m5cgdLF6

The log below was captured after logging out today:

https://pastebin.com/143NPatF

Please note that I have replaced the user ID, the system name, and many numeric values using regular expressions, so there is a slight chance that timestamps and other values have been altered. I am not sure whether that is a problem.

I have added a screenshot of the current config from the UI.


Solution

  • I realize this was posted a long time ago, but I am adding the answer in case it helps anyone.

    In my case, the issue turned out to occur only on the f1-micro instances I had. Eventually, I moved all my instances to larger machine types and never faced the issue again.

    I am assuming that the f1-micro instances did not have enough resources to handle even the basics, crashed, and that the platform then ended up resetting the whole machine.

    I am not sure whether these instance types are still offered today, so this may all be moot. The same could also happen with another low-resource configuration.
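For anyone wanting to try the same change, here is a minimal sketch of how moving an instance to a larger machine type might be scripted around the gcloud CLI. The helper names and the e2-small target used in the usage note are assumptions for illustration; the answer above does not say which machine type was chosen. The instance is stopped first because the machine type can only be changed while the instance is stopped.

```python
import subprocess


def resize_commands(instance, zone, machine_type):
    """Build the gcloud invocations to stop, resize, and restart one VM."""
    base = ["gcloud", "compute", "instances"]
    return [
        base + ["stop", instance, "--zone", zone],
        base + ["set-machine-type", instance, "--zone", zone,
                "--machine-type", machine_type],
        base + ["start", instance, "--zone", zone],
    ]


def resize_instance(instance, zone, machine_type, dry_run=True):
    """Print the commands by default; run them only when dry_run is False."""
    for command in resize_commands(instance, zone, machine_type):
        if dry_run:
            print(" ".join(command))
        else:
            subprocess.run(command, check=True)  # raises if gcloud fails
```

Calling `resize_instance("INSTANCE_NAME", "us-east4-c", "e2-small")` only prints the three commands, so they can be reviewed before anything is changed; passing `dry_run=False` executes them and requires an installed, authenticated gcloud.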