Tags: amazon-web-services, amazon-ec2, amazon-ebs, ec2-ami, steamvr

On a Windows Server EC2 instance launched from an AMI, SteamVR runs into a critical error when started after autologon


When I launch a new instance from an AMI (this AMI opens SteamVR after logon), SteamVR runs into an error. From what I have read, this happens because when an instance is launched from an AMI, the EBS volume attached to the EC2 instance has to fetch its data from the snapshot that the AMI is based on. I/O operations are slow on the first read, so when SteamVR is launched at login, it runs into a critical error. More details are here: https://forums.developer.nvidia.com/t/failed-watchdog-timeout-in-thread-server-main-in-cloudxrremotehmd-after-10-440015-seconds-aborting/180742/10

In this post, the final reply is the following:

"We did figure out a workaround last week. It turns out this issue is related specifically to AWS machines with EBS. When you start an EBS-backed machine in AWS, the EBS volume needs to be hydrated from the snapshot that it is based on. The snapshot data is backed by S3, and therefore is very slow for the first read of data from the volume. We are now using a little PowerShell script to basically cat {files} > /dev/null for all the files in the CloudXR and SteamVR directories before we start SteamVR. This hydrates those files into the volume so that they are fast, and we haven’t had issues crashing since then. More information on this is here: https://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/ebs-initialize.html#ebs-initialize-windows. Hope this helps."

How do I initialize the EBS volume for only the files in the SteamVR and CloudXR directories? And if I did use fio or dd to initialize the entire volume (200 GB), how long would that take?

Any help would be greatly appreciated!


Solution

  • When an Amazon EBS volume is created from an AMI, AWS does not actually copy any blocks to the Amazon EBS volume up front.

    Instead, when a block on the EBS volume is first accessed, the EBS service checks whether the block is empty (unused) or whether it is a block stored in the AMI's snapshot. If it is stored in the snapshot, the EBS service copies the block from the snapshot to the EBS volume at that moment. This on-demand copy is why the first access to each block is delayed.

    If you wish to avoid the delay, then it is necessary to access the block to force the "copy from snapshot to volume" operation to occur. One way to do this is to cat a file, which causes the underlying storage block(s) that hold that file to be copied to the volume.

    Therefore, to speed up access to the SteamVR and CloudXR directories, you could simply run a command that reads every file in those directories.

    Based on the answers to "Concatenating thousands of files: > vs >>", you could use something like:

    # -type f: read regular files only (cat fails on directories)
    find /steamvr /cloudxr -type f -print0 | xargs -0 cat > /dev/null
    

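    (Adjust the directory names as required. These are Unix commands; on the Windows Server instance in the question, the same approach needs a PowerShell equivalent, like the script described in the quoted reply.)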
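A slightly more defensive version of the find/cat idea, written as a minimal sketch that assumes a POSIX shell. The function name hydrate_dirs is hypothetical, and /steamvr and /cloudxr are still placeholder paths. It reads only regular files, skips directories that do not exist, and prints the total number of bytes read, so you can check that the files were actually touched before starting SteamVR.

```shell
#!/bin/sh
# Minimal sketch (hypothetical helper): read every regular file under the given
# directories once, so EBS copies the backing blocks from the snapshot early.
# /steamvr and /cloudxr are placeholder paths; adjust them for your instance.

hydrate_dirs() {
    total=0
    for dir in "$@"; do
        if [ ! -d "$dir" ]; then
            echo "skipping missing directory: $dir" >&2
            continue
        fi
        # -type f skips directories; -exec cat {} + batches many paths per cat call.
        # wc -c discards the data but counts the bytes; tr strips BSD-style padding.
        bytes=$(find "$dir" -type f -exec cat {} + | wc -c | tr -d ' ')
        total=$((total + bytes))
    done
    # Print the total number of bytes read across all directories.
    echo "$total"
}

hydrate_dirs /steamvr /cloudxr
```

Using find's -exec with + instead of a pipe into xargs gives the same batching without a second process, and the byte count is a cheap sanity check. A missing directory produces a warning on stderr rather than stopping the script.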
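On the second part of the question (reading the whole 200 GB volume with fio or dd), the time is roughly the volume size divided by the sustained read throughput. The sketch below assumes, purely for illustration, 125 MiB/s (the gp3 baseline throughput) and treats 200 GB as 200 GiB. First reads of snapshot-backed blocks are slower than normal reads, so the real time can be considerably longer; measuring it on a test instance is the only reliable answer.

```shell
#!/bin/sh
# Back-of-the-envelope estimate for reading an entire volume once.
# Both inputs are assumptions: replace them with your volume size and
# the throughput you actually observe during initialization.
volume_gib=200          # size from the question, treated as GiB
throughput_mib_s=125    # hypothetical sustained read rate in MiB/s

# Integer arithmetic: GiB -> MiB, then divide by MiB/s to get seconds.
seconds=$(( volume_gib * 1024 / throughput_mib_s ))
echo "approx. ${seconds} s (~$(( seconds / 60 )) min)"
```

With these inputs it prints "approx. 1638 s (~27 min)"; halving the throughput doubles the estimate.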