
LinuxONE Community Cloud freezes when training a not-so-large CNN


My team and I are participating in a hackathon, and we are supposed to use the LinuxONE Community Cloud server. For one of our tasks, we are training a CNN (a ResNet) using transfer learning.

The server keeps disconnecting. Is this because multiple users on our team are accessing the server at the same time, or is something else going on? What can we do?


Solution

  • Believe it or not, the instance from the LinuxONE Community Cloud server has very little RAM available. You can open a new terminal and run the free -h command to see how much RAM is available. When I used it, it was around 3.6GB.

    To mitigate the issue:

    1. Use a lightweight ML model such as MobileNet instead of a ResNet.
    2. Multiple users accessing the lab won't have a huge impact; just avoid having all users access the same file at once.
    3. Don't perform huge read/write operations (such as copying or duplicating files). This was what caused the main problem.
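
The memory check from the answer (free -h) can also be done from inside the training script, so a run can stop early instead of freezing the session. Below is a minimal sketch, assuming a Linux instance that exposes /proc/meminfo. The function names and the 2.0 GiB threshold are hypothetical and chosen only for illustration; they are not part of any LinuxONE tooling.

```python
from pathlib import Path


def parse_meminfo(text):
    """Parse /proc/meminfo-style text into a dict of field name -> integer.

    Most fields are reported in kB; a few (such as HugePages_Total)
    are plain counts.
    """
    fields = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # skip lines that are not "Name: value" pairs
        parts = rest.split()
        if parts and parts[0].isdigit():
            fields[name.strip()] = int(parts[0])
    return fields


def available_gib(meminfo_path="/proc/meminfo"):
    """Return an estimate of available RAM in GiB."""
    fields = parse_meminfo(Path(meminfo_path).read_text())
    # MemAvailable is missing on very old kernels; fall back to MemFree.
    kb = fields.get("MemAvailable", fields.get("MemFree", 0))
    return kb / (1024 ** 2)


def enough_memory(required_gib, meminfo_path="/proc/meminfo"):
    """True if at least required_gib GiB of RAM appears to be available."""
    return available_gib(meminfo_path) >= required_gib


if __name__ == "__main__":
    print(f"Available RAM: {available_gib():.2f} GiB")
    # Hypothetical threshold: tune it to the model and batch size in use.
    if not enough_memory(2.0):
        print("Low memory: consider a smaller model or a smaller batch size.")
```

Calling enough_memory() before loading the model and the dataset gives an early warning on an instance with roughly 3.6GB of RAM, where a ResNet plus its training batches may not fit.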