Tags: mlops, voxel51, fiftyone, image-annotations

Diagnose crashes of FiftyOne app – logs or other tools


We need to make a FiftyOne instance available to multiple users via a web browser. We need to start a process and have it run, even after we log off from the session that initiated the app processes.

I’m using the following command to start the process. I’m executing this in a Docker container. The container is running on an Ubuntu host via AWS EC2.

$ nohup fiftyone app launch --remote > fiftyone.log 2>&1 &

If I launch this command from the terminal, it starts processes that allow a web browser to connect to the FiftyOne app. These processes persist after I log out.

However, these processes sometimes become unavailable. For example, after running for over 20 hours, FiftyOne crashed with the following in the log file ~/.fiftyone/var/lib/mongo/log/mongo.log.

(produced by cat ~/.fiftyone/var/lib/mongo/log/mongo.log | jq '{msg,t}')

{
  "msg": "CMD fsync",
  "t": {
    "$date": "2021-09-01T15:04:24.152+00:00"
  }
}
{
  "msg": "Received signal",
  "t": {
    "$date": "2021-09-01T15:04:24.181+00:00"
  }
}
{
  "msg": "Signal was sent by kill(2)",
  "t": {
    "$date": "2021-09-01T15:04:24.181+00:00"
  }
}

How might I get more information about why this crashed?


Solution

  • The open-source version of FiftyOne is designed primarily for individual users. The best experience for multi-user collaboration is FiftyOne Teams. You can sign up here: https://voxel51.com/#teams-form

    About this error specifically:

    On the backend, calling fiftyone app launch --remote in effect runs the following Python commands:

    import fiftyone as fo

    session = fo.launch_app(remote=True)
    session.wait()
    

    For remote sessions, the session.wait() call will block until something connects to it, and then will continue blocking until all connected tabs are closed.

    There is a timeout built in to handle the case when the tab is refreshed so that the session is not immediately closed. In some cases, we have noticed that the refresh takes longer than the timeout, and sessions are closed prematurely. This is being looked into.
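
The interaction between tab refreshes and the timeout can be illustrated with a toy model. This is only a sketch of the behaviour described above, not FiftyOne's actual implementation: the function name `wait_return_time`, the event representation, and the 3-second timeout are all hypothetical choices made for illustration.

```python
def wait_return_time(events, timeout):
    """Toy model of a blocking wait with a reconnect grace period.

    events: list of (timestamp_seconds, delta) sorted by time, where delta
        is +1 when a tab connects and -1 when a tab disconnects.
    timeout: seconds the wait tolerates having no connected tabs.

    Returns the time at which the wait would return, or None if it would
    still be blocking after the last event.
    """
    connections = 0
    seen_connection = False  # the wait first blocks until something connects
    idle_since = None        # time at which the last tab disconnected

    for timestamp, delta in events:
        # If no tab reconnected within the grace period, the wait has returned.
        if idle_since is not None and timestamp - idle_since > timeout:
            return idle_since + timeout

        connections += delta
        if connections > 0:
            seen_connection = True
            idle_since = None
        elif seen_connection:
            idle_since = timestamp

    if idle_since is not None:
        return idle_since + timeout
    return None


# A refresh is modelled as a disconnect followed by a reconnect.
fast_refresh = [(0, +1), (10, -1), (11, +1)]  # reconnects after 1 second
slow_refresh = [(0, +1), (10, -1), (15, +1)]  # reconnects after 5 seconds

print(wait_return_time(fast_refresh, timeout=3))  # None: still blocking
print(wait_return_time(slow_refresh, timeout=3))  # 13: closed before the reconnect
```

In the slow case the session has already ended by the time the tab reconnects, which matches the premature closing described above.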

    The next release provides an argument that will cause wait to block indefinitely. You will be able to call fiftyone app launch --remote --wait 0.

    In the meantime, I would recommend writing a small script (launch_app.py) that blocks indefinitely until the process is terminated, and running that instead.

    import time

    import fiftyone as fo

    session = fo.launch_app(remote=True)

    # Block indefinitely; sleeping avoids a busy loop that pins a CPU core
    while True:
        time.sleep(60)
    
    python launch_app.py
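
On the original question of getting more information from the log: the jq filter in the question keeps only `msg` and `t`, so any other fields on the signal-related entries are dropped. The sketch below prints those entries in full. It assumes that mongo.log contains one JSON object per line and that signal-related entries may carry extra fields (for example an `attr` object); both are assumptions about the log format, and the helper names `signal_entries` and `print_signal_entries` are hypothetical.

```python
import json


def signal_entries(lines):
    """Yield parsed log entries whose message mentions a signal."""
    for line in lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
        except json.JSONDecodeError:
            continue  # skip lines that are not valid JSON
        if not isinstance(entry, dict):
            continue
        if "signal" in str(entry.get("msg", "")).lower():
            yield entry


def print_signal_entries(log_path):
    """Print every field of each signal-related entry in the log file."""
    with open(log_path, encoding="utf-8") as log_file:
        for entry in signal_entries(log_file):
            print(json.dumps(entry, indent=2))
```

Calling `print_signal_entries(os.path.expanduser("~/.fiftyone/var/lib/mongo/log/mongo.log"))` (after `import os`) would show whatever details the log records about the signal. If the entries include the signal number or the sending process, that would help confirm whether the database was shut down by the FiftyOne session ending rather than by an unrelated process.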