
@LocatorApplication starts and then immediately stops


Everything appears to be created correctly, but as soon as initialization finishes, the application stops.

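import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.data.gemfire.config.annotation.LocatorApplication;

@SpringBootApplication
@LocatorApplication
public class ServerApplication {

  public static void main(String[] args) {
    SpringApplication.run(ServerApplication.class, args);
  }
}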

Log:

2020-08-03 10:59:18.250  INFO 7712 --- [           main] o.a.g.d.i.InternalLocator                : Locator started on 10.25.209.139[8081]
2020-08-03 10:59:18.250  INFO 7712 --- [           main] o.a.g.d.i.InternalLocator                : Starting server location for Distribution Locator on LB183054.dmn1.fmr.com[8081]
2020-08-03 10:59:18.383  INFO 7712 --- [           main] c.f.g.l.LocatorSpringApplication         : Started LocatorSpringApplication in 8.496 seconds (JVM running for 9.318)
2020-08-03 10:59:18.385  INFO 7712 --- [m shutdown hook] o.a.g.d.i.InternalDistributedSystem      : VM is exiting - shutting down distributed system
2020-08-03 10:59:18.395  INFO 7712 --- [m shutdown hook] o.a.g.i.c.GemFireCacheImpl               : GemFireCache[id = 1329087972; isClosing = true; isShutDownAll = false; created = Mon Aug 03 10:59:15 EDT 2020; server = false; copyOnRead = false; lockLease = 120; lockTimeout = 60]: Now closing.
2020-08-03 10:59:18.416  INFO 7712 --- [m shutdown hook] o.a.g.d.i.ClusterDistributionManager     : Shutting down DistributionManager 10.25.209.139(locator1:7712:locator)<ec><v0>:41000. 
2020-08-03 10:59:18.517  INFO 7712 --- [m shutdown hook] o.a.g.d.i.ClusterDistributionManager     : Now closing distribution for 10.25.209.139(locator1:7712:locator)<ec><v0>:41000
2020-08-03 10:59:18.518  INFO 7712 --- [m shutdown hook] o.a.g.d.i.m.g.Services                   : Stopping membership services
2020-08-03 10:59:18.518  INFO 7712 --- [ip View Creator] o.a.g.d.i.m.g.Services                   : View Creator thread is exiting
2020-08-03 10:59:18.520  INFO 7712 --- [Server thread 1] o.a.g.d.i.m.g.Services                   : GMSHealthMonitor server thread exiting
2020-08-03 10:59:18.536  INFO 7712 --- [m shutdown hook] o.a.g.d.i.ClusterDistributionManager     : DistributionManager stopped in 120ms.
2020-08-03 10:59:18.537  INFO 7712 --- [m shutdown hook] o.a.g.d.i.ClusterDistributionManager     : Marking DistributionManager 10.25.209.139(locator1:7712:locator)<ec><v0>:41000 as closed.

Solution

  • Yes, this is the expected behavior out of the box (OOTB).

    Most Apache Geode processes (clients, i.e. ClientCache, as well as Locators, Managers and "peer" Cache nodes/members of a cluster/distributed system) only create daemon Threads, i.e. Threads that do not keep the JVM alive. Therefore, the Apache Geode JVM process will start up, initialize itself and then shut down as soon as the main Thread returns.

    Only an Apache Geode CacheServer process (a "peer" Cache that has a CacheServer component to listen for client connections) starts and continues to run. That is because the ServerSocket used to listen for client Socket connections is created on a non-daemon Thread (i.e. a blocking Thread), which prevents the JVM process from shutting down. Otherwise, a CacheServer would fall straight through as well.
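    The daemon versus non-daemon distinction can be reproduced in plain Java, without Geode at all. The following is only a minimal sketch: the class DaemonThreadDemo and its sleeper method are hypothetical names made up for illustration, and the sleeping Thread merely stands in for something like a socket listener.

```java
public class DaemonThreadDemo {

    // Builds (but does not start) a Thread that sleeps indefinitely.
    // The daemon flag decides whether the Thread keeps the JVM alive once started.
    public static Thread sleeper(String name, boolean daemon) {
        Thread thread = new Thread(() -> {
            try {
                Thread.sleep(Long.MAX_VALUE);
            } catch (InterruptedException e) {
                // Restore the interrupt flag and let the Thread end.
                Thread.currentThread().interrupt();
            }
        }, name);
        thread.setDaemon(daemon);
        return thread;
    }

    public static void main(String[] args) {
        // No argument (or "true") starts a daemon Thread; "false" starts a non-daemon Thread.
        boolean daemon = args.length == 0 || Boolean.parseBoolean(args[0]);

        sleeper("listener", daemon).start();

        System.out.println("main() is returning; daemon=" + daemon);
        // With a daemon Thread the JVM exits once main() returns.
        // With a non-daemon Thread the JVM keeps running until that Thread ends.
    }
}
```

    Run with no arguments, the process should exit right after main() returns, which mirrors the Locator falling straight through. Run with the argument false, it should stay up until it is killed, which mirrors the CacheServer case.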

    You might be thinking, well, how does Gfsh prevent Locators (i.e. using the start locator command) and "servers" (i.e. using the start server command) from shutting down?

    NOTE: By default, Gfsh creates a CacheServer instance when starting a GemFire/Geode server using the start server command. The CacheServer component of the "server" can be disabled by specifying the --disable-default-server option to the start server command. In this case, the "server" will not be able to serve clients. The peer node/member will still continue to run, but not without extra help. See the Gfsh documentation for the start server command for more details.

    So, how does Gfsh prevent the processes from falling through?

    Under-the-hood, Gfsh uses the LocatorLauncher and ServerLauncher classes to configure and fork the JVM processes to launch Locators and servers, respectively.

    By way of example, consider Gfsh's start locator command, which uses the LocatorLauncher class. Technically, it uses the configuration from the LocatorLauncher class instance to construct the java command-line used to fork and launch a separate JVM process.

    However, the key here is the specific "command" passed to the LocatorLauncher class when starting the Locator, which is the START command.

    In the LocatorLauncher class, we see that the START command does the following: from the main method, to the run method, it starts the Locator and then calls waitOnLocator.

    Without the wait, the Locator would fall straight through as you are experiencing.

    You can simulate the same effect (i.e. "falling straight through") using the following code, which uses the Apache Geode API to configure and launch a Locator (in-process).

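    import org.apache.geode.distributed.LocatorLauncher;

    public class ApacheGeodeLocatorApplication {

        public static void main(String[] args) {

            // Configure an in-process Locator with an embedded JMX Manager on ephemeral ports.
            LocatorLauncher locatorLauncher = new LocatorLauncher.Builder()
                .set("jmx-manager", "true")
                .set("jmx-manager-port", "0")
                .set("jmx-manager-start", "true")
                .setMemberName("ApacheGeodeBasedLocator")
                .setPort(0)
                .build();

            locatorLauncher.start();

            // Uncomment to block the main Thread and keep the JVM process running.
            //locatorLauncher.waitOnLocator();
        }
    }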
    

    This simple little program will fall straight through. However, if you uncomment locatorLauncher.waitOnLocator(), then the JVM process will block.

    This is not unlike what SDG's LocatorFactoryBean class actually does. It, too, uses the LocatorLauncher class to configure and bootstrap a Locator in-process. The LocatorFactoryBean is the class used to configure and bootstrap a Locator when you declare the SDG @LocatorApplication annotation on your @SpringBootApplication class.

    However, I do think there is room for improvement here. Therefore, I have filed DATAGEODE-361.

    In the meantime, as a workaround, you can get a blocking Locator by following the approach used in the Locator Smoke Test in the Spring Boot for Apache Geode (SBDG) project.

    However, after DATAGEODE-361 is complete, the extra logic preventing the Locator JVM process from shutting down will no longer be necessary.
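    Until then, one generic way to add that extra logic is to block the main Thread yourself after the application has started. The following is a minimal sketch in plain Java, not the code from the SBDG Smoke Test: the KeepAlive class and its method names are hypothetical, and the comment in main only marks where the SpringApplication.run(..) call from the question would go.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class KeepAlive {

    // A single-count latch: the main Thread waits on it until something calls release().
    private final CountDownLatch latch = new CountDownLatch(1);

    // Lets the waiting Thread continue, e.g. from an admin endpoint or a test.
    public void release() {
        latch.countDown();
    }

    // Waits up to the given timeout; returns true only if release() was called in time.
    public boolean await(long timeout, TimeUnit unit) {
        try {
            return latch.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }

    // Blocks the calling Thread until release() is called or the Thread is interrupted.
    public void awaitIndefinitely() {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        KeepAlive keepAlive = new KeepAlive();

        // Assumption: in the question's application, the call to
        // SpringApplication.run(ServerApplication.class, args) would go here.

        // The main Thread is a non-daemon Thread, so blocking it keeps the JVM running.
        keepAlive.awaitIndefinitely();
    }
}
```

    Because the main Thread is a non-daemon Thread, parking it on the latch should keep the JVM, and therefore the in-process Locator, alive. Terminating the process (for example with Ctrl+C) still triggers the normal JVM shutdown hooks seen in the log above.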