Error when creating a systemd unit for Apache Drill


I have installed Apache Drill and want it to start when the machine boots. To do this, I created the following systemd unit at /etc/systemd/system/drill.service:

[Unit]
Description=Start/Stop Apache Drill
After=syslog.target network.target

[Service]
Type=forking
User=me
Group=me
ExecStartPre=-/usr/bin/zk_up
ExecStart=/opt/apache/drill/current/bin/drillbit.sh --config /opt/apache/drill/current/conf start
ExecStop=/opt/apache/drill/current/bin/drillbit.sh stop

[Install]
WantedBy=multi-user.target

The problem is that when I run systemctl start drill, the service never finishes starting: it hangs and eventually times out. While it is hung, systemctl status -l drill.service reports the state as activating. This is the output of that command:

drill.service - Start/Stop Apache Drill
   Loaded: loaded (/etc/systemd/system/drill.service; disabled; vendor preset: disabled)
   Active: activating (start) since Fri 2019-10-25 07:32:24 UTC; 55s ago
  Process: 10257 ExecStartPre=/usr/bin/zk_up (code=exited, status=0/SUCCESS)
  Control: 10262 (drillbit.sh)
   CGroup: /system.slice/drill.service
           ├─10262 /bin/bash /opt/apache/drill/current/bin/drillbit.sh --config /opt/apache/drill/current/conf start
           ├─10273 /bin/bash /opt/apache/drill/current/bin/drillbit.sh --config /opt/apache/drill/current/conf start
           ├─10274 find -L / -name java -type f
           └─10275 head -n 1

After the start fails, I see the following message: Job for drill.service failed because a timeout was exceeded. See "systemctl status drill.service" and "journalctl -xe" for details.

After the timeout, systemctl status -l drill.service returns the following:

drill.service - Start/Stop Apache Drill
   Loaded: loaded (/etc/systemd/system/drill.service; disabled; vendor preset: disabled)
   Active: failed (Result: timeout) since Fri 2019-10-25 07:33:54 UTC; 3min 3s ago
  Process: 10262 ExecStart=/opt/apache/drill/current/bin/drillbit.sh --config /opt/apache/drill/current/conf start (code=killed, signal=TERM)
  Process: 10257 ExecStartPre=/usr/bin/zk_up (code=exited, status=0/SUCCESS)

And when I run journalctl -xe, I see the following messages

Oct 25 07:39:17 drill-1 drillbit.sh[10774]: find: File system loop detected; ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sda’ is part of the same file system loop as ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda’.
Oct 25 07:39:17 drill-1 drillbit.sh[10774]: find: File system loop detected; ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sdb/bdi/subsystem/2:0/subsystem’ is part of the same file system loop as ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sdb/bdi/subsystem’.
Oct 25 07:39:17 drill-1 drillbit.sh[10774]: find: File system loop detected; ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sdb/bdi/subsystem/8:0/subsystem’ is part of the same file system loop as ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sdb/bdi/subsystem’.
Oct 25 07:39:17 drill-1 drillbit.sh[10774]: find: File system loop detected; ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sdb/bdi/subsystem/0:38/subsystem’ is part of the same file system loop as ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sdb/bdi/subsystem’.
Oct 25 07:39:17 drill-1 drillbit.sh[10774]: find: File system loop detected; ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sdb/bdi/subsystem/8:16’ is part of the same file system loop as ‘/sys/bus/cpu/devices/cpu0/node0/cpu1/driver/cpu2/firmware_node/subsystem/devices/PNP0303:00/physical_node/subsystem/devices/00:03/tty/ttyS0/subsystem/ttyS2/device/subsystem/devices/VMBUS:01/firmware_node/2dd1ce17-079e-403c-b352-a1921ee207ee/driver/b6650ff7-33bc-4840-8048-e0676786f393/subsystem/devices/00000000-0001-8899-0000-000000000000/host3/scsi_host/host3/subsystem/host2/device/target2:0:0/subsystem/devices/3:0:1:0/scsi_device/3:0:1:0/subsystem/2:0:0:0/device/block/sda/sda2/subsystem/sdb/bdi’.

Can anyone tell me why I am seeing these messages and what I need to do to get this working?

If I run the ExecStart command directly in my terminal, Apache Drill starts without any errors. The command /usr/bin/zk_up just checks that the ZooKeeper nodes are running before I start Drill and, as shown above, it exits successfully.


Solution

  • Looking closely at the output of systemctl status -l drill.service, I saw that the file system loop messages came from a find command that was searching the whole filesystem, starting at /, for java. A systemd service does not inherit the environment of my login shell, so JAVA_HOME was not set when drillbit.sh ran under systemd, even though it was set in my terminal. I added JAVA_HOME to the environment in the unit file and it worked. My new systemd unit is:

    [Unit]
    Description=Start/Stop Apache Drill
    After=syslog.target network.target
    
    [Service]
    Type=forking
    User=me
    Group=me
    Environment="JAVA_HOME=/opt/java/current"
    ExecStartPre=-/usr/bin/zk_up
    ExecStart=/opt/apache/drill/current/bin/drillbit.sh --config /opt/apache/drill/current/conf start
    ExecStop=/opt/apache/drill/current/bin/drillbit.sh stop
    
    [Install]
    WantedBy=multi-user.target
    

    where /opt/java/current is the directory my JAVA_HOME points to.
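
To illustrate why setting JAVA_HOME avoids the hang, here is a minimal sketch of the kind of lookup that appears to be happening. It is not taken from drillbit.sh, whose actual logic is not shown in this post; the function name find_java and its search-root argument are hypothetical. The only part taken from the status output above is the fallback command, find -L / -name java -type f piped into head -n 1.

```shell
#!/bin/sh
# Hypothetical helper, NOT copied from drillbit.sh: prefer JAVA_HOME and
# only fall back to a filesystem search when it is unset or unusable.
find_java() {
    # Root of the fallback search; "/" mirrors the command seen in the
    # systemctl status output.
    search_root="${1:-/}"

    if [ -n "${JAVA_HOME:-}" ] && [ -x "$JAVA_HOME/bin/java" ]; then
        # Fast path: JAVA_HOME is set, so no directory walk is needed.
        printf '%s\n' "$JAVA_HOME/bin/java"
        return 0
    fi

    # Slow path: -L follows symlinks, so a search rooted at "/" also
    # descends into /sys, where symlink cycles produce the
    # "File system loop detected" messages.
    find -L "$search_root" -name java -type f 2>/dev/null | head -n 1
}
```

With JAVA_HOME exported, find_java returns immediately. Without it, find_java / has to walk the whole filesystem, which in my case took longer than the start timeout.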