
Solaris svcs command shows wrong status


I have freshly installed an application on Solaris 10 (SunOS 5.10). When I check with ps -ef | grep hyperic | grep agent, the processes are up and running. When I check the status with the svcs hyperic-agent command, the output shows that the agent is in maintenance mode. The application itself works fine and I don't have any issues with it. Please help.


Solution

  • Several things can cause this behavior:

    • The start method (the service's start/exec property) returned a status other than SMF_EXIT_OK (0). In that case, check the service log; svcs -x tells you where it is:

       # svcs -x ssh
       ...
       See: /var/svc/log/network-ssh:default.log
      

      In the log, a message like the following means that the start method script failed or is written incorrectly:

       [ Aug 11 18:40:30 Method "start" exited with status 96 ]
      
    • Another cause is that the service faults while it is running, i.e. one of its processes dumps core or is killed by a signal, or all of its processes exit, as described here: https://blogs.oracle.com/lianep/entry/smf_5_fault_retry_models

      The mechanism SMF actually relies on for this monitoring is process contracts. You can find the contract ID of an online service with svcs -v (the CTID field):

      # svcs -vp svc:/network/smtp:sendmail
      STATE          NSTATE        STIME    CTID   FMRI
      online         -             Apr_14       68 svc:/network/smtp:sendmail
                  Apr_14       1679 sendmail
                  Apr_14       1681 sendmail
      

      Then watch the contract's events with ctwatch:

      # ctwatch 68
      CTID    EVID    CRIT ACK CTTYPE   SUMMARY
      68      28      crit no  process  contract empty
      

      From there, you have two options:

      • There is a real problem with the service, so it eventually faults. In that case, debug the application.

      • The fault is expected behavior for this service. In that case, edit and re-import the service manifest to make SMF less strict, i.e. configure the startd/ignore_error and startd/duration properties.
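
The log lookup described for a failing start method can be scripted. This is a minimal sketch that assumes the log file name follows the pattern visible in the svcs -x output (the FMRI without "svc:/", with "/" replaced by "-", plus ".log"), so it needs the full FMRI rather than a short name such as ssh. fmri_to_log and method_exits are hypothetical helper names, and because they use $(...) and ${var#...} they should run under ksh or /usr/xpg4/bin/sh rather than the legacy Solaris 10 /bin/sh.

```shell
# Sketch only: helper names are hypothetical. Needs a POSIX shell
# (ksh, /usr/xpg4/bin/sh, bash); the legacy Solaris 10 /bin/sh lacks $(...).

# Map a full FMRI to its SMF log file, following the naming pattern seen in
# "svcs -x" output: drop "svc:/", turn "/" into "-", append ".log".
fmri_to_log() {
    fmri=${1#svc:/}
    printf '/var/svc/log/%s.log\n' "$(printf '%s' "$fmri" | tr '/' '-')"
}

# Print every method-exit message (e.g. 'Method "start" exited with
# status 96') found in the given log file.
method_exits() {
    grep 'Method ".*" exited with status' "$1"
}
```

For example, method_exits "$(fmri_to_log svc:/network/ssh:default)" lists the exit messages from /var/svc/log/network-ssh:default.log.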
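
The number in a log message such as 'Method "start" exited with status 96' can be decoded against the symbolic exit codes documented in smf_method(5) and defined in /lib/svc/share/smf_include.sh. The sketch below covers only those documented codes; exit_status_of and smf_exit_name are hypothetical helper names.

```shell
# Sketch only: helper names are hypothetical; the codes are the ones
# documented in smf_method(5) / /lib/svc/share/smf_include.sh.

# Extract the numeric status from log lines read on stdin, e.g.
#   [ Aug 11 18:40:30 Method "start" exited with status 96 ]
exit_status_of() {
    sed -n 's/.*exited with status \([0-9][0-9]*\).*/\1/p'
}

# Translate a method exit status into its symbolic SMF name.
smf_exit_name() {
    case "$1" in
        0)   echo SMF_EXIT_OK ;;
        95)  echo SMF_EXIT_ERR_FATAL ;;   # fatal failure, do not restart
        96)  echo SMF_EXIT_ERR_CONFIG ;;  # unrecoverable configuration error
        99)  echo SMF_EXIT_ERR_NOSMF ;;   # method invoked outside SMF
        100) echo SMF_EXIT_ERR_PERM ;;    # missing permission or privilege
        *)   echo "other failure ($1)" ;;
    esac
}
```

For the sample message, exit_status_of prints 96 and smf_exit_name 96 prints SMF_EXIT_ERR_CONFIG, i.e. the start method reported an unrecoverable configuration error.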
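
For a service that faults while running, the CTID lookup and the ctwatch check can be wrapped in two small filters. This sketch parses the column layouts exactly as they appear in the svcs -vp and ctwatch samples; ctid_from_svcs and critical_events are hypothetical names, and the parsing would need adjusting if your svcs or ctwatch prints a different layout.

```shell
# Sketch only: helper names are hypothetical, and the parsing follows the
# column layouts shown in the svcs -vp and ctwatch samples.

# Read "svcs -vp <fmri>" output on stdin and print the service's CTID.
# Line 1 is the header; the service row has five fields
# (STATE NSTATE STIME CTID FMRI), process rows only three.
ctid_from_svcs() {
    awk 'NR > 1 && NF == 5 { print $4; exit }'
}

# Read ctwatch output on stdin and print "CTID EVID SUMMARY" for every
# critical event (columns: CTID EVID CRIT ACK CTTYPE SUMMARY...).
critical_events() {
    awk 'NR > 1 && $3 == "crit" {
        summary = $6
        for (i = 7; i <= NF; i++) summary = summary " " $i
        print $1, $2, summary
    }'
}
```

With the sample output, svcs -vp svc:/network/smtp:sendmail | ctid_from_svcs prints 68, and ctwatch 68 | critical_events would then report "68 28 contract empty", meaning every process in the service's contract has exited. ctwatch keeps running, so stop it once the event has been captured.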
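
To try the relaxed settings before editing the manifest, svccfg can set the same startd properties directly in the SMF repository. This is a sketch under stated assumptions: the FMRI svc:/application/hyperic-agent:default is a placeholder (the question only shows the short name hyperic-agent), relax_smf_faults, run and DRY_RUN are hypothetical names, and the property values are the ones described in svc.startd(1M). Settings made this way are not recorded in the manifest, so they may not survive reinstalling the package; updating the manifest remains the durable fix.

```shell
# Sketch only: relax_smf_faults, run and DRY_RUN are hypothetical names;
# the startd properties are those described in svc.startd(1M).

# Run a command, or just print it when DRY_RUN is non-empty.
run() {
    if [ -n "${DRY_RUN:-}" ]; then echo "$*"; else "$@"; fi
}

# Usage: relax_smf_faults FMRI [IGNORE_EVENTS] [DURATION]
#   IGNORE_EVENTS: comma-separated list for startd/ignore_error (core, signal)
#   DURATION:      optional startd/duration (contract, child/wait, transient)
relax_smf_faults() {
    fmri=$1
    ignore=${2:-core,signal}
    # addpg fails if the startd group already exists; that is harmless here.
    run svccfg -s "$fmri" addpg startd framework 2>/dev/null || :
    run svccfg -s "$fmri" setprop startd/ignore_error = astring: "$ignore"
    if [ -n "${3:-}" ]; then
        run svccfg -s "$fmri" setprop startd/duration = astring: "$3"
    fi
    run svcadm refresh "$fmri"   # apply the new properties
    run svcadm clear "$fmri"     # move the instance out of maintenance
}
```

Running it first as DRY_RUN=1 relax_smf_faults svc:/application/hyperic-agent:default prints the commands for review. Setting duration to transient stops svc.startd from tracking the service's processes at all, so ignore_error is the narrower change to try first.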