Search code examples
linuxapachemonoredhatmod-mono

Setup and Debugging of applications run under mod-mono-server4


I have a c# application (servicebus) which runs on a private web server. Its basic job is to accept some web requests and create other processes to handle processing the data packages described in the requests. The processing is often ongoing and can take weeks.

The servicebus will, occasionally, start consuming great amounts of CPU. That is, it is normally idle, getting 1 or 2 seconds of CPU time per day. When it gets into this strange mode, its consuming 100+% CPU all the time. At this point, a new instance of the servicebus gets spawned by apache if a new request comes in. At this point I will have two copies of the servicebus running (and possibly both handling processing requests -- i don't know).

This is the normal process (via ps -aef ) :

UID        PID  PPID  C STIME TTY          TIME CMD
apache    8978     1  0 11:51 ?        00:00:01 /opt/mono/bin/mono /opt/mono/lib/mono/4.0/mod-mono-server4.exe --filename /tmp/mod_mono_server_default --applications /:/opt/ov/vespa/servicebus --nonstop

As you can see, the application is a C# program (compiled with VS 2010 for .NET 4) running via mod-mono-server4 under mono. This is a redhat linux enterprise 6.5 system.

After running for a while that process 'went crazy' and started consuming lots of CPU and mod-mono-server created a new instance. As you can see, I didn't find it until Monday morning after it had used over 2 days of CPU time. Here is the new ps -aef output :

UID        PID  PPID  C STIME TTY          TIME CMD
apache    8978     1 83 Sep19 ?        2-08:26:25 /opt/mono/bin/mono /opt/mono/lib/mono/4.0/mod-mono-server4.exe --filename /tmp/mod_mono_server_default --applications /:/opt/ov/vespa/servicebus --nonstop
apache   32538     1  0 Sep21 ?        00:00:00 /opt/mono/bin/mono /opt/mono/lib/mono/4.0/mod-mono-server4.exe --filename /tmp/mod_mono_server_default --applications /:/opt/ov/vespa/servicebus --nonstop

In case you need to see how the application is configured, I have the snippet from the conf.d file for the application :

# The user and group need to be set before mod_mono.conf is loaded.
User apache
Group apache
# Service Bus setup
Include /etc/httpd/conf/mod_mono.conf
Listen 8081
<VirtualHost *:8081>
    DocumentRoot /opt/ov/vespa/servicebus
    MonoServerPath default /opt/mono/bin/mod-mono-server4
    MonoApplications "/:/opt/ov/vespa/servicebus"
    <Location "/">
        SetHandler mono
        Allow from all
    </Location>
</VirtualHost>

The basic question is... how do I go about debugging this and finding what is wrong with my application? That, however is a bit vague. Normally, I would want to put mono into debug mod and then when it gets into this strange mode I would use kill -ABRT to get a core dump out of it. I assume I could then find a for loop/while loop/etc which is stuck and fix my bug. So, the real question is how do do that? Is that process PID=8978 actually my application being interpreted by mono or is it mono running mod-mono-server4.exe? Or is it mono interpreting mod-mono-server4.exe which in turn is interpreting servicebus? Where in the apache configuration files do I put in the arguments to mono so I can get the --debug I desire.

Normally to debug I would need a process like :

/opt/mono/bin/mono --debug /opt/test/testapp.exe

So, I need to get a --debug into the command line and sort out which PID to actually kill. Then I can use techniques from http://www.mono-project.com/docs/debug+profile/debug/ to debug the core file.

NOTE: I have tried putting in MonoMaxCPUTime and MonoAutoRestartTime directives into the apache conf files to cure this. The problem is, when everything is nominal, they work fine. Once it gets into this bad state(consuming a ton of CPU), the restart fails. Or rather it succeeds in creating a new process but fails to delete the old one (basically the state I am already in).

Debugging so far : I see my log files for PID=8979 stops on 9/21 at 03:27. Given that it often generates a 200% or 300% CPU or more that could easily be the time of the 'crash'. Looking in the apache logs I found an unusual event at that time. A dump of the log is below :

...
[Sun Sep 21 03:28:01 2014] [notice] SIGHUP received.  Attempting to restart
mod-mono-server received a shutdown message
httpd: Could not reliably determine the server's fully qualified domain name, using localhost.localdomain for ServerName
Stacktrace:


Native stacktrace:

    /opt/mono/bin/mono() [0x48cc26]
    /lib64/libpthread.so.0() [0x32fca0f710]
    /lib64/libpthread.so.0(pthread_cond_wait+0xcc) [0x32fca0b5bc]
    /opt/mono/bin/mono() [0x5a6a9c]
    /opt/mono/bin/mono() [0x5ad4e9]
    /opt/mono/bin/mono() [0x5116d8]
    /opt/mono/bin/mono(mono_thread_manage+0x1ad) [0x5161cd]
    /opt/mono/bin/mono(mono_main+0x1401) [0x46a671]
    /lib64/libc.so.6(__libc_start_main+0xfd) [0x32fc21ed1d]
    /opt/mono/bin/mono() [0x4123a9]

Debug info from gdb:

warning: File "/opt/mono/bin/mono-gdb.py" auto-loading has been declined by your `auto-load safe-path' set to "/usr/share/gdb/auto-load:/usr/lib/debug:/usr/bin/mono-gdb.py".
To enable execution of this file add
    add-auto-load-safe-path /opt/mono/bin/mono-gdb.py
line to your configuration file "$HOME/.gdbinit".
To completely disable this security protection add
    set auto-load safe-path /
line to your configuration file "$HOME/.gdbinit".
For more information about this security protection see the
"Auto-loading safe path" section in the GDB manual.  E.g., run from the shell:
    info "(gdb)Auto-loading safe path"
[New LWP 9148]
[New LWP 9135]
[New LWP 9000]
[New LWP 8991]
[New LWP 8990]
[New LWP 8988]
[New LWP 8987]
[New LWP 8986]
[New LWP 8985]
[New LWP 8984]
[Thread debugging using libthread_db enabled]
0x00000032fca0e75d in read () from /lib64/libpthread.so.0
  11 Thread 0x7f0d8bcaf700 (LWP 8984)  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  10 Thread 0x7f0d8b2ae700 (LWP 8985)  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  9 Thread 0x7f0d8a8ad700 (LWP 8986)  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  8 Thread 0x7f0d89eac700 (LWP 8987)  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  7 Thread 0x7f0d894ab700 (LWP 8988)  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  6 Thread 0x7f0d88aaa700 (LWP 8990)  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  5 Thread 0x7f0d880a9700 (LWP 8991)  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
  4 Thread 0x7f0d8713c700 (LWP 9000)  0x00000032fca0d930 in sem_wait () from /lib64/libpthread.so.0
  3 Thread 0x7f0d86157700 (LWP 9135)  0x00000032fc27a983 in malloc () from /lib64/libc.so.6
  2 Thread 0x7f0d8568b700 (LWP 9148)  0x00000032fc2792f0 in _int_malloc () from /lib64/libc.so.6
* 1 Thread 0x7f0d8bcb0740 (LWP 8978)  0x00000032fca0e75d in read () from /lib64/libpthread.so.0

Thread 11 (Thread 0x7f0d8bcaf700 (LWP 8984)):
#0  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005d59f7 in GC_wait_marker ()
#2  0x00000000005dbabd in GC_help_marker ()
#3  0x00000000005d4778 in GC_mark_thread ()
#4  0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032fc2e8b5d in clone () from /lib64/libc.so.6

Thread 10 (Thread 0x7f0d8b2ae700 (LWP 8985)):
#0  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005d59f7 in GC_wait_marker ()
#2  0x00000000005dbabd in GC_help_marker ()
#3  0x00000000005d4778 in GC_mark_thread ()
#4  0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032fc2e8b5d in clone () from /lib64/libc.so.6

Thread 9 (Thread 0x7f0d8a8ad700 (LWP 8986)):
#0  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005d59f7 in GC_wait_marker ()
#2  0x00000000005dbabd in GC_help_marker ()
#3  0x00000000005d4778 in GC_mark_thread ()
#4  0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032fc2e8b5d in clone () from /lib64/libc.so.6

Thread 8 (Thread 0x7f0d89eac700 (LWP 8987)):
#0  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005d59f7 in GC_wait_marker ()
#2  0x00000000005dbabd in GC_help_marker ()
#3  0x00000000005d4778 in GC_mark_thread ()
#4  0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032fc2e8b5d in clone () from /lib64/libc.so.6

Thread 7 (Thread 0x7f0d894ab700 (LWP 8988)):
#0  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005d59f7 in GC_wait_marker ()
#2  0x00000000005dbabd in GC_help_marker ()
#3  0x00000000005d4778 in GC_mark_thread ()
#4  0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032fc2e8b5d in clone () from /lib64/libc.so.6

Thread 6 (Thread 0x7f0d88aaa700 (LWP 8990)):
#0  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005d59f7 in GC_wait_marker ()
#2  0x00000000005dbabd in GC_help_marker ()
#3  0x00000000005d4778 in GC_mark_thread ()
#4  0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032fc2e8b5d in clone () from /lib64/libc.so.6

Thread 5 (Thread 0x7f0d880a9700 (LWP 8991)):
#0  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000005d59f7 in GC_wait_marker ()
#2  0x00000000005dbabd in GC_help_marker ()
#3  0x00000000005d4778 in GC_mark_thread ()
#4  0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#5  0x00000032fc2e8b5d in clone () from /lib64/libc.so.6

Thread 4 (Thread 0x7f0d8713c700 (LWP 9000)):
#0  0x00000032fca0d930 in sem_wait () from /lib64/libpthread.so.0
#1  0x00000000005bea28 in mono_sem_wait ()
#2  0x000000000053b2bb in finalizer_thread ()
#3  0x000000000051375b in start_wrapper ()
#4  0x00000000005a8214 in thread_start_routine ()
#5  0x00000000005d565a in GC_start_routine ()
#6  0x00000032fca079d1 in start_thread () from /lib64/libpthread.so.0
#7  0x00000032fc2e8b5d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f0d86157700 (LWP 9135)):
#0  0x00000032fc27a983 in malloc () from /lib64/libc.so.6
#1  0x00000000005cd0e6 in monoeg_malloc ()
#2  0x00000000005cbef1 in monoeg_g_hash_table_insert_replace ()
#3  0x00000000005acff5 in WaitForMultipleObjectsEx ()
#4  0x0000000000512694 in ves_icall_System_Threading_WaitHandle_WaitAny_internal ()
#5  0x00000000417b0270 in ?? ()
#6  0x00007f0d68000c21 in ?? ()
#7  0x00007f0d847c4b40 in ?? ()
#8  0x00007f0d68003e00 in ?? ()
#9  0x000000004023e890 in ?? ()
#10 0x00007f0d68003e00 in ?? ()
#11 0x00007f0d86156940 in ?? ()
#12 0x00007f0d861568a0 in ?? ()
#13 0x00007f0d8767d000 in ?? ()
#14 0xffffffffffffffff in ?? ()
#15 0x00007f0d86156cc0 in ?? ()
#16 0x00007f0d847c4b40 in ?? ()
#17 0x000000004023e268 in ?? ()
#18 0x0000000000000000 in ?? ()

Thread 2 (Thread 0x7f0d8568b700 (LWP 9148)):
#0  0x00000032fc2792f0 in _int_malloc () from /lib64/libc.so.6
#1  0x00000032fc27a636 in calloc () from /lib64/libc.so.6
#2  0x00000000005cd148 in monoeg_malloc0 ()
#3  0x00000000005cbb94 in monoeg_g_hash_table_new ()
#4  0x00000000005acf94 in WaitForMultipleObjectsEx ()
#5  0x0000000000512694 in ves_icall_System_Threading_WaitHandle_WaitAny_internal ()
#6  0x00000000417b0270 in ?? ()
#7  0x00007f0d60000c21 in ?? ()
#8  0x00007f0d8767d000 in ?? ()
#9  0xffffffffffffffff in ?? ()
#10 0x000000004023e890 in ?? ()
#11 0x00007f0d68003e00 in ?? ()
#12 0x00007f0d8568a940 in ?? ()
#13 0x00007f0d8568a8a0 in ?? ()
#14 0x00007f0d8767d000 in ?? ()
#15 0xffffffffffffffff in ?? ()
#16 0x00007f0d8568acc0 in ?? ()
#17 0x00007f0d864e2990 in ?? ()
#18 0x000000004023e268 in ?? ()
#19 0x0000000000000000 in ?? ()

Thread 1 (Thread 0x7f0d8bcb0740 (LWP 8978)):
#0  0x00000032fca0e75d in read () from /lib64/libpthread.so.0
#1  0x000000000048cdb6 in mono_handle_native_sigsegv ()
#2  <signal handler called>
#3  0x00000032fca0b5bc in pthread_cond_wait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#4  0x00000000005a6a9c in _wapi_handle_timedwait_signal_handle ()
#5  0x00000000005ad4e9 in WaitForMultipleObjectsEx ()
#6  0x00000000005116d8 in wait_for_tids ()
#7  0x00000000005161cd in mono_thread_manage ()
#8  0x000000000046a671 in mono_main ()
#9  0x00000032fc21ed1d in __libc_start_main () from /lib64/libc.so.6
#10 0x00000000004123a9 in _start ()

=================================================================
Got a SIGABRT while executing native code. This usually indicates
a fatal error in the mono runtime or one of the native libraries 
used by your application.
=================================================================

Which I think means the process had a seg fault and was trying to dump core or something and stuck trying to do that? Or did it get a sig ABRT while processing a sig SEGV? In either case, that's a dump of mono, right? I did a find of the full file system and no core was generated so I'm not sure how apache/gdb managed this.

In case it matters I have RedHat 6.5, mono 2.10.8, gcc 4.4.7, mod-mono-server4.exe 2.10.0.0

Basically this boils down to these questions.

  1. How do I get --debug into the mono commands that apache issues?
  2. How do I get apache to save the core files it encounters instead of automatically running gdb on them (as I need to issue more complex commands to get at the underlying c# code)?
  3. What does the command line for my servicebus mean? That is why/how come the mod-mono-server4 isn't a completely separate process from my servicebus? How does the MMS fit into the mono interpreting servicebus processing chain

Or am I totally wrong and will the answers to those questions not help me?


Solution

  • First of all: Mono 2.10 is very old, you may be running into a bug that is already fixed in the latest 3.8.

    As for getting --debug into your app, you can set the environment variable MONO_OPTIONS=--debug, that has the same effect as specifying it on the command line.