MQSeries.net :: View topic - Cluster workload exit intermittent SIGSEGV

lorenmcc · Posted: Tue Dec 11, 2012 9:35 am

I am intermittently receiving a SIGSEGV (address not mapped) in an exit I wrote as a cluster workload (WLM) exit. The purpose of the exit is to logically split our cluster so applications can test an implementation before making it live (one side of the cluster is BAU, the other side is for validation testing). The exit is live all the time and normally does nothing except return; it is only activated or deactivated when it receives a signal to do so (I won't go into details of how that happens or how the split is determined, as that is not where the problem is).

Now for the problem. I want to log the "enable" and "disable" activities of the exit. This works most of the time; however, the exit sometimes terminates (and of course the cluster process restarts) due to a SIGSEGV detected in the fprintf function. I suspect this is due to multi-threading, but I am at a loss as to how to prevent it.

The stack trace and the code causing the SIGSEGV are below. This is for 64-bit Linux (Red Hat), but the problem also occurs on 32-bit Linux (Red Hat), zLinux (Red Hat) and Solaris. The logging works a good portion of the time, but crashes intermittently.

Any help appreciated.

Thanks

O/S Call Stack for current thread

/opt/mqm/lib64/libmqmcs_r.so(xcsPrintStackForCurrentThread+0xa0)[0x2b670b420070]
/opt/mqm/lib64/libmqmcs_r.so(signalHandlerInternal+0x5c)[0x2b670b4365ac]
/opt/mqm/lib64/libmqmcs_r.so(PrepareDumpAreas+0xd2)[0x2b670b434ac2]
/opt/mqm/lib64/libmqmcs_r.so(xcsFFSTFn+0x20d9)[0x2b670b438f89]
/opt/mqm/lib64/libmqmcs_r.so(xehExceptionHandler+0x625)[0x2b670b4332e5]
/lib64/libpthread.so.0[0x39efa0ebe0]
/lib64/libc.so.6(_IO_vfprintf+0x39)[0x39ef242b59]
/lib64/libc.so.6(_IO_fprintf+0x88)[0x39ef24cd28]
/var/mqm/exits64/hmclwley(clwlFunction+0x62b)[0x2aaaaeb54339]
/opt/mqm/lib64/libmqmr_r.so(rfxCallClusterWorkloadExit+0xf6)[0x2b670b00e4e6]
/opt/mqm/bin/amqzlwa0(xcsTerminate+0x903)[0x401e5b]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x39ef21d994]
/opt/mqm/bin/amqzlwa0(xcsTerminate+0x42)[0x40159a]

Failing code:

curtime = time(NULL);
/*char* dt = ctime(&curtime);*/
curtm = localtime (&curtime);
strftime(dt, 25, "%a %b %e %Y %H:%M:%S", curtm);
fprintf(debugf, "%s:\tEnableExit\n", dt);
fflush(debugf);
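
For comparison, a more defensive version of this logging step might look like the sketch below. It is only an illustration under stated assumptions, not a confirmed fix: the helper name log_event and the mutex log_lock are hypothetical and do not come from the exit itself. It guards against a NULL stream (the low fault address 0xc0 reported in the FFST is the kind of value usually seen when a NULL or near-NULL pointer is dereferenced, so debugf being NULL or already closed at that moment is worth ruling out), uses the reentrant localtime_r instead of localtime, sizes the strftime call from the buffer itself, and serializes the write in case more than one thread can reach it.

```c
#include <pthread.h>
#include <stdio.h>
#include <time.h>

/* Hypothetical lock: serializes writers if several threads can log. */
static pthread_mutex_t log_lock = PTHREAD_MUTEX_INITIALIZER;

/*
 * Hypothetical helper (not part of the original exit).
 * Writes "<timestamp>:\t<event>\n" to fp.
 * Returns 0 on success, -1 if nothing was written.
 */
static int log_event(FILE *fp, const char *event)
{
    char dt[32];
    struct tm tmbuf;
    time_t now;
    int rc = -1;

    /* Never hand a NULL stream to fprintf: that is undefined behaviour. */
    if (fp == NULL || event == NULL)
        return -1;

    now = time(NULL);

    /* localtime_r fills a caller-owned struct instead of shared static data. */
    if (localtime_r(&now, &tmbuf) == NULL)
        return -1;

    /* Let the buffer define its own size rather than a hard-coded 25. */
    if (strftime(dt, sizeof dt, "%a %b %e %Y %H:%M:%S", &tmbuf) == 0)
        return -1;

    pthread_mutex_lock(&log_lock);
    if (fprintf(fp, "%s:\t%s\n", dt, event) > 0 && fflush(fp) == 0)
        rc = 0;
    pthread_mutex_unlock(&log_lock);

    return rc;
}
```

With this in place, the failing lines would reduce to a call such as log_event(debugf, "EnableExit"); a return value of -1 then indicates that the stream was not usable at that moment, instead of the process taking a SIGSEGV.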

+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Tue December 11 2012 14:29:19 UTC |
| UTC Time :- 1355236159.970230 |
| UTC Time Offset :- -300 (EST) |
| Host Name :- chln884 |
| Operating System :- Linux 2.6.18-308.13.1.el5 |
| PIDS :- 5724H7230 |
| LVLS :- 7.0.1.6 |
| Product Long Name :- WebSphere MQ for Linux (x86-64 platform) |
| Vendor :- IBM |
| Probe Id :- XC130003 |
| Application Name :- MQM |
| Component :- xehExceptionHandler |
| SCCS Info :- lib/cs/unix/amqxerrx.c, 1.242.1.2 |
| Line Number :- 1386 |
| Build Date :- Jul 25 2011 |
| CMVC level :- p701-106-110725 |
| Build Type :- IKAP - (Production) |
| Effective UserID :- 1701 (mqm) |
| Real UserID :- 1701 (mqm) |
| Program Name :- amqzlwa0 |
| Addressing mode :- 64-bit |
| Process :- 12135 |
| Process(Thread) :- 12135 |
| Thread :- 1 |
| ThreadingModel :- PosixThreads |
| QueueManager :- mqsb_qma |
| UserApp :- FALSE |
| ConnId(1) IPCC :- 191 |
| Last HQC :- 1.0.0-58944 |
| Last HSHMEMB :- 0.0.0-0 |
| Major Errorcode :- STOP |
| Minor Errorcode :- OK |
| Probe Type :- HALT6109 |
| Probe Severity :- 1 |
| Probe Description :- AMQ6109: An internal WebSphere MQ error has occurred. |
| FDCSequenceNumber :- 0 |
| Arith1 :- 11 (0xb) |
| Comment1 :- SIGSEGV: address not mapped(0xc0) |
| |
+-----------------------------------------------------------------------------+