MQSeries.net Forum Index » General IBM MQ Support » Dropping channels

Dropping channels
HenriqueS
Posted: Fri Apr 01, 2011 11:47 am    Post subject: Dropping channels

Master

Joined: 22 Sep 2006
Posts: 235

Hello,

I have noticed many channels being dropped without a reason (RECEIVER, SENDER and SVRCONN). Log messages below:

Code:

-------------------------------------------------------------------------------
04/01/2011 03:54:46 PM - Process(10782.4) User(mqm) Program(amqrmppa)
AMQ9604: Channel 'CONN_MES' terminated unexpectedly

EXPLANATION:
The process or thread executing channel 'CONN_MES' is no longer running. The
check process system call returned 545284357 for process 10755.
ACTION:
No immediate action is required because the channel entry has been removed from
the list of running channels. Inform the system administrator who should
examine the operating system procedures to determine why the channel process
has terminated.


Code:

----- amqrcsia.c : 1424 -------------------------------------------------------
04/01/2011 03:54:46 PM - Process(10782.4) User(mqm) Program(amqrmppa)
AMQ9604: Channel 'CONN_MES' terminated unexpectedly

EXPLANATION:
The process or thread executing channel 'CONN_MES' is no longer running. The
check process system call returned 545284357 for process 10755.
ACTION:
No immediate action is required because the channel entry has been removed from
the list of running channels. Inform the system administrator who should
examine the operating system procedures to determine why the channel process
has terminated.


Alongside the log messages, I have seen some FFSTs popping up...

Code:

| WebSphere MQ First Failure Symptom Report                                   |
| =========================================                                   |
|                                                                             |
| Date/Time         :- Friday April 01 16:12:53 BRT 2011                      |
| Host Name         :- sbcdf362.bc (Linux 2.6.18-194.26.1.el5)                |
| PIDS              :- 5724H7210                                              |
| LVLS              :- 6.0.2.10                                               |
| Product Long Name :- WebSphere MQ for Linux (x86-64 platform)               |
| Vendor            :- IBM                                                    |
| Probe Id          :- XC130003                                               |
| Application Name  :- MQM                                                    |
| Component         :- xehExceptionHandler                                    |
| SCCS Info         :- lib/cs/unix/amqxerrx.c, 1.214.1.13                     |
| Line Number       :- 1371                                                   |
| Build Date        :- Aug 25 2010                                            |
| CMVC level        :- p600-210-100825                                        |
| Build Type        :- IKAP - (Production)                                    |
| UserID            :- 00021038 (mqm)                                         |
| Program Name      :- amqrmppa                                               |
| Addressing mode   :- 64-bit                                                 |
| Process           :- 11365                                                  |
| Thread-Process    :- 11365                                                  |
| Thread            :- 883                                                    |
| ThreadingModel    :- PosixThreads                                           |
| QueueManager      :- QM!MQ_H_BC                                             |
| ConnId(1) IPCC    :- 66911                                                  |
| ConnId(3) QM-P    :- 68723                                                  |
| Last HQC          :- 1.0.0-1550192                                          |
| Last HSHMEMB      :- 0.0.0-0                                                |
| Major Errorcode   :- STOP                                                   |
| Minor Errorcode   :- OK                                                     |
| Probe Type        :- HALT6109                                               |
| Probe Severity    :- 1                                                      |
| Probe Description :- AMQ6109: An internal WebSphere MQ error has occurred.  |
| FDCSequenceNumber :- 0                                                      |
| Arith1            :- 11 b                                                   |
| Comment1          :- SIGSEGV: address not mapped(0x10000000f)               |
|                                                                             |


Any ideas? My semaphore, shared memory and file-max settings for the Linux box are OK (I ran the mqconfig utility here and tweaked semmni a few days ago), but the problems persist.
bruce2359
Posted: Fri Apr 01, 2011 12:12 pm

Poobah

Joined: 05 Jan 2008
Posts: 9471
Location: US: west coast, almost. Otherwise, enroute.

Search Google for probe ID XC130003.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
mvic
Posted: Fri Apr 01, 2011 2:10 pm    Post subject: Re: Dropping channels

Jedi

Joined: 09 Mar 2004
Posts: 2080

You should open a PMR for this fairly quickly since you are at the latest published v6 code level.

XC130003 means a memory exception of some kind. IBM will want to read the details that appear beneath the header portion you quoted.
HenriqueS
Posted: Mon Apr 04, 2011 9:34 am    Post subject: Some news...

Master

Joined: 22 Sep 2006
Posts: 235

Did some research and found the following:

http://www-01.ibm.com/support/docview.wss?uid=swg21263631

So most errors from this probe ID are caused by channel exits.

We have a channel exit here.

So we:
1) Set the MQS_ACTION_ON_EXCEPTION environment variable:
Code:

export MQS_ACTION_ON_EXCEPTION=HANG_ALL

2) Get the PID (the "Process" field) from the first FDC file that shows up after this setting takes effect:
Code:

cat /var/mqm/errors/*.FDC


3) Attach gdb to that process ID:
Code:

sudo gdb -p <PID>


4) Ask for a full backtrace of all threads in this process:
Code:

thread apply all bt full


Here is my backtrace:
Code:

Thread 4 (Thread 0x41d93940 (LWP 23916)):
#0  0x00000034e8c9a541 in nanosleep () from /lib64/libc.so.6
No symbol table info available.
#1  0x00000034e8c9a364 in sleep () from /lib64/libc.so.6
No symbol table info available.
#2  0x00002b0bb37b5987 in xcsSleep () from /opt/mqm/lib64/libmqmcs_r.so
No symbol table info available.
#3  0x00002b0bb372b7de in xehHangIfRequired ()
   from /opt/mqm/lib64/libmqmcs_r.so
No symbol table info available.
#4  0x00002b0bb372b840 in xehInterpretSavedSigaction ()
   from /opt/mqm/lib64/libmqmcs_r.so
No symbol table info available.
#5  0x00002b0bb372bf36 in xehExceptionHandler ()
   from /opt/mqm/lib64/libmqmcs_r.so
No symbol table info available.
#6  <signal handler called>
No symbol table info available.
#7  0x00000034e8c705a4 in malloc_consolidate () from /lib64/libc.so.6
No symbol table info available.
#8  0x00000034e8c72bbc in _int_malloc () from /lib64/libc.so.6
No symbol table info available.
#9  0x00000034e8c74e2e in malloc () from /lib64/libc.so.6
No symbol table info available.
#10 0x00000034e8c6185a in __fopen_internal () from /lib64/libc.so.6
No symbol table info available.
#11 0x00002aaaaffdd1ae in wlog (
    logmessage=0x2aaaaffdd390 "DLL loaded and MsgExit() function called. Channel:") at MQExit.c:599
        size = 1368046
        datetime = 0x1b9fc410 "201104041333"
        brokentime = 0x34e8f56cc0
        rawtime = 1301934822
        logrow = "\nMQExit.dll - 201104041333 - DLL loaded and MsgExit() function called. Channel:", '\000' <repeats 41 times>
        logfilehandle = 0x1b9fc3f0
#12 0x00002aaaaffdc28e in MsgExit (pExitParms=0x41d91550,
    pChannelDef=0x1b9fcb88, pDataLength=0x41d91540,
    pAgentBufferLength=0x41d91544, pAgentBuffer=0x1b9f0f98 "XQH \001",
    pExitBufferLength=0x1b953ce8, pExitBuffer=0x1b953cb0) at MQExit.c:104
        nomecanal = 0x1b9fc3f0 "C90400888.00038166.2"
#13 0x00002b0bb31c44d3 in rriCallMsgExit () from /opt/mqm/lib64/libmqmr_r.so
No symbol table info available.
#14 0x00002b0bb321d40b in rriReceiveData () from /opt/mqm/lib64/libmqmr_r.so
No symbol table info available.
#15 0x00002b0bb3210003 in rrxResponder () from /opt/mqm/lib64/libmqmr_r.so
No symbol table info available.
#16 0x00002b0bb31176c6 in ccxResponder () from /opt/mqm/lib64/libmqmr_r.so
No symbol table info available.
#17 0x00002b0bb311783c in cciResponderThread ()
   from /opt/mqm/lib64/libmqmr_r.so
No symbol table info available.
#18 0x00002b0bb3773777 in ThreadMain () from /opt/mqm/lib64/libmqmcs_r.so
No symbol table info available.
#19 0x00000034e980673d in start_thread () from /lib64/libpthread.so.0
No symbol table info available.
#20 0x00000034e8cd40cd in clone () from /lib64/libc.so.6
No symbol table info available.


I need some help interpreting the trace... I have some guesses, but any help is welcome.

It seems the exception is raised after I call fopen for the log file my exit uses. There is a malloc call (probably made inside fopen?). But logfilehandle holds a non-NULL value, so how can I check that everything is OK at runtime? I will also check logrow, to make sure it is not overflowing...
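
For the logrow check mentioned above, here is a minimal sketch of a bounded log writer. It is not the code from MQExit.c, which the thread does not show: format_log_row, write_log and LOGROW_SIZE are hypothetical names, and the row format is only modelled on the logrow value visible in the backtrace. The idea is that snprintf never writes past the buffer and reports truncation, and that the fopen result is checked before use. Note that a crash inside malloc usually points at corruption that happened earlier, so this only rules out one candidate.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical size: the real size of logrow is not shown in the thread. */
#define LOGROW_SIZE 128

/* Format one log row into `row` without ever writing past row_size bytes.
 * Returns 0 on success, -1 if the text did not fit (or on a format error). */
int format_log_row(char *row, size_t row_size,
                   const char *stamp, const char *message)
{
    assert(row != NULL && row_size > 0);

    /* snprintf always NUL-terminates and returns the length it WOULD have
     * written, so a result >= row_size means the row was truncated. */
    int needed = snprintf(row, row_size, "\nMQExit.dll - %s - %s",
                          stamp, message);
    if (needed < 0 || (size_t)needed >= row_size)
        return -1;
    return 0;
}

/* Append one message to the log file. Returns 0 on success, -1 on failure. */
int write_log(const char *path, const char *stamp, const char *message)
{
    char row[LOGROW_SIZE] = { 0 };

    if (format_log_row(row, sizeof row, stamp, message) != 0)
        return -1;                 /* refuse to log rather than overflow */

    FILE *fp = fopen(path, "a");
    if (fp == NULL)
        return -1;                 /* never use an unchecked handle */

    int rc = (fputs(row, fp) == EOF) ? -1 : 0;
    if (fclose(fp) != 0)
        rc = -1;
    return rc;
}
```

A caller would pass the timestamp string and message, for example write_log(path, "201104041333", "MsgExit() called"), and treat -1 as "row too long or file not writable".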
mvic
Posted: Mon Apr 04, 2011 1:55 pm    Post subject: Re: Some news...

Jedi

Joined: 09 Mar 2004
Posts: 2080

Exceptions in malloc usually mean you, or some code in your process, have corrupted the C runtime's heap.
HenriqueS
Posted: Mon Apr 04, 2011 2:21 pm

Master

Joined: 22 Sep 2006
Posts: 235

I have disabled the part of the code that writes to the log file.

Now I am getting problems whenever the exit is loaded. I did a reboot, but it did not help either.

This is turning into a nightmare. Some days ago I did a code cleanup. The code is pretty lean right now...

Code:

+-----------------------------------------------------------------------------+
|                                                                             |
| WebSphere MQ First Failure Symptom Report                                   |
| =========================================                                   |
|                                                                             |
| Date/Time         :- Monday April 04 19:10:07 BRT 2011                      |
| Host Name         :- sbcdf365.bc (Linux 2.6.18-194.3.1.el5)                 |
| PIDS              :- 5724H7210                                              |
| LVLS              :- 6.0.2.10                                               |
| Product Long Name :- WebSphere MQ for Linux (x86-64 platform)               |
| Vendor            :- IBM                                                    |
| Probe Id          :- XC130003                                               |
| Application Name  :- MQM                                                    |
| Component         :- xehExceptionHandler                                    |
| SCCS Info         :- lib/cs/unix/amqxerrx.c, 1.214.1.13                     |
| Line Number       :- 1371                                                   |
| Build Date        :- Aug 25 2010                                            |
| CMVC level        :- p600-210-100825                                        |
| Build Type        :- IKAP - (Production)                                    |
| UserID            :- 00000200 (mqm)                                         |
| Program Name      :- runmqchl                                               |
| Addressing mode   :- 64-bit                                                 |
| Process           :- 3161                                                   |
| Thread-Process    :- 3161                                                   |
| Thread            :- 1                                                      |
| ThreadingModel    :- PosixThreads                                           |
| QueueManager      :- QM!MQ_T_BC                                             |
| ConnId(1) IPCC    :- 349                                                    |
| Last HQC          :- 1.0.0-1522304                                          |
| Last HSHMEMB      :- 0.0.0-0                                                |
| Major Errorcode   :- STOP                                                   |
| Minor Errorcode   :- OK                                                     |
| Probe Type        :- HALT6109                                               |
| Probe Severity    :- 1                                                      |
| Probe Description :- AMQ6109: An internal WebSphere MQ error has occurred.  |
| FDCSequenceNumber :- 0                                                      |
| Arith1            :- 11 b                                                   |
| Comment1          :- SIGSEGV: invalid address permissions(0x2aaaaf035086)   |
|                                                                             |
+-----------------------------------------------------------------------------+
MQM Function Stack
rriCaller
rriCallerEntry
rriInitSess
rriInitExits
rriInitExit
rriCALL_EXIT
xcsFFST
mvic
Posted: Wed Apr 06, 2011 1:57 am

Jedi

Joined: 09 Mar 2004
Posts: 2080

HenriqueS wrote:
This is turning into a nightmare. Some days ago I did a code cleanup. The code is pretty lean right now...

I suggest you back out your changes, or remove the exit altogether, and see if the problem is still there.

If this is the same problem as before, it looks like heap corruption. Check your code for possible overruns or underruns of your allocated buffers.
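
To illustrate the kind of overrun worth checking for, here is a hedged sketch. Channel names in MQ structures are fixed-width, blank-padded fields of 20 characters with no terminating NUL, and the name in the backtrace ("C90400888.00038166.2") fills all 20. Copying such a field into a 20-byte buffer and then appending a NUL writes one byte too many. Whether that is what happened in this exit is not known from the thread. copy_fixed_field and CHANNEL_NAME_LENGTH are hypothetical names introduced for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumed to mirror the 20-character channel name length in the MQ headers. */
#define CHANNEL_NAME_LENGTH 20

/* Copy a fixed-width, blank-padded field (not NUL-terminated) into dst as a
 * C string, trimming trailing blanks. Never writes more than dst_size bytes.
 * Returns the length of the resulting string. */
size_t copy_fixed_field(char *dst, size_t dst_size,
                        const char *field, size_t field_len)
{
    assert(dst != NULL && field != NULL && dst_size > 0);

    /* Leave room for the terminating NUL in the destination. */
    size_t n = field_len < dst_size - 1 ? field_len : dst_size - 1;
    memcpy(dst, field, n);

    /* Drop trailing padding (blanks or NULs). */
    while (n > 0 && (dst[n - 1] == ' ' || dst[n - 1] == '\0'))
        n--;

    dst[n] = '\0';
    return n;
}
```

The destination should be declared one byte larger than the field, for example char name[CHANNEL_NAME_LENGTH + 1], so that a name using all 20 characters still has room for its terminator.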
HenriqueS
Posted: Wed Apr 06, 2011 8:56 am

Master

Joined: 22 Sep 2006
Posts: 235

Thanks for the feedback! Some info here for future reference:

1) The problem seems solved now. Using gdb and attaching to the running process helped a lot. I have passed thousands of messages through and the exit is running fine, writing the log file and putting data to some control queues. Refer to the IBM article: http://www-01.ibm.com/support/docview.wss?uid=swg21263631 .

2) I compiled the exit with full debug info (gcc -O0 -g3), which gives extra useful information under gdb.

3) I almost used Valgrind, but it cannot attach to an already-running process. It has to start the process itself, since it needs to set up a lot of boilerplate first. Still, I have read it is THE tool for checking memory allocation and bounds errors in C programs.

4) Using memset to initialize arrays is not always advisable and can behave unexpectedly, as documented on the net. I let the compiler do the job instead (i.e.: char myarray[256] = { 0 }; ).

5) Using MQCONNX inside channel exits is not advised, mostly because it allows customizing connection options that can interfere with the exit's behaviour.

6) GCC's lack of warnings fooled me. I had some functions that take pointers as parameters, and some calls that did not supply any arguments. GCC did not complain at compile time, so I was getting errors only at runtime.
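
Points 4 and 6 can be sketched together. This is an illustration under assumptions, not the exit's actual code: clear_buffer and is_all_zero are hypothetical helpers. One well-known memset pitfall is using sizeof on a pointer parameter, which yields the size of the pointer rather than of the caller's array, so the size is passed explicitly here. For point 6, one plausible explanation is that the functions were called without a full prototype in scope, in which case older GCC versions accept calls with missing arguments unless warnings are enabled. Compiling with -Wall -Wextra -Wstrict-prototypes -Wmissing-prototypes (all existing GCC options) should surface such calls.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Full prototypes: with these in scope, a call that omits an argument is a
 * compile-time error instead of a run-time crash. */
void clear_buffer(char *buf, size_t size);
int  is_all_zero(const char *buf, size_t size);

/* Zero `size` bytes of buf. The size is passed in explicitly because, inside
 * this function, sizeof buf would be the size of a pointer, not of the
 * caller's array. */
void clear_buffer(char *buf, size_t size)
{
    assert(buf != NULL);
    memset(buf, 0, size);
}

/* Return 1 if every byte in buf[0..size-1] is zero, otherwise 0. */
int is_all_zero(const char *buf, size_t size)
{
    assert(buf != NULL);
    for (size_t i = 0; i < size; i++) {
        if (buf[i] != 0)
            return 0;
    }
    return 1;
}
```

For a local array, the initializer form from point 4 (char myarray[256] = { 0 };) zeroes every element at the point of declaration, and clear_buffer(myarray, sizeof myarray) is the equivalent when an existing array has to be reset later.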