Cluster Channels Hang
PeterPotkay
Posted: Wed Mar 23, 2005 1:51 pm    Post subject: Cluster Channels Hang

Poobah

Joined: 15 May 2001
Posts: 7722

MQ 5.3 CSD8
Windows 2000
Microsoft Hardware clustering.

QMA, QM1 and QM2 are in an MQ cluster. There are no other QMs in this cluster.
QM1 and QM2 are the Full Repositories.

Every few days, a CLUSSDR hangs, usually in STARTING status. Messages destined for that channel queue up in the SYSTEM.CLUSTER.TRANSMIT.QUEUE (S.C.T.Q.), but other cluster messages to the other QM flow fine. It seems random as to which CLUSSDR hangs, but it is always to or from QMA.

If we try to stop the channel, hoping to manually restart it, it stays stuck in STOPPING. If we try to take the QM offline in MSCS, it then stays stuck in Offline Pending (eventually going to a FAILED status). The only solution is to reboot, which fixes the problem.

It does not happen in our LAB, DEV, or Production. Only QA.

Got a ticket with IBM, but no joy yet.

The following FDC is thrown when the channel gets stuck.
Code:

+-----------------------------------------------------------------------------+
|                                                                             |
| WebSphere MQ First Failure Symptom Report                                   |
| =========================================                                   |
|                                                                             |
| Date/Time         :- Wed March 23 10:29:56 Eastern Standard Time 2005       |
| Host Name         :- ERDSIMMQS001 (Windows 2000 Build 2195: Service Pack 4) |
| PIDS              :- 5724B4100                                              |
| LVLS              :- 530.8  CSD08                                           |
| Product Long Name :- WebSphere MQ for Windows                               |
| Vendor            :- IBM                                                    |
| Probe Id          :- XC130031                                               |
| Application Name  :- MQM                                                    |
| Component         :- xehExceptionHandler                                    |
| Build Date        :- Sep 22 2004                                            |
| CMVC level        :- p530-08-L040921                                        |
| Build Type        :- IKAP - (Production)                                    |
| UserID            :- MUSR_MQADMIN                                           |
| Process Name      :- E:\programs\MQSeries\amqrmppa\amqrmppa.exe             |
| Process           :- 00003860                                               |
| Thread            :- 00001154                                               |
| QueueManager      :- HIGHUBQB                                               |
| Major Errorcode   :- xecF_E_UNEXPECTED_SYSTEM_RC                            |
| Minor Errorcode   :- OK                                                     |
| Probe Type        :- MSGAMQ6119                                             |
| Probe Severity    :- 2                                                      |
| Probe Description :- AMQ6119: An internal WebSphere MQ error has occurred   |
|   (Access Violation at address 01C1C000 when reading)                       |
| FDCSequenceNumber :- 0                                                      |
| Comment1          :- Access Violation at address 01C1C000 when reading      |
|                                                                             |
|                                                                             |
+-----------------------------------------------------------------------------+


The following is thrown to system MQ error log:
Code:

-------------------------------------------------------------------------------
03/23/2005  10:29:55
AMQ6119: An internal WebSphere MQ error has occurred (Access Violation at
address 01C1C000 when reading)

EXPLANATION:
MQ detected an unexpected error when calling the operating system. The MQ error
recording routine has been called.
ACTION:
Use the standard facilities supplied with your system to record the problem
identifier, and to save the generated output files. Contact your IBM support
center.  Do not discard these files until the problem has been resolved.
----- amqxfdcp.c : 631 --------------------------------------------------------
03/23/2005  10:29:56
AMQ6184: An internal WebSphere MQ error has occurred on queue manager HIGHUBQB.

EXPLANATION:
An error has been detected, and the WebSphere MQ error recording routine has
been called. The failing process is process 3860.
ACTION:
Use the standard facilities supplied with your system to record the problem
identifier, and to save the generated output files. Contact your IBM support
center.  Do not discard these files until the problem has been resolved.
----- amqxfdcp.c : 665 --------------------------------------------------------

_________________
Peter Potkay
Keep Calm and MQ On
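
The FDC header in the first post uses a regular `Name :- Value` layout, so when several FDCs pile up across incidents it can help to pull the fields out and compare them (same Probe Id? same process?). A minimal Python sketch, assuming the box layout shown above; the function name `parse_fdc_header` and the shortened sample are illustrative and not part of any MQ tooling:

```python
import re

# Matches lines such as "| Probe Id          :- XC130031        |".
# Banner lines and continuation lines carry no ":-" and are skipped.
FIELD_RE = re.compile(r"^\|\s*([^|]*?)\s*:-\s*(.*?)\s*\|?\s*$")

# Shortened copy of the header posted above.
SAMPLE = """\
| WebSphere MQ First Failure Symptom Report                                   |
| Probe Id          :- XC130031                                               |
| Component         :- xehExceptionHandler                                    |
| Process           :- 00003860                                               |
| QueueManager      :- HIGHUBQB                                               |
| Probe Description :- AMQ6119: An internal WebSphere MQ error has occurred   |
|   (Access Violation at address 01C1C000 when reading)                       |
"""


def parse_fdc_header(text):
    """Return a dict of the 'Name :- Value' fields found in an FDC header box."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group(1)] = match.group(2)
    return fields


if __name__ == "__main__":
    for name, value in parse_fdc_header(SAMPLE).items():
        print(f"{name}: {value}")
```

Wrapped continuation lines (such as the second line of the Probe Description) are deliberately ignored here to keep the sketch short.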
csmith28
Posted: Wed Mar 23, 2005 5:48 pm

Grand Master

Joined: 15 Jul 2003
Posts: 1196
Location: Arizona

Ok I'll take a shot at this at the risk of being proven wrong.

I assume you have already searched IBM Support, Google, and this site for similar problems, to no avail.

This looks like it is most likely related to some obscure Security issue related to the (Bill Gates/Spawn of Satan) MS Operating System and the MUSR_MQADMIN user ID.

Perhaps there is some secondary Domain Server that is suddenly deciding that MUSR_MQADMIN is a threat or shouldn't have permission to start a Channel on the QMA Server.

Or maybe Address 01C1C000 on the local disk is a corrupted sector. Maybe.

Have you run defrag or chkdsk /R to see if that helps?

If all else fails, call IBM.
_________________
Yes, I am an agent of Satan but my duties are largely ceremonial.
fjb_saper
Posted: Thu Mar 24, 2005 2:42 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Possible memory leak/problem on the server.
Does it still do this after a reboot?

Have you installed additional (incompatible) software recently?
Changed the tuning on firewalls, privacy tools, or virus protectors?

Enjoy
kman
Posted: Thu Mar 24, 2005 7:36 pm

Partisan

Joined: 21 Jan 2003
Posts: 309
Location: Kuala Lumpur, Malaysia

Since you have already raised a PMR with IBM on this, I hope they will get back to you with a resolution soon and that you can share it with us.
But barring any network issue, my bet is on a memory leak.
One suggestion is to revert to CSD07.
If it is a memory issue, you should be able to see from the system whether memory is being used up or usage keeps climbing.

What's your problem ticket number?
PeterPotkay
Posted: Fri Mar 25, 2005 5:09 am

Poobah

Joined: 15 May 2001
Posts: 7722

PMR 09786 L6Q

We have CSD8 on because it specifically fixed a problem, so we can't roll back.

I'll look at memory next time it happens.
_________________
Peter Potkay
Keep Calm and MQ On
vennela
Posted: Fri Mar 25, 2005 11:15 am

Jedi Knight

Joined: 11 Aug 2002
Posts: 4055
Location: Hyderabad, India

I think I saw this problem. It has something to do with reverse DNS lookup, and that's the reason it hangs.

http://www.mqseries.net/phpBB2/viewtopic.php?t=18861&highlight=helped+recreate

I'll try to find the PMR number.
PeterPotkay
Posted: Fri Mar 25, 2005 2:10 pm

Poobah

Joined: 15 May 2001
Posts: 7722

vennela, if you can get that PMR, that would be great. But I kinda doubt it, because when this problem affects our Gateway QM, the channels to other QMs from this QM keep working, and a second cluster channel in another overlapping cluster also keeps working.

We've got the lads in Hursley involved now as well. I bumped up my DISCINTs in the QA environment so the channels stay running, as I suspect it's a problem that occurs when the channels are either ready to go Inactive on their own or are triggering back up. Dev I left as is, so the problem hopefully happens again to establish a pattern.
_________________
Peter Potkay
Keep Calm and MQ On
jefflowrey
Posted: Fri Mar 25, 2005 2:27 pm

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

PeterPotkay wrote:
But I kinda doubt it, because when this problem affects our Gateway QM, the channels to other QMs from this QM keep working,


That sounds kinda like a reverse DNS lookup issue for a single IP address...?
_________________
I am *not* the model of the modern major general.
PeterPotkay
Posted: Fri Mar 25, 2005 2:30 pm

Poobah

Joined: 15 May 2001
Posts: 7722

Next time this bites me in the arse, before I reboot the machine, I will try to telnet from the problem box to the destination box.

But note that I have 2 cluster senders going to the same destination, and only one gets a problem (and it's not always the same one), so maybe not, UNLESS it is a combination of a DNS problem occurring at the same time a channel wants to start. Hmmmmm....
_________________
Peter Potkay
Keep Calm and MQ On
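
The telnet test described in the post above, together with the reverse DNS theory raised earlier in the thread, could be scripted so that the same checks are run and timed each time the channel hangs. A rough Python sketch using only the standard `socket` module; the function names `timed` and `check_peer` are made up for illustration, and port 1414 is only the usual default MQ listener port, so the real host and port are assumptions to replace:

```python
import socket
import sys
import time


def timed(fn, *args):
    """Call fn(*args); return (result, error, elapsed_seconds)."""
    start = time.monotonic()
    try:
        return fn(*args), None, time.monotonic() - start
    except OSError as exc:  # gaierror, herror and timeouts are all OSError
        return None, exc, time.monotonic() - start


def check_peer(host, port=1414, timeout=5.0):
    """Time forward DNS, reverse DNS and a plain TCP connect for one peer."""
    report = {"host": host, "port": port}

    addr, err, secs = timed(socket.gethostbyname, host)
    report["forward"] = {"address": addr, "error": err, "seconds": secs}

    if addr is not None:
        # gethostbyaddr takes no timeout, so a stalled reverse lookup
        # shows up here as a large elapsed time rather than an error.
        rev, err, secs = timed(socket.gethostbyaddr, addr)
        report["reverse"] = {"name": rev[0] if rev else None,
                             "error": err, "seconds": secs}

    def connect():
        with socket.create_connection((host, port), timeout=timeout):
            return True

    ok, err, secs = timed(connect)
    report["tcp"] = {"connected": bool(ok), "error": err, "seconds": secs}
    return report


if __name__ == "__main__":
    # Pass the destination host name as the first argument.
    target = sys.argv[1] if len(sys.argv) > 1 else "localhost"
    for key, value in check_peer(target).items():
        print(f"{key}: {value}")
```

A slow or failing reverse lookup alongside a working TCP connect would point toward the DNS theory; if all three steps are fast while the channel is still stuck in STARTING, the cause is more likely inside the channel process itself.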
fjb_saper
Posted: Sat Mar 26, 2005 7:23 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

I've seen bizarre things happen with DNS: communication from A to B works with no problem, but from B to A it works only if a link from A to B is active at the same time....

Enjoy