Author |
Message
|
shlomoe |
Posted: Sun Apr 27, 2008 4:46 am Post subject: mq cluster channels disconnecting issue |
|
|
 Newbie
Joined: 11 Mar 2003 Posts: 7
|
env:
aix 5300-05-00
mq 6.0.2.2
mq cluster channels disconnecting issue
We have a test cluster with three qmgrs.
Two full repositories: TSTFW02,TSTGEN01
and a single partial repository: TSTFW01.
The cluster channels between all the members of the cluster are disconnecting from
one another.
This happens, for example, when we define a new cluster queue,
or when the infrastructure sends a heartbeat message,
since our cluster is not heavily used.
At first we suspected communications/firewall issues,
so we defined non-cluster SDR/RCVR channels, and these work smoothly.
Here are messages from one of the error logs when a disconnect occurs
(we see the same error messages on the other qmgrs):
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
27/04/08 14:03:36 - Process(266244.28413) User(mbadmin) Program(amqrmppa)
AMQ9209: Connection to host 'tfwbrk02 (xx.xx.xx.xx)' closed.
EXPLANATION:
An error occurred receiving data from 'tfwbrk02 (xx.xx.xx.xx)' over TCP/IP.
The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3094 -------------------------------------------------------
27/04/08 14:03:36 - Process(266244.28413) User(mbadmin) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'TO.TSTFW01' ended abnormally.
ACTION:
Look at previous error messages for channel program 'TO.TSTFW01' in the error
files to determine the cause of the failure.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
we also see the following messages though we didn't issue any stop chl command:
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
04/27/08 14:36:18 - Process(372966.17810) User(mqm) Program(amqrmppa)
AMQ9528: User requested closure of channel 'TO.TSTGEN01'.
EXPLANATION:
The channel is closing because of a request by the user.
ACTION:
None.
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
attached are clussdr, clusrcvr definitions:
DISPLAY CHANNEL(TO.*) ALL
7 : DISPLAY CHANNEL(TO.*) ALL
AMQ8414: Display Channel details.
CHANNEL(TO.TSTFW02) CHLTYPE(CLUSRCVR)
ALTDATE(2008-02-12) ALTTIME(17.25.12)
BATCHHB(0) BATCHINT(0)
BATCHSZ(50) CLUSNL( )
CLUSTER(TSTESB00) CLWLPRTY(0)
CLWLRANK(0) CLWLWGHT(50)
COMPHDR(NONE) COMPMSG(NONE)
CONNAME(xx.xx.xx.xx) CONVERT(NO)
DESCR( ) DISCINT(6000)
HBINT(300) KAINT(AUTO)
LOCLADDR( ) LONGRTY(999999999)
LONGTMR(1200) MAXMSGL(4194304)
MCANAME( ) MCATYPE(THREAD)
MCAUSER( ) MODENAME( )
MONCHL(QMGR) MRDATA( )
MREXIT( ) MRRTY(10)
MRTMR(1000) MSGDATA( )
MSGEXIT( ) NETPRTY(0)
NPMSPEED(FAST) PUTAUT(DEF)
RCVDATA( ) RCVEXIT( )
SCYDATA( ) SCYEXIT( )
SENDDATA( ) SENDEXIT( )
SEQWRAP(999999999) SHORTRTY(10)
SHORTTMR(60) SSLCAUTH(REQUIRED)
SSLCIPH( ) SSLPEER( )
STATCHL(QMGR) TPNAME( )
TRPTYPE(TCP)
AMQ8414: Display Channel details.
CHANNEL(TO.TSTGEN01) CHLTYPE(CLUSSDR)
ALTDATE(2008-02-12) ALTTIME(17.25.12)
BATCHHB(0) BATCHINT(0)
BATCHSZ(50) CLUSNL( )
CLUSTER(TSTESB00) CLWLPRTY(0)
CLWLRANK(0) CLWLWGHT(50)
COMPHDR(NONE) COMPMSG(NONE)
CONNAME(xx.xx.xx.xx) CONVERT(NO)
DESCR( ) DISCINT(6000)
HBINT(300) KAINT(AUTO)
LOCLADDR( ) LONGRTY(999999999)
LONGTMR(1200) MAXMSGL(4194304)
MCANAME( ) MCATYPE(THREAD)
MCAUSER( ) MODENAME( )
MONCHL(QMGR) MSGDATA( )
MSGEXIT( ) NPMSPEED(FAST)
PASSWORD( ) RCVDATA( )
RCVEXIT( ) SCYDATA( )
SCYEXIT( ) SENDDATA( )
SENDEXIT( ) SEQWRAP(999999999)
SHORTRTY(10) SHORTTMR(60)
SSLCIPH( ) SSLPEER( )
STATCHL(QMGR) TPNAME( )
TRPTYPE(TCP) USERID( )
>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
Any help will be appreciated. |
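When comparing error logs across the three queue managers, it can help to reduce each entry to its timestamp, user, message ID, and channel name. The following is a minimal Python sketch, not an MQ tool: the sample text is abridged from the excerpts quoted above, and the regular expressions and the `parse_entries` helper are assumptions written for this illustration that cover only the entry layout shown in this post.

```python
import re
from collections import Counter

# Abridged from the error-log excerpts quoted in this post.
SAMPLE_LOG = """\
27/04/08 14:03:36 - Process(266244.28413) User(mbadmin) Program(amqrmppa)
AMQ9209: Connection to host 'tfwbrk02 (xx.xx.xx.xx)' closed.
EXPLANATION:
An error occurred receiving data from 'tfwbrk02 (xx.xx.xx.xx)' over TCP/IP.
27/04/08 14:03:36 - Process(266244.28413) User(mbadmin) Program(amqrmppa)
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'TO.TSTFW01' ended abnormally.
04/27/08 14:36:18 - Process(372966.17810) User(mqm) Program(amqrmppa)
AMQ9528: User requested closure of channel 'TO.TSTGEN01'.
"""

# Entry header: date, time, then Process(...) User(...) Program(...).
# The date is kept as raw text because the two excerpts use different
# day/month orders.
HEADER = re.compile(
    r"^(?P<date>\d{2}/\d{2}/\d{2}) (?P<time>\d{2}:\d{2}:\d{2}) - "
    r"Process\((?P<process>[^)]*)\) User\((?P<user>[^)]*)\) "
    r"Program\((?P<program>[^)]*)\)"
)
MESSAGE = re.compile(r"^(?P<msgid>AMQ\d{4}): ")
# Matches "channel 'NAME'" and "Channel program 'NAME'".
CHANNEL = re.compile(r"[Cc]hannel(?: program)? '(?P<channel>[^']+)'")


def parse_entries(log_text):
    """Return one dict per log entry with msgid and channel (if named)."""
    entries = []
    current = None
    for line in log_text.splitlines():
        header = HEADER.match(line)
        if header:
            current = dict(header.groupdict(), msgid=None, channel=None)
            entries.append(current)
            continue
        if current is None:
            continue  # skip anything before the first entry header
        message = MESSAGE.match(line)
        if message and current["msgid"] is None:
            current["msgid"] = message.group("msgid")
        channel = CHANNEL.search(line)
        if channel and current["channel"] is None:
            current["channel"] = channel.group("channel")
    return entries


entries = parse_entries(SAMPLE_LOG)
counts = Counter((e["msgid"], e["channel"]) for e in entries)
for (msgid, channel), count in sorted(counts.items(), key=str):
    print(msgid, channel, count)
```

Tallying by message ID and channel makes it easier to see whether the AMQ9528 closures and the AMQ9209 connection drops line up in time on both ends of the same channel.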
jefflowrey |
Posted: Sun Apr 27, 2008 6:18 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Someone is issuing STOP CHANNEL, almost certainly. _________________ I am *not* the model of the modern major general. |
Vitor |
Posted: Sun Apr 27, 2008 10:19 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
jefflowrey wrote: |
Someone is issuing STOP CHANNEL, almost certainly. |
Or something - monitoring software detecting "unused" channel _________________ Honesty is the best policy.
Insanity is the best defence. |
PeterPotkay |
Posted: Sun Apr 27, 2008 5:18 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
I get these stop channel errors every few days on one of my SNDR/RCVR channels. I cannot figure out what is causing it. No firewall involved. No person is issuing stop channel commands. It happens randomly, although almost always between 1 and 5 AM.
Because the MQ channel restarts itself almost immediately (within a second), it happens in the middle of the night, and the transactions going over this channel are not time sensitive, it's a low-priority, if annoying, problem.
I'm pretty sure the network connection has something to do with it, but I can't prove it. Yet. I once had these types of errors constantly on another QA Queue Manager anytime the business application was stress testing. The network guys said something at their layer was being "flooded". Once they fixed that, the problem of the channel stopping with these errors went away on that QM.
shlomoe, if you get a resolution from somewhere other than this thread, please post it. _________________ Peter Potkay
Keep Calm and MQ On |
dp111443 |
Posted: Fri Sep 05, 2008 5:47 am Post subject: |
|
|
Voyager
Joined: 25 Feb 2004 Posts: 82
|
Hi Peter,
I too have been receiving the same error, though in our case on a server connection channel.
I've seen hundreds of client connections on the channel, but when the channel goes into an 'inactive' state, all the connections disappear and then reconnect. This is when I see the AMQ8417 errors in the MQ error logs.
In my case, though, we are issuing a stop channel (to the inactive state), so it's easily explained. In your case, I wonder whether a network issue causes the channel to go inactive, which then causes the connections to reconnect?
Just a thought... _________________ Integration Design/Developer
IBM Certified System Administrator -
WebSphere MQ V5.3 |
Mr Butcher |
Posted: Sun Sep 07, 2008 10:04 pm Post subject: |
|
|
 Padawan
Joined: 23 May 2005 Posts: 1716
|
Hi Peter,
I had a similar situation, but it was some time ago. In my case, the channel heartbeat was not negotiated as expected between sender and receiver, and during periods of channel inactivity the receiver closed the channel because its heartbeat value was smaller than the sender's.
There was a fix for this.
Maybe it's worth checking these heartbeat parameters. _________________ Regards, Butcher |
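One way to act on this suggestion is to capture `DISPLAY CHANNEL(TO.*) ALL` output on each queue manager and compare the configured `HBINT` values for the same channel name. The Python sketch below is only an illustration under stated assumptions: the sample text is abridged from the output posted at the top of this thread, `parse_channels` and `compare_attribute` are hypothetical helpers, and the "other side" value of 30 is invented purely to show a mismatch. It compares configured definitions only and says nothing about the value actually negotiated at run time.

```python
import re

# NAME(value) pairs as printed by DISPLAY CHANNEL. Limitation: a value that
# itself contains parentheses would be cut short by this simple pattern.
ATTRIBUTE = re.compile(r"([A-Z]+)\(([^)]*)\)")

# Abridged from the DISPLAY CHANNEL output posted earlier in this thread.
SAMPLE_OUTPUT = """\
7 : DISPLAY CHANNEL(TO.*) ALL
AMQ8414: Display Channel details.
CHANNEL(TO.TSTFW02) CHLTYPE(CLUSRCVR)
DESCR( ) DISCINT(6000)
HBINT(300) KAINT(AUTO)
AMQ8414: Display Channel details.
CHANNEL(TO.TSTGEN01) CHLTYPE(CLUSSDR)
DESCR( ) DISCINT(6000)
HBINT(300) KAINT(AUTO)
"""


def parse_channels(output):
    """Return {channel_name: {attribute: value}} from DISPLAY CHANNEL text."""
    channels = {}
    current = None
    in_details = False
    for line in output.splitlines():
        if line.startswith("AMQ8414"):
            in_details = True  # a new channel definition follows
            current = None
            continue
        if not in_details:
            continue  # ignore the echoed command before the first block
        for name, value in ATTRIBUTE.findall(line):
            if name == "CHANNEL":
                current = channels.setdefault(value, {})
            elif current is not None:
                current[name] = value.strip()
    return channels


def compare_attribute(defs_a, defs_b, attribute="HBINT"):
    """Return {channel: (value_a, value_b)} where both sides define the
    channel but the attribute values differ."""
    diffs = {}
    for channel in defs_a.keys() & defs_b.keys():
        value_a = defs_a[channel].get(attribute)
        value_b = defs_b[channel].get(attribute)
        if value_a != value_b:
            diffs[channel] = (value_a, value_b)
    return diffs


local_defs = parse_channels(SAMPLE_OUTPUT)
# Hypothetical definition from the other end of the channel, made up here
# only to demonstrate a mismatch.
remote_defs = {"TO.TSTFW02": {"HBINT": "30"}}
print(compare_attribute(local_defs, remote_defs))
```

In practice `remote_defs` would come from running `parse_channels` on the output captured from the partner queue manager, and the same comparison can be repeated for `KAINT` or `DISCINT`.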
PeterPotkay |
Posted: Mon Sep 08, 2008 4:31 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Thanks for the tip, Butcher. The Heartbeat Interval is set to 30 on all my channels. _________________ Peter Potkay
Keep Calm and MQ On |