Sam Uppu
Posted: Mon Mar 01, 2010 8:49 pm Post subject: CLUSSDR channel goes into retry state after an hour (approx.)
Yatiri
Joined: 11 Nov 2008 Posts: 610
Hi Guys,
We are on MQ 7.0.1 running on Solaris and AIX boxes.
We have 2 queue managers (QM1, QM2) in a cluster: one (QM1) running on AIX and the other (QM2) on a Solaris box.
When both queue managers are configured in the cluster, the cluster sender/receiver channels are up and running, but around an hour later the sender on QM1 pointing to QM2 (i.e., TO.QM2) goes into RETRYING state and later into BINDING state. When we rebuild queue manager QM1 with the same config details, the cluster sender/receiver channels are up and running for an hour and then the sender channel goes into retry state again.
When I do a DIS CHS(TO.*) on QM1:
dis chs(*)
1 : dis chs(*)
AMQ8417: Display Channel Status details.
CHANNEL(TO.QM2) CHLTYPE(CLUSSDR)
CONNAME(xx.xxx.xx.xxx(1414)) CURRENT
RQMNAME( ) STATUS(RETRYING)
SUBSTATE( ) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
AMQ8417: Display Channel Status details.
CHANNEL(TO.QM1) CHLTYPE(CLUSRCVR)
CONNAME(yy.yyy.yy.yyy) CURRENT
RQMNAME(QM1) STATUS(RUNNING)
SUBSTATE(RECEIVE) XMITQ( )
The cluster sender/receiver channels on QM2, on the other hand, are up and running all the time.
There is an FDC on QM1 saying BAD_DATA_RECEIVED from QM2. Not sure whether this is related to the issue.
Can you guys shed some light on this?
Thanks.
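The DIS CHS output in the post above lends itself to a quick scripted check. A minimal sketch in Python, assuming the runmqsc DISPLAY CHSTATUS output has been captured as plain text (parse_chstatus and not_running are made-up names for illustration, not part of any MQ tooling): it splits the text on the AMQ8417 message id, pulls out the KEYWORD(value) pairs and lists every channel whose STATUS is not RUNNING.

```python
import re

# One KEYWORD(value) pair from runmqsc DISPLAY output, e.g. STATUS(RETRYING).
# The value may contain one level of nested parentheses, as in
# CONNAME(host(1414)).
_ATTR = re.compile(r"([A-Z]+)\(((?:[^()]|\([^()]*\))*)\)")

# The QM1 output from the post above.
SAMPLE = """\
1 : dis chs(*)
AMQ8417: Display Channel Status details.
CHANNEL(TO.QM2) CHLTYPE(CLUSSDR)
CONNAME(xx.xxx.xx.xxx(1414)) CURRENT
RQMNAME( ) STATUS(RETRYING)
SUBSTATE( ) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
AMQ8417: Display Channel Status details.
CHANNEL(TO.QM1) CHLTYPE(CLUSRCVR)
CONNAME(yy.yyy.yy.yyy) CURRENT
RQMNAME(QM1) STATUS(RUNNING)
SUBSTATE(RECEIVE) XMITQ( )
"""


def parse_chstatus(text):
    """Return one dict of attributes per AMQ8417 status record."""
    records = []
    # Everything before the first AMQ8417 (the echoed command) is skipped.
    for block in text.split("AMQ8417:")[1:]:
        records.append({key: value.strip() for key, value in _ATTR.findall(block)})
    return records


def not_running(records):
    """(channel, type, status) for every channel whose STATUS is not RUNNING."""
    return [(r.get("CHANNEL"), r.get("CHLTYPE"), r.get("STATUS"))
            for r in records if r.get("STATUS") != "RUNNING"]


if __name__ == "__main__":
    for channel, chltype, status in not_running(parse_chstatus(SAMPLE)):
        print(f"{channel} ({chltype}) is {status}")  # TO.QM2 (CLUSSDR) is RETRYING
```

Run on a schedule, something like this would have flagged TO.QM2 as soon as it dropped out of RUNNING.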

Vitor
Posted: Mon Mar 01, 2010 9:04 pm Post subject: Re: CLUSSDR channel goes into retry state after an hour(appr
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
Sam Uppu wrote:
When we rebuild the queue manager QM1 with the same config details
I do hope that's not exactly what you mean, and that what you are actually doing is removing/ejecting QM1 from the cluster, then recreating it and re-adding it.
Sam Uppu wrote:
There is an FDC on QM1 saying BAD_DATA_RECEIVED from QM2. Not sure whether this is related to the issue.
If you're not playing fast & loose with the cluster (and it does sound like the queue manager is getting replication information it's not expecting), then it's PMR time.
Unless someone wants to correct my belief that default replication is 60 mins or so...?
_________________
Honesty is the best policy.
Insanity is the best defence.
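For reference, cleanly taking a queue manager out of a cluster before rebuilding it looks roughly like the MQSC below. This is a sketch only: it assumes a cluster called MYCLUSTER (the real cluster name isn't given in the thread), QM1's cluster receiver TO.QM1 and its manually defined cluster sender TO.QM2, and the hypothetical helpers leave_cluster and force_remove just build the command strings you would feed to runmqsc. Check the removal procedure in the MQ documentation for your version before relying on this order.

```python
def leave_cluster(cluster, clusrcvr, clussdr):
    """MQSC to run on the departing queue manager (e.g. via runmqsc QM1)."""
    return [
        f"SUSPEND QMGR CLUSTER({cluster})",
        # Withdraw the cluster receiver so other members stop using it.
        f"ALTER CHANNEL({clusrcvr}) CHLTYPE(CLUSRCVR) CLUSTER(' ')",
        f"STOP CHANNEL({clusrcvr})",
        # Take the manually defined cluster sender (to QM2 here) out as well.
        f"ALTER CHANNEL({clussdr}) CHLTYPE(CLUSSDR) CLUSTER(' ')",
        f"STOP CHANNEL({clussdr})",
    ]


def force_remove(cluster, qmgr):
    """MQSC for a full repository when qmgr was deleted without leaving first."""
    return f"RESET CLUSTER({cluster}) ACTION(FORCEREMOVE) QMNAME({qmgr}) QUEUES(YES)"


if __name__ == "__main__":
    print("\n".join(leave_cluster("MYCLUSTER", "TO.QM1", "TO.QM2")))
    print(force_remove("MYCLUSTER", "QM1"))
```

The RESET CLUSTER form is only for tidying up after the fact, i.e. when QM1 was deleted and recreated without leaving the cluster first.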

Mr Butcher
Posted: Tue Mar 02, 2010 12:59 am
Padawan
Joined: 23 May 2005 Posts: 1716
Maybe the explicitly defined CLUSSDR uses a correct CONNAME.
Maybe the implicitly (auto-)defined CLUSSDR uses a bad CONNAME.
Maybe the disconnect interval is 1 hour.
After setting up the cluster, the explicitly defined CLUSSDR with the correct CONNAME works. Then the disconnect interval makes the channel go inactive. The next time it is needed, it is started using the implicitly defined channel definition with the bad CONNAME, and goes into retry and binding.
Just a guess. Other channel attributes could come into play too, e.g. exits defined on QM1 that do not exist on QM2, and so on. I'd also check the amqerr* log files.
_________________
Regards, Butcher
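One way to test that guess: on QM1, DISPLAY CHANNEL(TO.QM2) shows the manually defined cluster sender, while DISPLAY CLUSQMGR(QM2) shows the definition QM1 has received from QM2's cluster receiver, which is what the auto-defined sender is built from. A rough Python sketch that compares the two outputs attribute by attribute; the attribute list, the helper names and the 192.0.2.x addresses are illustrative assumptions, not values from this thread.

```python
import re

# One KEYWORD(value) pair, allowing one level of nesting: CONNAME(host(1414)).
_ATTR = re.compile(r"([A-Z]+)\(((?:[^()]|\([^()]*\))*)\)")

# Attributes worth comparing; illustrative, not an exhaustive list.
CHECKED = ("CONNAME", "DISCINT", "SCYEXIT", "MSGEXIT", "SENDEXIT", "RCVEXIT")

# Hypothetical extracts (192.0.2.x is a documentation range), not real output.
MANUAL = "CHANNEL(TO.QM2) CHLTYPE(CLUSSDR) CONNAME(192.0.2.10(1414)) DISCINT(6000)"
AUTO = "CLUSQMGR(QM2) CHANNEL(TO.QM2) CONNAME(192.0.2.99(1414)) DISCINT(6000)"


def attrs(display_text):
    """KEYWORD(value) pairs from runmqsc DISPLAY output as a dict."""
    return {key: value.strip() for key, value in _ATTR.findall(display_text)}


def differences(manual_text, auto_text, keys=CHECKED):
    """{attribute: (manual value, received value)} for each attribute that differs."""
    manual, auto = attrs(manual_text), attrs(auto_text)
    return {k: (manual.get(k), auto.get(k))
            for k in keys if manual.get(k) != auto.get(k)}


if __name__ == "__main__":
    # {'CONNAME': ('192.0.2.10(1414)', '192.0.2.99(1414)')}
    print(differences(MANUAL, AUTO))
```

An empty result means the two definitions agree on the checked attributes; any CONNAME entry is exactly the mismatch suspected here.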

Sam Uppu
Posted: Tue Mar 02, 2010 10:01 am
Yatiri
Joined: 11 Nov 2008 Posts: 610
Sorry, this has been resolved. The issue was that we used a different IP (the IP the channel was resolving to after a period of time) in the cluster receiver channel on the destination queue manager.
I should have double-checked the cluster receiver channel. My apologies for not checking this earlier.
One question though:
How come the cluster sender/receiver channels were able to start when I provided the wrong IP in the cluster receiver channel of the destination queue manager? I was able to send msgs across in both directions and communication was OK for an hour; later the sender channel of the source qmgr went into retry mode, which made me believe something was changing in flight. How come the sender/receiver pair works for the initial hour even though I provided a wrong IP in the cluster receiver channel of the destination qmgr?
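A cheap guard against this kind of slip is to check, before adding the channel to the cluster, that the CLUSRCVR CONNAME really points at the box the queue manager runs on. A minimal Python sketch; split_conname and conname_matches are hypothetical helpers, the expected address list is something you supply yourself, and it only handles a single host(port) CONNAME with IPv4 name resolution.

```python
import ipaddress
import socket


def split_conname(conname):
    """Split a CONNAME such as 'host(1414)' into (host, port); port defaults to 1414."""
    conname = conname.strip()
    if conname.endswith(")") and "(" in conname:
        host, port = conname[:-1].rsplit("(", 1)
        return host.strip(), int(port)
    return conname, 1414


def conname_matches(conname, expected_addresses, resolve=socket.gethostbyname):
    """True if the CONNAME host is, or resolves to, one of the expected IPv4 addresses."""
    host, _ = split_conname(conname)
    try:
        ipaddress.ip_address(host)   # already a literal IP address
        address = host
    except ValueError:
        address = resolve(host)      # DNS name: look it up (IPv4 only here)
    return address in set(expected_addresses)


if __name__ == "__main__":
    # On QM2: compare its CLUSRCVR CONNAME with the address QM2 really has.
    print(conname_matches("192.0.2.99(1414)", ["192.0.2.10"]))  # False
```

The resolve parameter is there so a DNS name can be checked against a stub in testing; in real use the default socket lookup applies.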

bruce2359
Posted: Tue Mar 02, 2010 10:04 am
Poobah
Joined: 05 Jan 2008 Posts: 9472 Location: US: west coast, almost. Otherwise, enroute.
Moved to Clustering forum.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

fjb_saper
Posted: Tue Mar 02, 2010 9:40 pm
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
Depends... how did you change the channel?
- stop the cluster receiver (MODE(FORCE) STATUS(STOPPED))
- remove the cluster info from the cluster receiver
- change the IP/DNS name on the cluster receiver (I'd prefer a DNS name because it allows network traversal etc.)
- make sure the cluster receiver displays the right IP/DNS name using a channel display
- alter the channel to add the cluster information back
- verify the change took on at least the FRs (full repositories) and one PR (partial repository)
- done
Of course you want the qmgr suspended from the cluster while you do this...
Have fun
_________________
MQ & Broker admin
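The steps above, written out as MQSC for runmqsc on the queue manager that owns the cluster receiver. Sketch only: the cluster name MYCLUSTER and the DNS name are placeholders, the hypothetical Python helpers just build the command strings, and the START CHANNEL line is an addition not in the list above (a channel stopped with STATUS(STOPPED) has to be started again before it accepts connections).

```python
def change_clusrcvr_conname(cluster, channel, new_conname):
    """MQSC for the queue manager that owns cluster receiver `channel`."""
    return [
        f"SUSPEND QMGR CLUSTER({cluster})",
        # stop the cluster receiver
        f"STOP CHANNEL({channel}) MODE(FORCE) STATUS(STOPPED)",
        # remove the cluster info
        f"ALTER CHANNEL({channel}) CHLTYPE(CLUSRCVR) CLUSTER(' ')",
        # change the IP/DNS name
        f"ALTER CHANNEL({channel}) CHLTYPE(CLUSRCVR) CONNAME('{new_conname}')",
        # confirm the new CONNAME before going any further
        f"DISPLAY CHANNEL({channel}) CONNAME",
        # add the cluster info back
        f"ALTER CHANNEL({channel}) CHLTYPE(CLUSRCVR) CLUSTER({cluster})",
        # not in the step list: undo STATUS(STOPPED) so senders can connect
        f"START CHANNEL({channel})",
        f"RESUME QMGR CLUSTER({cluster})",
    ]


def verify_on_repository(qmgr):
    """MQSC to run on each full repository (and one partial) to see the new CONNAME."""
    return f"DISPLAY CLUSQMGR({qmgr}) CONNAME"


if __name__ == "__main__":
    print("\n".join(change_clusrcvr_conname("MYCLUSTER", "TO.QM2", "qm2.example.com(1414)")))
    print(verify_on_repository("QM2"))
```

Joining the list with newlines gives a script that can be piped into runmqsc QM2; the DISPLAY CLUSQMGR check then runs separately on the repositories.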

Mr Butcher
Posted: Wed Mar 03, 2010 12:37 am
Padawan
Joined: 23 May 2005 Posts: 1716
As I wrote before... for the initial contact your explicitly defined cluster sender channel (with the correct IP) is used. The cluster receiver definition (with the wrong IP) is then received and used to create the implicitly defined cluster sender channel. The next time the channel starts, this implicitly defined channel with the wrong IP is used, and you hit a non-working connection that was working before.
This can happen through a manual stop/start, through the channel going inactive because of the disconnect interval, or through other channel disruptions....
_________________
Regards, Butcher
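The sequence described above can be captured in a toy model. This is a simplified Python sketch of the explanation in this thread, not of MQ's internals: ClusterSenderView and channel_status are made-up names, the 192.0.2.x addresses are placeholders, and real channels also go through BINDING and retry timers, which are ignored here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ClusterSenderView:
    """Toy model of which CONNAME QM1 uses for its cluster sender to QM2."""
    manual_conname: str                    # from QM1's manually defined CLUSSDR
    learned_conname: Optional[str] = None  # from QM2's CLUSRCVR, once received

    def receive_clusrcvr(self, conname: str) -> None:
        # Initial contact over the manual channel works; QM2's cluster receiver
        # definition arrives and the auto-defined CLUSSDR is built from it.
        self.learned_conname = conname

    def conname_for_next_start(self) -> str:
        # From then on, channel starts use the auto-defined definition.
        return self.learned_conname or self.manual_conname


def channel_status(view: ClusterSenderView, reachable: set) -> str:
    """RUNNING if the address used on the next start answers, else RETRYING."""
    return "RUNNING" if view.conname_for_next_start() in reachable else "RETRYING"


if __name__ == "__main__":
    view = ClusterSenderView(manual_conname="192.0.2.10(1414)")
    reachable = {"192.0.2.10(1414)"}           # where QM2 is really listening
    print(channel_status(view, reachable))     # RUNNING (manual CLUSSDR)
    view.receive_clusrcvr("192.0.2.99(1414)")  # wrong IP in QM2's CLUSRCVR
    # ...channel ends (disconnect interval, stop/start, network blip)...
    print(channel_status(view, reachable))     # RETRYING (auto-defined CLUSSDR)
```

The point of the model is the ordering: the bad address is harmless until the first restart after QM2's cluster receiver definition has been received.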

Sam Uppu
Posted: Wed Mar 03, 2010 7:42 am
Yatiri
Joined: 11 Nov 2008 Posts: 610
Mr. Butcher,
This is exactly what happened.
It was a blunder on my end, but since both cluster sender/receiver channels were up and running, I thought "I did define it properly", but I had not.
Thanks for sharing the info.