Author |
Message
|
jeevan |
Posted: Sun Nov 22, 2009 2:55 pm Post subject: Cluster Resolution /Reply to message delaying Problem |
|
|
Grand Master
Joined: 12 Nov 2005 Posts: 1432
|
Our MQ setup consists of two overlapping clusters. There are two bridge qmgrs that belong to both clusters and through which the two clusters communicate.
Once (according to my colleague), he saw about 10,000 messages on the cluster transmission queue in the full repositories, and the cluster channels were stopped. When he started the channel, it went back to stopped mode. But when he bounced the bridge qmgr, all the messages were cleared.
There is a problem with delayed messages from the backend queue managers (which belong to one of the two clusters) to the store servers (members of the other cluster).
When replying, the reply-to queue manager is the qmgr alias defined in the bridge queue manager. Each of these queue managers has the full list of qmgr aliases.
We are seeing that reply messages take a long time to get back to the store servers, although messages reach the request queue without delay. The request queues are the QAs at the bridge qmgrs and the physical queues in the backend queue managers.
I suspect that there are some bad and duplicate records in the full repositories about the partial repositories.
One thing uncommon in our cluster setup is that all partial repositories are connected to both FRs. That means there are two cluster sender channels in each of the partial repositories.
Any thoughts?
Thanks
Last edited by jeevan on Sun Dec 06, 2009 2:47 pm; edited 2 times in total |
|
exerk |
Posted: Sun Nov 22, 2009 4:06 pm Post subject: Re: Cluster Resolution /Reply to message delaying Problem |
|
|
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
jeevan wrote: |
...One thing uncommon in our cluster setup is that, all partial repositories are connected to both FRs... |
Are we to assume from that statement that manually defined CLUSSDR's are used to connect? _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
jeevan |
Posted: Sun Nov 22, 2009 4:24 pm Post subject: Re: Cluster Resolution /Reply to message delaying Problem |
|
|
Grand Master
Joined: 12 Nov 2005 Posts: 1432
|
exerk wrote: |
jeevan wrote: |
...One thing uncommon in our cluster setup is that, all partial repositories are connected to both FRs... |
Are we to assume from that statement that manually defined CLUSSDR's are used to connect? |
Yes, each partial repository has three cluster channels: one cluster receiver channel and two cluster sender channels, one to each of the two FRs.
Last edited by jeevan on Mon Dec 07, 2009 6:16 am; edited 2 times in total |
|
bruce2359 |
Posted: Sun Nov 22, 2009 4:26 pm Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
...cluster channels were stopped. |
A channel in STOPPED state was either stopped manually (by someone or something issuing a STOP CHANNEL command, for example), OR the channel encountered errors from which it could not recover.
Did you take a look at the error logs for the failing channel(s) to see why they failed? Please do so, and post the results here. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
jeevan |
Posted: Sun Nov 22, 2009 4:36 pm Post subject: |
|
|
Grand Master
Joined: 12 Nov 2005 Posts: 1432
|
bruce2359 wrote: |
Quote: |
...cluster channels were stopped. |
A channel in STOPPED state was either stopped manually (by someone or something issuing a STOP CHANNEL command, for example), OR the channel encountered errors from which it could not recover.
Did you take a look at the error logs for the failing channel(s) to see why they failed? Please do so, and post the results here. |
All of this is past history. I have been asked to figure out the problems, and it was explained to me that this is what happened once in the past. So I do not have any error logs from it.
But when the channels were restarted, they went to stopped mode, and they came up OK once the bridge qmgrs were bounced. So I suspect some problem in the repositories.
I assume the channel went down because it could not deliver the messages. I think in that situation MQ stops the channel (ready to be corrected).
Last edited by jeevan on Mon Nov 23, 2009 6:21 am; edited 1 time in total |
|
bruce2359 |
Posted: Sun Nov 22, 2009 7:42 pm Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
I assume, the channel went down because they could not deliver the message. I think in that situation, mq stops the channel (ready to be corrected) |
You have no error logs and no other definitive documentation of the error and its symptoms - other than someone said this happened?
There are a variety of reasons for the symptoms reported to you; but without any trail (errors logged), there is little chance of intelligently taking corrective action. We could guess, I suppose.
There is a shotgun approach to problem resolution. Let's say you are told to change a few things, then wait to see if it happens again. If the problem does not recur, which thing fixed the problem? What did you learn? If it didn't fix the problem, what then?
As a side note, I've recommended to clients that they archive error logs at qmgr startup. This allows for this type of historical analysis. It's only disk space, or tape. This action alone may satisfy management that you are doing the right thing to best track down the problem if it should happen again. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
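The suggestion above, archiving the error logs at qmgr startup, could be scripted along the following lines. This is only a minimal sketch: the function name `archive_error_logs`, the archive location, and the timestamp format are illustrative assumptions, not anything MQ provides. It simply copies any `AMQERR*.LOG` files from a queue manager's errors directory into a timestamped folder.

```python
import shutil
import time
from pathlib import Path


def archive_error_logs(errors_dir, archive_root, stamp=None):
    """Copy AMQERR*.LOG files from errors_dir into archive_root/<stamp>/.

    errors_dir and archive_root are supplied by the caller; nothing here
    talks to MQ itself. Returns the list of archived file paths.
    """
    errors_dir = Path(errors_dir)
    # Default to a sortable timestamp so each startup gets its own folder.
    stamp = stamp or time.strftime("%Y%m%d-%H%M%S")
    dest = Path(archive_root) / stamp
    dest.mkdir(parents=True, exist_ok=True)

    copied = []
    for log in sorted(errors_dir.glob("AMQERR*.LOG")):
        # copy2 keeps the modification time, which helps later analysis.
        shutil.copy2(log, dest / log.name)
        copied.append(dest / log.name)
    return copied
```

On a default UNIX installation the queue manager error logs usually live under `/var/mqm/qmgrs/<qmgr name>/errors`, so a startup wrapper might call `archive_error_logs("/var/mqm/qmgrs/QM1/errors", "/backup/mqlogs")` before starting the queue manager. `QM1` and `/backup/mqlogs` are hypothetical names.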
|
fjb_saper |
Posted: Sun Nov 22, 2009 9:15 pm Post subject: |
|
|
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Can you please specify the version of MQ and whether the cluster channels are using message compression?  _________________ MQ & Broker admin |
|
jeevan |
Posted: Mon Nov 23, 2009 3:49 am Post subject: |
|
|
Grand Master
Joined: 12 Nov 2005 Posts: 1432
|
fjb_saper wrote: |
Can you please specify the version of MQ and whether the cluster channels are using message compression?  |
MQ 6.0.2.7 for AIX.
No, we are not using compression, but all channels, including the cluster channels, are SSL enabled.
Last edited by jeevan on Mon Nov 23, 2009 5:48 am; edited 2 times in total |
|
PeterPotkay |
Posted: Mon Nov 23, 2009 4:55 am Post subject: |
|
|
Poobah
Joined: 15 May 2001 Posts: 7722
|
fjb_saper wrote: |
Can you please specify the version of MQ and whether the cluster channels are using message compression?  |
_________________ Peter Potkay
Keep Calm and MQ On |
|
fjb_saper |
Posted: Mon Nov 23, 2009 3:15 pm Post subject: |
|
|
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
PeterPotkay wrote: |
fjb_saper wrote: |
Can you please specify the version of MQ and whether the cluster channels are using message compression?  |
|
There is an APAR / PMR for this. The compression (depending on size of message) has a problem with alignments... IBM has a fix for this (V6.0.2.6 & V6.0.2.7)
@jeevan
You may have the same type of problem with SSL. Open a PMR!  _________________ MQ & Broker admin |
|
jeevan |
Posted: Sun Dec 06, 2009 2:41 pm Post subject: |
|
|
Grand Master
Joined: 12 Nov 2005 Posts: 1432
|
fjb_saper wrote: |
PeterPotkay wrote: |
fjb_saper wrote: |
Can you please specify the version of MQ and whether the cluster channels are using message compression?  |
|
There is an APAR / PMR for this. The compression (depending on size of message) has a problem with alignments... IBM has a fix for this (V6.0.2.6 & V6.0.2.7)
@jeevan
You may have the same type of problem with SSL. Open a PMR!  |
We found the solution to the message delay. Let me explain it.
As I said above, our MQ network consists of two overlapping clusters. Two queue managers function as gateway queue managers and are members of both clusters. The gateway queue managers hold alias queues for, let's say, the client side of the cluster and qmgr aliases for the server side of the cluster. When a qmgr on the server side replies to a message, the reply may go to either of the bridge queue managers.
The bridge queue manager does not send the message directly to the client qmgr (the reply-to qmgr) every time. One time it sends it directly; the next time it sends the message to the other bridge queue manager. So the message goes to and fro a few times between the bridge queue managers. This was causing the delay and sometimes caused the messages to expire.
Once we created manual cluster sender channels between these bridge queue managers and stopped them, the messages started reaching the target as expected.
We also had a problem with the amqrrmfa (repository) process dying when we stopped and started the queue manager. We opened a PMR with IBM. They sent us a replacement for amqrrmfa, and that seems to be working.
By the way, do any of you guys have overlapping clusters? Have you faced a similar situation to ours?
Thanks |
|
exerk |
Posted: Sun Dec 06, 2009 3:08 pm Post subject: |
|
|
Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
jeevan wrote: |
...By the way, do any of you guys have overlapping clusters? Have you faced a similar situation to ours?... |
Not when a cluster has been correctly configured, e.g. as per the manual and with only one manually defined CLUSSDR from a PR to an FR  _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
zonko |
Posted: Sun Dec 06, 2009 11:48 pm Post subject: |
|
|
Voyager
Joined: 04 Nov 2009 Posts: 78
|
You chose the wrong solution, creating a manual CLUSSDR.
The cause was that the qmgr alias had the same name as the actual qmgr, and since each had the same status as valid destinations, the msg was load-balanced between the qmgr hosting the alias, and the actual qmgr.
The real solution was to create qmgr aliases with different names to the real qmgr. |
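The load-balancing effect described in this post can be illustrated with a toy model. This is a deliberately simplified sketch, not MQ's actual cluster workload algorithm: it assumes each bridge simply round-robins over every destination that advertises the target name, and the names `BRIDGE1`, `BRIDGE2`, and `STORE1` are hypothetical. When the qmgr alias shares the real qmgr's name, the other bridge counts as a valid destination and replies can bounce; with a distinct alias name, only the real qmgr matches.

```python
class Bridge:
    """Toy gateway qmgr that round-robins over its candidate destinations."""

    def __init__(self, name):
        self.name = name
        self.candidates = []  # filled in by simulate()
        self.turn = 0

    def next_hop(self):
        choice = self.candidates[self.turn % len(self.candidates)]
        self.turn += 1
        return choice


def simulate(same_name, messages=6):
    """Return the number of hops each reply needs to reach STORE1.

    same_name=True models an alias named like the real qmgr, so the other
    bridge is also a candidate. same_name=False models distinct alias names.
    """
    b1, b2 = Bridge("BRIDGE1"), Bridge("BRIDGE2")
    target = "STORE1"
    b1.candidates = [target, b2] if same_name else [target]
    b2.candidates = [target, b1] if same_name else [target]

    hops_per_message = []
    for i in range(messages):
        current = (b1, b2)[i % 2]  # replies arrive at either bridge in turn
        hops = 0
        while True:
            nxt = current.next_hop()
            hops += 1
            if not isinstance(nxt, Bridge):
                break  # delivered to the real queue manager
            current = nxt  # forwarded to the other bridge
        hops_per_message.append(hops)
    return hops_per_message


if __name__ == "__main__":
    print("alias named like the real qmgr:", simulate(True))
    print("alias with a distinct name:   ", simulate(False))
```

In this model some replies take extra hops between the bridges when the names collide, while every reply is delivered in a single hop once the alias name is distinct. Real clusters add channel state, priorities, and weights on top of this, so treat the numbers as illustrative only.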
|
jeevan |
Posted: Mon Dec 07, 2009 6:01 am Post subject: |
|
|
Grand Master
Joined: 12 Nov 2005 Posts: 1432
|
zonko wrote: |
You chose the wrong solution, creating a manual CLUSSDR.
The cause was that the qmgr alias had the same name as the actual qmgr, and since each had the same status as valid destinations, the msg was load-balanced between the qmgr hosting the alias, and the actual qmgr.
The real solution was to create qmgr aliases with different names to the real qmgr. |
For the sake of discussion, let's say we have the following systems.
8 qmgrs in the client cluster:
  2 bridge servers
  2 FRs
  4 qmgrs where the client applications connect, put messages, and wait for responses
6 qmgrs in the server cluster:
  2 FRs (same as above)
  2 bridge servers (same as above)
  2 qmgrs for the server (responding) app, where the responding applications connect and reply to the messages
Note: the bridge servers belong to both clusters, and the FRs are the same for both clusters.
Each of the two bridge queue managers holds alias queues for the cluster queues that physically reside on the 2 qmgrs where the responding applications connect and reply to the messages received from the client apps, and qmgr aliases for the 4 queue managers where the client applications connect and put the messages.
When the client app sends a message, the reply-to queue manager is picked up automatically. The reply-to queue may be set explicitly, as it is the same throughout all the queue managers.
But according to you, if we have a different name for the qmgr alias than the actual name, and again one name on one bridge queue manager and another name on the other bridge queue manager, would it not be a mess?
A client app which connects to the 4 client-side queue managers would have to use different ReplyToQMgr values.
Also, if we have the same alias name on both bridge queue managers, the same problem arises as we face now. If we have different alias names for the same queue manager, then it would create more hassle for the application to set the reply-to qmgr.
Could you please explain how it would work without hassle for the application? |
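The topology listed in this post can be written down as sets to check the counts. This is just a sketch for following the discussion; all queue manager names (`BRIDGE1`, `FR1`, `CLIENT1`, `SERVER1`, and so on) are made up, since the thread does not give real names.

```python
# Hypothetical names; only the counts come from the post.
bridges = {"BRIDGE1", "BRIDGE2"}
full_repos = {"FR1", "FR2"}
client_qmgrs = {f"CLIENT{i}" for i in range(1, 5)}
server_qmgrs = {"SERVER1", "SERVER2"}

# Bridges and FRs are members of both clusters.
client_cluster = bridges | full_repos | client_qmgrs
server_cluster = bridges | full_repos | server_qmgrs

overlap = client_cluster & server_cluster
all_qmgrs = client_cluster | server_cluster

# Each bridge holds one qmgr alias per client-side queue manager.
qmgr_aliases_total = len(bridges) * len(client_qmgrs)

if __name__ == "__main__":
    print("client cluster size:", len(client_cluster))
    print("server cluster size:", len(server_cluster))
    print("qmgrs in both clusters:", sorted(overlap))
    print("distinct qmgrs overall:", len(all_qmgrs))
    print("qmgr alias definitions across both bridges:", qmgr_aliases_total)
```

The sizes match the 8 and 6 given in the post, with the two bridges and two FRs making up the overlap.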
|
|