Author |
Message
|
guy11 |
Posted: Wed May 16, 2012 10:05 pm Post subject: cluster resolution problem |
|
|
Newbie
Joined: 16 May 2012 Posts: 8
|
I am getting frequent cluster resolution errors (2189) on my full repository managers when applications try to access queues defined on a partial repository manager. I captured amqrfdm output on one of the repository managers, and on inspecting it I found the output below. Can somebody tell me whether this is normal or whether it denotes a problem? IBM Support is avoiding my question and not providing any answer, despite my asking them directly.
FQM1 (FR), FMQ2 (FR) and PQM3 (PR) are all members of three different clusters, CLUS1, CLUS2 and CLUS3, with dedicated cluster channels.
Cluster queue QUEUE.CLUS1 is defined on PQM3.
Below is part of the amqrfdm output captured on FMQ1 (FR). Note that QUEUE.CLUS1 shows up under all three clusters of which PQM3 is a member, although QUEUE.CLUS1 has been in cluster CLUS1 only, for ages.
MQ version is 7.0.1.7; platform is Sun SPARC.
Q(QUEUE.CLUS1 ) Seq(501)
@1708B8
Cluster(CLUS1 )
UUID(PQM3 )
SubID(135 2CF6DC89)
Exp(Fri May 25 03:05:30 2012) Upd(Wed Apr 25 03:05:31 2012)
Flags(No Ack ClusQ )
Flags(0) MsgId(414D5120514D5F4950535633202020204F5717572156E2D4)
EnumPrev(1709A8 ) EnumNext(107A7C0 )
Q(QUEUE.CLUS1 ) Seq(507)
@107A7C0
Cluster(CLUS2 )
UUID(PQM3 )
SubID(132 AEBD6E28)
Exp(Fri May 25 03:05:30 2012) Upd(Wed Apr 25 03:05:31 2012)
Flags(No Ack ClusQ )
Flags(1) MsgId(414D5120514D5F4950535633202020204F5717572156E2D3)
EnumPrev(1708B8 ) EnumNext(107A6D0 )
Q(QUEUE.CLUS1 ) Seq(507)
@107A6D0
Cluster(CLUS3 )
UUID(PQM3 )
SubID(132 AEBD6CA1)
Exp(Fri May 25 03:05:30 2012) Upd(Wed Apr 25 03:05:31 2012)
Flags(No Ack ClusQ )
Flags(1) MsgId(414D5120514D5F4950535633202020204F5717572156E2D2)
EnumPrev(107A7C0 ) EnumNext(107A5E0 ) |
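A dump like the one above is easier to compare across repositories once the records are summarized. Below is a minimal Python sketch that groups records by queue name and lists the cluster and UUID each record is filed under. It assumes the record layout shown in this post (each record starts with a Q(...) line and later carries Cluster(...) and UUID(...) fields); amqrfdm output is not a documented interface, so the parser and the name group_records are hypothetical. It only summarizes the dump; it does not tell you what kind of record each entry is.

```python
import re
from collections import defaultdict

# Matches "Key(value)" tokens such as "Cluster(CLUS1 )"; values may carry trailing blanks.
FIELD = re.compile(r"(\w+)\(([^)]*)\)")


def group_records(dump_text):
    """Group amqrfdm-style records by queue name (assumed layout, see lead-in).

    Returns {queue_name: [(cluster, uuid), ...]} in the order records appear.
    """
    result = defaultdict(list)
    current = None
    for line in dump_text.splitlines():
        fields = {key: value.strip() for key, value in FIELD.findall(line)}
        if "Q" in fields:
            # A new record begins at each "Q(...)" line.
            current = {"Q": fields["Q"]}
        elif current is not None:
            current.update(fields)
            if "Cluster" in current and "UUID" in current:
                result[current["Q"]].append((current["Cluster"], current["UUID"]))
                current = None  # ignore the remaining lines of this record
    return dict(result)


sample = """Q(QUEUE.CLUS1 ) Seq(501)
@1708B8
Cluster(CLUS1 )
UUID(PQM3 )
Q(QUEUE.CLUS1 ) Seq(507)
@107A7C0
Cluster(CLUS2 )
UUID(PQM3 )"""
print(group_records(sample))
# {'QUEUE.CLUS1': [('CLUS1', 'PQM3'), ('CLUS2', 'PQM3')]}
```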
|
|
 |
mqjeff |
Posted: Thu May 17, 2012 3:21 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
What is the exact text of the exact error that shows up?
Exactly which queue manager does it show up as?
Exactly which queue manager is the application connected to at the time of the error?
Exactly what object is the application attempting to use at the time of the error?
Exactly what MQ operation is the application attempting on that object at the time of the error?
Exactly what MQRC does the application receive from that attempt? |
|
|
 |
Vitor |
Posted: Thu May 17, 2012 5:38 am Post subject: Re: cluster resolution problem |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
guy11 wrote: |
IBM Support is avoiding my question and not providing any answer, despite my asking them directly. |
How are they avoiding the question? Do they just keep asking for more information?
Like the topology of your clusters, the number & status of channels, the circumstances of the error, that sort of thing? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
guy11 |
Posted: Thu May 17, 2012 6:04 am Post subject: Re: cluster resolution problem |
|
|
Newbie
Joined: 16 May 2012 Posts: 8
|
Vitor wrote: |
guy11 wrote: |
IBM Support is avoiding my question and not providing any answer, despite my asking them directly. |
How are they avoiding the question? Do they just keep asking for more information?
Like the topology of your clusters, the number & status of channels, the circumstances of the error, that sort of thing? |
Exactly. We have provided the topology, configuration details, etc., and we always get the same standard answer: provide logs/traces while the problem is occurring. |
|
|
 |
mqjeff |
Posted: Thu May 17, 2012 6:15 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
That's not avoiding the question.
That's telling you that there isn't yet enough information available to answer the question.
You haven't provided enough information here for anyone else to answer the question either.
The reason code indicates that the partial repository that the application is using is having difficulties communicating with the full repository in order to resolve the object being opened.
This could be because of a corrupt set of information in the full repository, although that's unlikely and would also have been exposed by the PMR by now.
It's more likely that there are channel issues with the clusrcvr or clussdr on the PR.
It's also likely that you have not constructed the topology that you think you have constructed. You have said you expect the queue is only shared in one of the three clusters, but you are seeing definitions for it in the FR that indicate it's shared in all three. This is most likely caused by a misunderstanding of what you have configured - that is, that you have *actually* shared it in all three clusters even though you think you only shared it in one.
But there isn't enough information here to determine which of these, if any, is actually the scenario in effect. |
|
|
 |
guy11 |
Posted: Thu May 17, 2012 6:18 am Post subject: |
|
|
Newbie
Joined: 16 May 2012 Posts: 8
|
mqjeff wrote: |
What is the exact text of the exact error that shows up?
Exactly which queue manager does it show up as?
Exactly which queue manager is the application connected to at the time of the error?
Exactly what object is the application attempting to use at the time of the error?
Exactly what MQ operation is the application attempting on that object at the time of the error?
Exactly what MQRC does the application receive from that attempt? |
The application accessing the queue through JMS gets 2189 (it runs on the same machine as the full repository manager).
The error always happens at FQM1 or FMQ2 (full repositories), and the queue being accessed is on PQM3 (partial repository).
The object being accessed by the application is a cluster queue on the partial repository (QUEUE.CLUS1).
The application is attempting MQPUT/MQPUT1 through JMS.
The error received by the application is MQJMS001: Completion Code 2, Reason 2189.
This is not a new setup; it has been working for ages. The problem resolves once I refresh the cluster, but then it happens again for a different queue in a different cluster. Everything else about the full and partial repositories is always the same. I am quite familiar with MQ and clusters, but this is something I couldn't sort out.
In the output I posted, a cluster queue that belongs to one cluster appears under three different clusters in the amqrfdm output. Does this denote corruption in the repository cache, or is it normal? As I stated, the partial repository that hosts the queue is also a member of the other two clusters under which the queue shows up in the cache. |
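For anyone triaging logs that contain errors like the one above, here is a small Python sketch that pulls the completion and reason codes out of a message line and maps them to MQ constant names. The lookup tables are a hand-picked subset (MQCC_FAILED is 2, MQRC_CLUSTER_RESOLUTION_ERROR is 2189, plus a couple of others), not the full list from the MQ documentation, and the regular expression assumes the "Completion Code N, Reason N" wording used in this message. The function name decode is hypothetical.

```python
import re

# Hand-picked subset of MQ completion and reason codes (not a complete table).
COMPLETION_CODES = {0: "MQCC_OK", 1: "MQCC_WARNING", 2: "MQCC_FAILED"}
REASON_CODES = {
    2009: "MQRC_CONNECTION_BROKEN",
    2085: "MQRC_UNKNOWN_OBJECT_NAME",
    2189: "MQRC_CLUSTER_RESOLUTION_ERROR",
}

# Assumes the "Completion Code N, Reason N" wording seen in the error text.
PATTERN = re.compile(r"Completion Code (\d+), Reason (\d+)")


def decode(message):
    """Return (completion name, reason name) for the first code pair found, or None."""
    match = PATTERN.search(message)
    if match is None:
        return None
    cc, rc = int(match.group(1)), int(match.group(2))
    return (
        COMPLETION_CODES.get(cc, f"unknown completion code {cc}"),
        REASON_CODES.get(rc, f"reason {rc} (not in table)"),
    )


print(decode("MQJMS001: Completion Code 2, Reason 2189."))
# ('MQCC_FAILED', 'MQRC_CLUSTER_RESOLUTION_ERROR')
```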
|
|
 |
Vitor |
Posted: Thu May 17, 2012 6:23 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
mqjeff wrote: |
That's not avoiding the question.
That's telling you that there isn't yet enough information available to answer the question. |
mqjeff wrote: |
You haven't provided enough information here for anyone else to answer the question either. |
mqjeff wrote: |
It's also likely that you have not constructed the topology that you think you have constructed |
mqjeff wrote: |
But there isn't enough information here to determine which of these, if any, is actually the scenario in effect. |
|
|
 |
guy11 |
Posted: Thu May 17, 2012 6:38 am Post subject: |
|
|
Newbie
Joined: 16 May 2012 Posts: 8
|
mqjeff wrote: |
That's not avoiding the question.
That's telling you that there isn't yet enough information available to answer the question.
You haven't provided enough information here for anyone else to answer the question either.
The reason code indicates that the partial repository that the application is using is having difficulties communicating with the full repository in order to resolve the object being opened.
This could be because of a corrupt set of information in the full repository, although that's unlikely and would also have been exposed by the PMR by now.
It's more likely that there are channel issues with the clusrcvr or clussdr on the PR.
It's also likely that you have not constructed the topology that you think you have constructed. You have said you expect the queue is only shared in one of the three clusters, but you are seeing definitions for it in the FR that indicate it's shared in all three. This is most likely caused by a misunderstanding of what you have configured - that is, that you have *actually* shared it in all three clusters even though you think you only shared it in one.
But there isn't enough information here to determine which of these, if any, is actually the scenario in effect. |
I am damn sure the queue was shared in only one cluster.
This is not a new configuration; it has been working for ages.
Trouble started after we migrated from V6 to V7.
We have 15+ clusters, and the problem has happened for different queues in different clusters, even on different partial repository QMs on different platforms (z/OS, AIX, Solaris).
My question is simple: does it denote corruption? The PMR has not exposed anything; they are not giving any answer. |
|
Back to top |
|
 |
cicsprog |
Posted: Thu May 17, 2012 7:51 am Post subject: |
|
|
Partisan
Joined: 27 Jan 2002 Posts: 347
|
Are you scripting ALTER or DEFINE commands to run as the server or queue manager starts? I've seen newbie MQ admins DEFINE cluster objects every time the server is recycled. That can't be done: there is state data that needs to be maintained. |
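To look for the startup pattern described above, here is a rough Python sketch that scans an MQSC script for DEFINE/ALTER commands (including the DEF/ALT short forms) that mention cluster attributes: CLUSTER, CLUSNL, REPOS, REPOSNL, or a CLUSSDR/CLUSRCVR channel type. The keyword list is an assumption about what counts as a cluster change, continuation handling is simplified (a trailing + or - just joins the next line), comments are assumed to be lines starting with *, and the function names are hypothetical.

```python
import re

COMMAND = re.compile(r"^\s*(DEF(INE)?|ALT(ER)?)\b", re.IGNORECASE)
# Assumed markers of a cluster-related change (see lead-in).
CLUSTER_HINT = re.compile(
    r"\bCLUS(TER|NL)\s*\(|\bREPOS(NL)?\s*\(|\bCHLTYPE\s*\(\s*CLUS(SDR|RCVR)\s*\)",
    re.IGNORECASE,
)


def mqsc_commands(script):
    """Yield whole MQSC commands, joining '+'/'-' continuations and skipping '*' comments."""
    pending = ""
    for line in script.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith("*"):
            continue
        if stripped[-1] in "+-":
            pending += stripped[:-1].rstrip() + " "
            continue
        yield pending + stripped
        pending = ""
    if pending:
        yield pending.strip()


def cluster_changes(script):
    """Return the DEFINE/ALTER commands that appear to touch cluster objects."""
    return [cmd for cmd in mqsc_commands(script)
            if COMMAND.match(cmd) and CLUSTER_HINT.search(cmd)]


startup = """* run at every boot
DEFINE QLOCAL(QUEUE.CLUS1) +
       CLUSTER(CLUS1) REPLACE
DEFINE QLOCAL(LOCAL.ONLY) REPLACE
ALTER QMGR REPOS(CLUS1)
"""
for cmd in cluster_changes(startup):
    print(cmd)
```

Any command this flags in a script that runs at every restart is worth a closer look; an empty result does not prove nothing is being redefined.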
|
|
 |
bruce2359 |
Posted: Thu May 17, 2012 8:32 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
I sense that the OP has (mis)used the REFRESH and/or RESET CLUSTER commands... oooooohhhhhhhhhmmmmmmm. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
Back to top |
|
 |
cicsprog |
Posted: Thu May 17, 2012 8:53 am Post subject: |
|
|
Partisan
Joined: 27 Jan 2002 Posts: 347
|
+1
You may want to do a DIS CLUSQMGR(*) CLUSTER(<cluster name>) to see whether the QMID and the CLUSDATE and CLUSTIME match between MQMs. If they don't, you have corrupt repositories. |
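With 15+ clusters, comparing that output by eye gets tedious. Here is a minimal Python sketch, assuming you have saved the runmqsc text of DIS CLUSQMGR(*) CLUSTER(<cluster name>) ALL (or any form that includes QMID, CLUSDATE and CLUSTIME) from two queue managers. It treats every CLUSQMGR(...) token as the start of a record and reports where the two views disagree. The function names and sample values are hypothetical; a difference only tells you where to look.

```python
import re

# Key(value) tokens; allows one nested level such as CONNAME(host(1414)).
TOKEN = re.compile(r"(\w+)\(((?:[^()]|\([^()]*\))*)\)")
COMPARED = ("QMID", "CLUSDATE", "CLUSTIME")


def parse_clusqmgr(output):
    """Parse DIS CLUSQMGR text into {(clusqmgr, cluster): {attribute: value}}."""
    records = []
    for key, value in TOKEN.findall(output):
        if key == "CLUSQMGR":
            records.append({})  # each CLUSQMGR(...) starts a new record
        if records:
            records[-1][key] = value.strip()
    return {(r["CLUSQMGR"], r.get("CLUSTER", "")): r for r in records}


def differences(view_a, view_b):
    """Yield (clusqmgr, cluster, attribute, value_a, value_b) where the views disagree."""
    for ident in sorted(view_a.keys() & view_b.keys()):
        for attr in COMPARED:
            a, b = view_a[ident].get(attr), view_b[ident].get(attr)
            if a != b:
                yield (*ident, attr, a, b)


# Hypothetical outputs captured on two full repositories.
fqm1 = """CLUSQMGR(PQM3) CLUSTER(CLUS1) QMID(PQM3_2012-04-25_03.05.31)
CLUSDATE(2012-04-25) CLUSTIME(03.05.31)"""
fqm2 = """CLUSQMGR(PQM3) CLUSTER(CLUS1) QMID(PQM3_2012-04-25_03.05.31)
CLUSDATE(2012-05-10) CLUSTIME(11.20.00)"""
for diff in differences(parse_clusqmgr(fqm1), parse_clusqmgr(fqm2)):
    print(diff)
```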
|
Back to top |
|
 |
exerk |
Posted: Thu May 17, 2012 10:34 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
When you migrated your queue managers, did you ensure you migrated your FRs first? _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
|
 |
mvic |
Posted: Thu May 17, 2012 11:26 am Post subject: Re: cluster resolution problem |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
guy11 wrote: |
Below is part of the amqrfdm output captured on FMQ1 (FR). Note that QUEUE.CLUS1 shows up under all three clusters of which PQM3 is a member. |
What you have dumped there is subscription information held on an FR in respect of an interest registered by a PR. It therefore does not mean what you took it to mean, unfortunately.
Continue working with IBM. Cluster problems can take weeks to work out.
My general idea is: check the health of all your channels, and check that all channels are defined as you expect them to be.
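One way to start on the channel health check is to list channels whose status is not RUNNING. Here is a minimal Python sketch, assuming the runmqsc text of DIS CHSTATUS(*) ALL (or any form that includes CHANNEL, CHLTYPE and STATUS). Channels with no status at all (inactive) do not appear in that output, so they are not covered here; the channel names in the sample and the function name are hypothetical.

```python
import re

# Key(value) tokens; allows one nested level such as CONNAME(host(1414)).
TOKEN = re.compile(r"(\w+)\(((?:[^()]|\([^()]*\))*)\)")


def unhealthy_channels(chstatus_output, healthy=("RUNNING",)):
    """Return [(channel, chltype, status)] for status records not in `healthy`."""
    records = []
    for key, value in TOKEN.findall(chstatus_output):
        if key == "CHANNEL":
            records.append({})  # each CHANNEL(...) starts a new status record
        if records:
            records[-1][key] = value.strip()
    return [
        (r["CHANNEL"], r.get("CHLTYPE", "?"), r.get("STATUS", "?"))
        for r in records
        if r.get("STATUS") not in healthy
    ]


sample = """CHANNEL(TO.PQM3.CLUS1) CHLTYPE(CLUSSDR) CONNAME(pqm3host(1414)) STATUS(RETRYING)
CHANNEL(TO.FQM1.CLUS1) CHLTYPE(CLUSRCVR) CONNAME(10.1.1.5) STATUS(RUNNING)"""
print(unhealthy_channels(sample))
# [('TO.PQM3.CLUS1', 'CLUSSDR', 'RETRYING')]
```

Running this against each queue manager only covers channel state; checking that the channels are defined as you expect still needs a look at the definitions themselves.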
What fix pack are you running on your various qmgrs?
I hope you've already checked through all the MQ v7 APARs that say "queue went missing" or that sort of thing. Ensure you're at the latest maintenance level as a first step.
Hmm. I normally go to http://www.ibm.com/support/docview.wss?uid=swg21254675 for a list of fixes but the page is not there right now. Does anyone have a better link? |
|
|
 |
guy11 |
Posted: Thu May 17, 2012 10:10 pm Post subject: |
|
|
Newbie
Joined: 16 May 2012 Posts: 8
|
cicsprog wrote: |
+1
You may want to do a DIS CLUSQMGR(*) CLUSTER(<cluster name>) to see whether the QMID and the CLUSDATE and CLUSTIME match between MQMs. If they don't, you have corrupt repositories. |
Thanks. The QMID matches, but CLUSDATE and CLUSTIME don't match for some of the clusters. Does that mean those cluster caches are corrupted? |
|
|
 |
mvic |
Posted: Fri May 18, 2012 2:13 am Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
guy11 wrote: |
The QMID matches, but CLUSDATE and CLUSTIME don't match for some of the clusters. Does that mean those cluster caches are corrupted? |
Not likely, but possibly you have a breakdown in communications that means the updates are not being pushed through the system. |
|
|
 |
|