Mother of All Cluster Problems
belchman
Posted: Thu May 17, 2018 7:11 am Post subject: Mother of All Cluster Problems

Partisan

Joined: 31 Mar 2006
Posts: 386
Location: Ohio, USA

I am hoping I can get some ideas from you folks on this one. I opened a ticket with IBM on this and their initial response was "a gasp" (it seemed) when I described the issue.

I have this situation where I get an AMQ9469 after a while. It says "AMQ9469: Update not received for CLUSRCVR channel TO.MQP1 hosted on queue manager MQP1.C5D30588EE85EF01 in cluster MYCLUSTER."

This occurs with predictable regularity, and I have to issue a REFRESH CLUSTER on this queue manager (the AIX one, not the z/OS one behind the cluster receiver) to make it stop.

If I do not refresh, some cluster queues on this AIX queue manager disappear and an outage ensues in which I have to refresh the cluster to restore service.

I have no idea how or why this is occurring and am looking for help.

Here is why IBM gasped when I opened the PMR. This one queue manager is a full repos for 8 different clusters.

Aside from that, the cluster that I need to refresh with regularity has 2 fulls and 1 partial. The 2 fulls are an AIX qmgr and a z/OS qmgr (MQP1). The local queues that are shared in the cluster are on MQP1.

I do not admin the z/OS qmgr so I have to ask the admin to look up stuff. All I can see is what is on AIX.

I need to figure out why MQP1 is not sending updates about its receiver or queues. When this AIX qmgr gets this error report, it is a countdown to the outage.

AMQ9456: Update not received for queue MYCLUSTERQUEUE, queue manager MQP1.C5D30588EE85EF01 from full repository for cluster MYCLUSTER.

EXPLANATION:
The repository manager detected a cluster queue that had been used sometime in the last 30 days for which updated information should have been sent from a full repository. However, this has not occurred.

The repository manager will keep the information about this queue for a further 60 days from when the error first occurred.
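
To make the countdown in that explanation concrete, here is a minimal Python sketch of the date arithmetic. It assumes only what the message text states (the information is kept for a further 60 days from when the error first occurred). The function names and the example dates are hypothetical, and the actual expiry is decided by the repository manager, not by this calculation.

```python
from datetime import date, timedelta

# Retention period stated in the AMQ9456 explanation text quoted above.
FURTHER_RETENTION_DAYS = 60


def retention_deadline(first_error: date) -> date:
    """Return the date 60 days after the error was first reported."""
    return first_error + timedelta(days=FURTHER_RETENTION_DAYS)


def days_remaining(first_error: date, today: date) -> int:
    """Days left before the deadline (negative once it has passed)."""
    return (retention_deadline(first_error) - today).days


if __name__ == "__main__":
    first_seen = date(2018, 5, 1)  # hypothetical date of the first AMQ9456
    print(retention_deadline(first_seen))
    print(days_remaining(first_seen, date(2018, 5, 17)))
```

Tracking the first occurrence of the message per queue gives a rough idea of how long there is to find the root cause before a refresh becomes necessary.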


Any ideas you may have would be appreciated.
_________________
Make three correct guesses consecutively and you will establish a reputation as an expert. ~ Laurence J. Peter
Anant.v
Posted: Thu May 17, 2018 7:20 am

Apprentice

Joined: 26 Nov 2014
Posts: 40
Location: Malaysia

I'm facing similar issues in my environment. My conclusion so far is that, in my case, it happens only after a DR simulation. Is it also happening for you after a DR?
belchman
Posted: Thu May 17, 2018 10:32 am

Partisan

Joined: 31 Mar 2006
Posts: 386
Location: Ohio, USA

Anant,

My situation is not related to a DR exercise or anything. It has been going on in production for over a year.
_________________
Make three correct guesses consecutively and you will establish a reputation as an expert. ~ Laurence J. Peter
gbaddeley
Posted: Thu May 17, 2018 4:12 pm

Jedi

Joined: 25 Mar 2003
Posts: 2492
Location: Melbourne, Australia

AFAIK, a PR qmgr will choose to send cluster object updates to one of the FR qmgrs that it knows about. If there is an error that prevents updates from being processed by that FR, the PR will NOT choose another FR to send updates.

Check the MQ error logs and check for FDCs on the PR and FR qmgrs.

If a PR cannot do this processing for ~90 days, it will silently delete all the cluster queue defs in its local repository. Apps will then fail with RC 2189 (cluster resolution error).
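
As a rough aid for the log check suggested above, the following Python sketch scans error log text for the two message IDs quoted in this thread. It is a sketch under stated assumptions: the sample text is assembled from the messages quoted earlier plus one filler line, the tolerance for a trailing severity letter is an assumption about how some MQ levels format message IDs, and `find_update_warnings` is a hypothetical helper, not part of any MQ tooling.

```python
import re

# Message IDs discussed in this thread.
# Assumption: some MQ levels append a severity letter (for example AMQ9469W),
# so an optional trailing letter is tolerated.
PATTERN = re.compile(r"^(AMQ9456|AMQ9469)[A-Z]?:\s*(.+)$", re.MULTILINE)


def find_update_warnings(log_text):
    """Return (message_id, text) pairs for each matching log line."""
    return [(m.group(1), m.group(2).strip()) for m in PATTERN.finditer(log_text)]


# Sample built from the messages quoted in this thread; the middle line is
# filler to show that unrelated messages are ignored.
SAMPLE_LOG = """\
AMQ9469: Update not received for CLUSRCVR channel TO.MQP1 hosted on queue manager MQP1.C5D30588EE85EF01 in cluster MYCLUSTER.
AMQ9002: Channel 'TO.MQP1' is starting.
AMQ9456: Update not received for queue MYCLUSTERQUEUE, queue manager MQP1.C5D30588EE85EF01 from full repository for cluster MYCLUSTER.
"""

if __name__ == "__main__":
    for message_id, text in find_update_warnings(SAMPLE_LOG):
        print(message_id, "->", text)
```

Running something like this on a schedule against the queue manager's error log would flag the first missed update well before the ~90 days run out.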
_________________
Glenn
belchman
Posted: Fri May 18, 2018 3:39 am

Partisan

Joined: 31 Mar 2006
Posts: 386
Location: Ohio, USA

gbaddeley,

That's why I am confused. The queues and the clusrcvr that are shared (and that go away) are hosted on one FR, and they become unavailable to the other FR.

And the fact that they come back when I do a manual refresh of the cluster tells me that these FRs are able to communicate.

I am truly stumped. I opened another ESR with IBM.

Is it possible that the z/OS MQ code is experiencing that repository manager bug IT12700?
_________________
Make three correct guesses consecutively and you will establish a reputation as an expert. ~ Laurence J. Peter
tczielke
Posted: Fri May 18, 2018 4:46 am

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

It sounded like one FR is on z/OS and the other FR is on distributed AIX. Are they at the same code level? Personally, I would run the FRs on the same platform and IBM MQ code level.
_________________
Working with MQ since 2010.
belchman
Posted: Fri May 18, 2018 4:57 am

Partisan

Joined: 31 Mar 2006
Posts: 386
Location: Ohio, USA

tczielke,

There are a number of things I would have done differently but I inherited this stuff and it is a challenge.

The mainframe queue manager is a FR (I believe) so the mainframe MQ admin can see the cluster. We have a separation between mainframe and open systems MQ that is another challenge.

I will reach out to the mainframe MQ person to see what version of MQ is installed there. I am not sure if IT12700 would affect the z/OS MQ the way it did AIX MQ.
_________________
Make three correct guesses consecutively and you will establish a reputation as an expert. ~ Laurence J. Peter
bruce2359
Posted: Fri May 18, 2018 5:46 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

belchman wrote:
The mainframe queue manager is a FR (I believe) ...


You “believe?” MQSC DISPLAY commands will tell you which are PRs and which are FRs.
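
One way to act on this: the MQSC command `DISPLAY CLUSQMGR(*) QMTYPE` reports `QMTYPE(REPOS)` for full repositories and `QMTYPE(NORMAL)` for partial ones. The Python sketch below parses output of that kind into a simple FR/PR map. The sample output is illustrative only: its exact layout is an assumption, and the names AIXQM1 and PARTIAL1 are hypothetical placeholders (only MQP1, TO.MQP1 and MYCLUSTER come from this thread).

```python
import re

# Illustrative output of: DISPLAY CLUSQMGR(*) QMTYPE
# Layout is assumed; AIXQM1 and PARTIAL1 are hypothetical names.
SAMPLE_OUTPUT = """\
AMQ8441: Display Cluster Queue Manager details.
   CLUSQMGR(AIXQM1)                        CHANNEL(TO.AIXQM1)
   CLUSTER(MYCLUSTER)                      QMTYPE(REPOS)
AMQ8441: Display Cluster Queue Manager details.
   CLUSQMGR(MQP1)                          CHANNEL(TO.MQP1)
   CLUSTER(MYCLUSTER)                      QMTYPE(REPOS)
AMQ8441: Display Cluster Queue Manager details.
   CLUSQMGR(PARTIAL1)                      CHANNEL(TO.PARTIAL1)
   CLUSTER(MYCLUSTER)                      QMTYPE(NORMAL)
"""


def repository_roles(output):
    """Map each cluster queue manager name to 'FR' or 'PR'."""
    roles = {}
    # Each record starts with an AMQ8441 header line; split on those headers.
    for record in re.split(r"^AMQ8441\w*:.*$", output, flags=re.MULTILINE):
        name = re.search(r"\bCLUSQMGR\(([^)]*)\)", record)
        qmtype = re.search(r"\bQMTYPE\(([^)]*)\)", record)
        if name and qmtype:
            roles[name.group(1)] = "FR" if qmtype.group(1) == "REPOS" else "PR"
    return roles


if __name__ == "__main__":
    for qmgr, role in repository_roles(SAMPLE_OUTPUT).items():
        print(qmgr, role)
```

Running the command on each full repository and comparing the resulting maps is a quick way to see whether both FRs agree on who the FRs are.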
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
belchman
Posted: Fri May 18, 2018 5:49 am

Partisan

Joined: 31 Mar 2006
Posts: 386
Location: Ohio, USA

bruce2359,

Communication error.

1) I know MQP1 is a full repository
2) I believe it was made a full repos so that the z/OS admin can see the full cluster

Sorry for confusion
_________________
Make three correct guesses consecutively and you will establish a reputation as an expert. ~ Laurence J. Peter
bruce2359
Posted: Fri May 18, 2018 11:55 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

belchman wrote:
bruce2359,

Communication error.

1) I know MQP1 is a full repository
2) I believe it was made a full repos so that the z/OS admin can see the full cluster

Sorry for confusion

MQ clustering software will only use the first two FRs as FRs. A 3rd FR, like MQP1, will not be used as an FR.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
fjb_saper
Posted: Fri May 18, 2018 8:18 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20696
Location: LI,NY

bruce2359 wrote:

MQ clustering software will only use the first two FRs as FRs. A 3rd FR, like MQP1, will not be used as an FR.

Hi Bruce, can you please elaborate? I used to have a cluster with 4 FRs (a transitory phase while moving from AIX to Linux, with 2 FRs on each platform).
_________________
MQ & Broker admin
exerk
Posted: Sat May 19, 2018 1:17 am

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

bruce2359 wrote:
MQ clustering software will only use the first two FRs as FRs. A 3rd FR, like MQP1, will not be used as an FR.

Bruce, where do you get the idea that MQP1 is a 3rd FR?

belchman wrote:
Aside from that, the cluster that I need to refresh with regularity has 2 fulls and 1 partial. The 2 fulls are an AIX qmgr and a z/OS qmgr (MQP1)...

_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
bruce2359
Posted: Sat May 19, 2018 6:11 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

Did I misinterpret this?
belchman wrote:
Communication error.

2) I believe it was made a full repos so that the z/OS admin can see the full cluster


Seems like it was a PR before it was made an FR by the z folks so they could see all cluster stuff. So, was MQP1 one of the two original explicit FRs - the FRs that will propagate cluster info?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
mvic
Posted: Sun May 20, 2018 7:03 am Post subject: Re: Mother of All Cluster Problems

Jedi

Joined: 09 Mar 2004
Posts: 2080

belchman wrote:
AMQ9469: Update not received for CLUSRCVR channel

PRs subscribe to FRs for the queues they use and the qmgrs that host those queues.
As long as apps on the PR continue to use a queue / qmgr, the PR itself should continue to renew its subscriptions for the queue and qmgr.
And, in return, the FRs are supposed to send updates to the PR for anything relating to that queue name or qmgr.
I don't know what causes your particular problem, but it could be something like:
- FRs have both "lost" their record of the subscriptions the PR sent to them (unlikely)
- PR neglected to make or remake its subscriptions (unlikely)
- Owner of the queue has been deleted or failed to re-announce itself or its queue (unlikely)
- Messages announcing presence of queue / qmgr or remaking the PR's subscription have been deleted from a SCTQ somewhere by an administrator (unlikely?)
- DR test was done sometime in the past, and your internal prod sequence numbers are a long way behind what DR increased them to (in some cases likely but yours? you said no DR. Did you mean "never" or has there been a DR test sometime in the distant past?).
So all of these ideas are unlikely and quite probably untrue in your particular case. Hopefully IBM will get to the root cause for you.
One more thing: there have been bugs in the past, what levels are you at on the PR and FR?
gbaddeley
Posted: Sun May 20, 2018 4:49 pm

Jedi

Joined: 25 Mar 2003
Posts: 2492
Location: Melbourne, Australia

tczielke wrote:
Personally, I would run the FRs on the same platform and IBM MQ code level.

I would think this is mandatory for cluster reliability. Also, the FR code level should be equal to or above that of all the PR qmgrs in the clusters. We enforce these requirements at our site.

If z/OS folks need a view of all clustered objects, they should use a tool that has access to the FR on distributed platforms.
_________________
Glenn