Author |
Message
|
eshwar |
Posted: Thu Nov 06, 2008 3:13 pm Post subject: Corrupted Cluster--recovery |
|
|
Newbie
Joined: 23 Oct 2008 Posts: 7
|
How do I recover a corrupted cluster (full repository), and what happens to the partial repository queue managers when the full repository is corrupted? |
|
Vitor |
Posted: Thu Nov 06, 2008 4:15 pm Post subject: Re: Corrupted Cluster--recovery |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
eshwar wrote: |
How do I recover a corrupted cluster (full repository)? |
An FR is just a queue manager. If the cluster queues have become corrupted, recover them as you would any other queue. Or is that not what you mean by "corrupted"?
eshwar wrote: |
what happens to the partial repository queue managers when the full repository is corrupted? |
Nothing. The PRs will only refer to the FRs when they need additional cluster information (like putting messages to a queue they've not used before, or a new cluster queue being defined). If nothing changes, the PRs won't notice.
Of course, if something changes you've potentially got problems.
_________________
Honesty is the best policy.
Insanity is the best defence. |
|
fjb_saper |
Posted: Thu Nov 06, 2008 8:33 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
The fastest way, although it may disrupt the cluster for the duration of the maneuver, is to switch the FR to a PR and back to an FR.
While the qmgr is a PR you can run refresh cluster(mycluster) repos(yes) to clear all the corrupted info...
Switch back to an FR and watch the info flow from one FR to the other...
Make sure the qmgr is suspended from the cluster during that maneuver, and do not forget to resume the qmgr in the cluster afterwards.
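As a rough sketch, the maneuver above could look like this in runmqsc on the repository being cleaned up. MYCLUSTER is a placeholder cluster name, and the sketch assumes the qmgr names its cluster through REPOS rather than through a REPOSNL namelist:

```mqsc
* Run in runmqsc on the full repository being cleaned up.
* MYCLUSTER is a placeholder; substitute the real cluster name.
SUSPEND QMGR CLUSTER(MYCLUSTER)
* Demote to a partial repository (REPOS(YES) is not allowed on a full repository).
ALTER QMGR REPOS(' ')
* Discard all locally cached cluster information.
REFRESH CLUSTER(MYCLUSTER) REPOS(YES)
* Promote back to a full repository and rejoin the cluster.
ALTER QMGR REPOS(MYCLUSTER)
RESUME QMGR CLUSTER(MYCLUSTER)
```

The demotion step is what makes the refresh possible, which is why the sequence is ordered this way.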
However, my experience is that when this happens you usually have both repositories corrupted. You need to find the culprit in the cluster (dis clusqmgr(*) on a full repos gives you a list of the qmgrs in the cluster with their channels). Check the channels. You may well find that 2 different qmgrs have the same cluster receiver channel defined... (very bad). You need to fix that problem first, and then you can purge/refresh the FRs, although that should happen by itself, albeit slower than you are willing to wait.
Boot the offending PR (that you have fixed) out of the cluster if need be, and have it rejoin by executing a refresh cluster on the PR in question.
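A hedged sketch of the checks and the eviction described above. QM3 and MYCLUSTER are placeholders, not names taken from a real system. If two entries share a qmgr name, RESET CLUSTER also accepts QMID(...) in place of QMNAME(...) to pick out one of them:

```mqsc
* On a full repository: list every qmgr the cluster knows about.
* The channel and cluster names are shown alongside the requested attributes.
DISPLAY CLUSQMGR(*) QMID QMTYPE DEFTYPE STATUS CONNAME

* On each suspect qmgr: list its own cluster-receiver definitions,
* looking for a channel name that is also used by another qmgr.
DISPLAY CHANNEL(*) TYPE(CLUSRCVR) CLUSTER CONNAME

* On a full repository: boot the offending qmgr (placeholder QM3) out.
RESET CLUSTER(MYCLUSTER) QMNAME(QM3) ACTION(FORCEREMOVE) QUEUES(YES)

* On the fixed PR: throw away its cached view so that it rejoins cleanly.
REFRESH CLUSTER(MYCLUSTER) REPOS(YES)
```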
Enjoy
_________________
MQ & Broker admin |
|
PeterPotkay |
Posted: Fri Nov 07, 2008 6:23 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
What exactly does "corrupted FR" mean? How do you know? Once we have that info, we can maybe suggest what to do.
_________________
Peter Potkay
Keep Calm and MQ On |
|
bruce2359 |
Posted: Fri Nov 07, 2008 6:30 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Have you read the WMQ Clusters manual? Portions of that manual deal with problem determination and resolution.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
eshwar |
Posted: Fri Nov 07, 2008 1:48 pm Post subject: |
|
|
Newbie
Joined: 23 Oct 2008 Posts: 7
|
There are 3 queue managers:
QM1 & QM2: Full Repositories.
QM3: Partial Repository.
There was a production issue related to the platform (Solaris) and they restarted the server.
Once the server was up and running, none of the queue managers were able to communicate. I tried sending messages from QM1 to QM2 and vice versa but could not, and the same was true for QM3 (partial repos). So I deleted the queue managers and re-created them from scratch, making them full repositories (before deleting QM1 & QM2 I suspended QM3 and refreshed the cluster).
Then, when I tried to add QM3 to the full repos, I could not, and none of the queues on QM3 showed up in the full repository.
I tried the steps below:
Restarted QM3.
Deleted and re-created the cluster sender & receiver channels to the full repos.
Lastly, I recycled QM3 and then re-created everything, making it a partial repository. After that I was able to communicate and everything showed up in the full repos.
Now my question is:
Can't I make QM3 communicate with the full repos without recycling it?
As QM1, QM2 & QM3 were not able to communicate initially, I concluded that the cluster might be corrupted. |
|
bruce2359 |
Posted: Fri Nov 07, 2008 2:10 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
There was a production issue ... |
What exactly happened?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
PeterPotkay |
Posted: Fri Nov 07, 2008 2:14 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
eshwar wrote: |
So I deleted the queue managers and re-created them from scratch, making them full repositories (before deleting QM1 & QM2 I suspended QM3 and refreshed the cluster).
|
All your MQ problems started here.
There was no need to delete the QMs.
Once you decided to delete them, it looks like you didn't properly remove them from the cluster first, setting you up for problems when you recreated them.
There was no need to suspend QM3.
There was no need to refresh the cluster.
The MQ Clustering Manual has explicit instructions for problem resolution in a cluster, including steps to delete a clustered QM.
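Roughly, a clean removal comes down to a sequence like the one below, run on the qmgr that is leaving before it is deleted. This is only a sketch: QM3, MYCLUSTER and the channel names TO.QM3 and TO.QM1 are hypothetical, any clustered queues would need their CLUSTER attribute blanked first, and the manual remains the authority on the exact steps and their timing.

```mqsc
* On QM3, the qmgr leaving the cluster (all names are placeholders).
SUSPEND QMGR CLUSTER(MYCLUSTER)
* Take the cluster-receiver out of the cluster, then stop it.
ALTER CHANNEL(TO.QM3) CHLTYPE(CLUSRCVR) CLUSTER(' ')
STOP CHANNEL(TO.QM3)
* Take the manually defined cluster-sender out of the cluster as well.
ALTER CHANNEL(TO.QM1) CHLTYPE(CLUSSDR) CLUSTER(' ')

* Afterwards, on a full repository: confirm QM3 is no longer listed.
DISPLAY CLUSQMGR(QM3)
```

Only once the full repositories have forgotten the qmgr is it safe to delete it and reuse its name.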
eshwar wrote: |
Now my Question is:
Can't I make QM3 communicate with the full repos without recycling it?
|
There is no need to recycle a QM to get it to talk to another QM.
Do study that MQ Clustering manual. Lots of good stuff in there.
_________________
Peter Potkay
Keep Calm and MQ On |
|
Vitor |
Posted: Fri Nov 07, 2008 3:57 pm Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
eshwar wrote: |
As QM1, QM2 & QM3 were not able to communicate initially, I concluded that the cluster might be corrupted. |
How? Were there errors in the log? Channels not running? Or a wild guess?
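For what it's worth, a few runmqsc checks along these lines would usually answer those questions before anything gets deleted. This is a sketch, not an exhaustive diagnostic. The qmgr error log on UNIX platforms normally lives under /var/mqm/qmgrs/<qmgr name>/errors/AMQERR01.LOG.

```mqsc
* Run in runmqsc on each qmgr in the cluster.
* Are the cluster channels running, retrying or stopped?
DISPLAY CHSTATUS(*) STATUS
* What does this qmgr believe about its cluster peers?
DISPLAY CLUSQMGR(*) STATUS SUSPEND
* Are messages piling up while waiting for a cluster channel?
DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH
```

A channel stuck in retry after a server restart points at listeners or the network rather than at a corrupted repository.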
As my honoured associate correctly points out, just deleting and recreating a clustered queue manager without properly removing it (note I did not say suspending) is a one-way road to pain, misery & suffering.
Spend some quality time with the Clusters manual.
_________________
Honesty is the best policy.
Insanity is the best defence. |
|