Author |
Message
|
sanvij |
Posted: Wed Mar 24, 2004 7:34 pm Post subject: (resolved) Damaged Objects... |
|
|
Novice
Joined: 21 Mar 2003 Posts: 24
|
We have 2 repository qmgrs running on Solaris and 20 partial qmgrs on
Windows 2000, with anywhere between 50 and 500 cluster queues per partial qmgr.
We saw a couple of clustered queues damaged when one of the Windows 2000 machines hit a hardware problem that caused an abrupt reboot.
We deleted the damaged objects, recreated the cluster queues, and were back to normal working conditions.
The MQ version is 5.3 with CSD05.
Now my questions:
1) I have seen this happen only on MQ 5.3 and never experienced this problem with 5.1 or 5.2. Has anyone had a similar problem?
2) Other than using MMC to find the damaged objects, is there a command?
_________________ Sanvij
IBM Certified MQSeries Specialist
IBM Certified WebSphere MQ Solution Designer |
|
Back to top |
|
 |
PeterPotkay |
Posted: Wed Mar 24, 2004 8:12 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
I am having the same problem with 5.3 CSD04 on Windows 2000.
This is the 4th time in 5 months this has happened. The queues that get damaged on our side are regular transmit queues, or MQSI broker queues.
Last week, the queue that was damaged ended up being the XMIT queue for the busiest channel in my company.
The only workaround is to delete the queue file and recreate the queue manually.
We have seen that if a transmit queue has a phantom message in it that never goes away and cannot be browsed, that is a sure sign the queue is damaged. The queue works fine in this state until the next time the QM is brought down and up.
http://www.mqseries.net/phpBB2/viewtopic.php?t=11822&highlight=
We never saw this until going to 5.3, and so far it only happens on Windows.
I have an open PMR on this with IBM, and right now they have no idea how to prevent it. I expect (but I hope not) their next post to my PMR will be to try applying CSD06 and see if that fixes it. I will not be too confident if that's the answer. This will require a big outage, and there is no way to prove it worked short of waiting to see if it happens again. And if it does happen again, MQ will take another hit in our customers' eyes as far as its reliability is concerned.
This Damaged Queue Problem has caused a LOT of problems for our applications, losing business.
The first time this happened, IBM said try CSD05. We opted to wait it out, since at that time it had happened only once. Now that I see it happened to you at CSD05, I am less hopeful about getting the "See if CSD06 will fix it" answer.
_________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
JasonE |
Posted: Thu Mar 25, 2004 2:52 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
Peter - Out of interest, have you tried CSD06?
(Sorry... couldn't resist it...) |
|
|
 |
PeterPotkay |
Posted: Thu Mar 25, 2004 5:47 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
If I have to schedule a major outage to apply CSD06, I want it to be because IBM said "We see the problem code in CSD04 and CSD05, and we KNOW that CSD06 will fix it."
Scheduling a major outage on a hope and a prayer doesn't thrill me. But, that may be our only option.
Right now, two of our queues have these ghost messages, including the one we just fixed Sunday! I expect that if the QM were bounced right now, we would have damaged queues again. And why in the world does MQ have to pick the 2 busiest XMIT queues in my company to damage?
Code: |
The queue depth on this queue shows 1.
E:\>amqsbcg HIGIDGP1.XMITQ HIGHUBPA
AMQSBCG0 - starts here
**********************
MQOPEN - 'HIGIDGP1.XMITQ'
No more messages
MQCLOSE
MQDISC
E:\>dspmqtrn -m HIGHUBPA
There are no prepared transactions.
E:\>dspmqtrn -m HIGHUBPA -i
There are no prepared transactions.
E:\>dspmqtrn -m HIGHUBPA -e
There are no prepared transactions.
E:\>
|
_________________ Peter Potkay
Keep Calm and MQ On |
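The check in the listing above (queue depth reports 1, but a browse returns nothing) can be sketched as a small script. This is only an illustration, not something from IBM or from the posters: the `MQGET of message number` marker is an assumption about what amqsbcg prints for each browsed message (the listing above only shows the empty case), and the function names are made up for this example.

```python
import re

# Assumed marker that amqsbcg prints once per browsed message. It does not
# appear in the listing above, so verify it against real output first.
MESSAGE_MARKER = re.compile(r"^\s*MQGET of message number \d+", re.MULTILINE)


def count_browsed_messages(amqsbcg_output: str) -> int:
    """Count the messages that the browse sample actually returned."""
    return len(MESSAGE_MARKER.findall(amqsbcg_output))


def looks_like_ghost(reported_depth: int, amqsbcg_output: str) -> bool:
    """True when the queue reports more messages than a browse can see."""
    return reported_depth > count_browsed_messages(amqsbcg_output)


# Browse output copied from the listing above; the reported depth was 1.
sample = """AMQSBCG0 - starts here
**********************
MQOPEN - 'HIGIDGP1.XMITQ'
No more messages
MQCLOSE
MQDISC
"""

print(looks_like_ghost(1, sample))
```

A mismatch on its own is not proof of a ghost message: uncommitted messages also count toward the depth without being browsable, which is why the listing also checks dspmqtrn and why the mismatch only matters if it persists.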
|
|
 |
JasonE |
Posted: Thu Mar 25, 2004 7:32 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
I don't know of anything inside FP6 which will explicitly solve this problem (that's not to say there isn't), but there are apparently additional diagnostics which can be turned on to do extra logging leading up to a failure (never used them - too new!), so these are only of use if you know how to hit the problem or hit it frequently enough to be worthwhile. |
|
|
 |
sanvij |
Posted: Thu Mar 25, 2004 7:04 pm Post subject: |
|
|
Novice
Joined: 21 Mar 2003 Posts: 24
|
Jason,
Is there a command to identify a damaged object?
Peter,
I agree with you: taking downtime to roll out a patch is painful if the problem is not fixed.
In fact, the first time we hit the problem we were using CSD01, and IBM gave us an APAR fix. We moved to CSD05 a couple of months back, of course based on IBM's suggestion, to overcome a few other small issues we encountered.
I did verify prior to applying CSD05 whether the APAR fix was bundled with it.
_________________ Sanvij
IBM Certified MQSeries Specialist
IBM Certified WebSphere MQ Solution Designer |
|
|
 |
JasonE |
Posted: Fri Mar 26, 2004 2:13 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
Not as such - MQ doesn't know an object is damaged until it comes to load it. Once flagged as damaged, it displays as such in the GUI and through runmqsc, I believe.
If you regularly get damaged objects, you should probably work through IBM service to identify why. However, finding the cause for these is not easy by a long way, and it is not something 'quick' to get through to resolution. |
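Since an object is only flagged once the queue manager tries to load it, one after-the-fact approach is to scan the queue manager error logs for the "object damaged" entry. The sketch below is an assumption-laden illustration: it assumes the entry carries message ID AMQ7472 with the wording "Object <name>, type <type> damaged", and the log excerpt is invented for the example. Check both against your own AMQERR01.LOG before relying on it.

```python
import re

# Assumed format of the "object damaged" error-log entry; the message ID and
# wording should be confirmed against a real error log.
DAMAGED = re.compile(r"AMQ7472: Object (?P<name>\S+), type (?P<type>\w+) damaged")


def find_damaged_objects(log_text: str) -> set[tuple[str, str]]:
    """Return the distinct (object name, object type) pairs reported as damaged."""
    return {(m["name"], m["type"]) for m in DAMAGED.finditer(log_text)}


# Hypothetical log excerpt, for illustration only.
example_log = """
03/24/04 19:02:11 AMQ7472: Object APP.CLUSQ.01, type queue damaged.
03/24/04 19:02:12 (unrelated entry)
"""

print(sorted(find_damaged_objects(example_log)))
```

This only finds objects the queue manager has already tried to load and flagged, so it complements the GUI and runmqsc views rather than detecting damage earlier.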
|
|
 |
PeterPotkay |
Posted: Fri Mar 26, 2004 5:21 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
I have a Severity 1 PMR open with IBM on this, and as soon as we have a proven fix, Jason or I will post the solution here.
_________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
jgalvin03 |
Posted: Mon Jun 21, 2004 11:31 am Post subject: |
|
|
Newbie
Joined: 21 Jun 2004 Posts: 3
|
Has IBM replied with a fix yet?
We are currently deploying CSD06.
Just wondering if I need to start planning CSD07 |
|
|
 |
PeterPotkay |
Posted: Mon Jun 21, 2004 11:46 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
IBM sent me a fix to apply on top of CSD04 to prevent the ghost messages. It's been 2 weeks in production now, and no ghost messages.
IBM never said that ghost messages = damaged queue (eventually). My theory, which was confirmed again when we applied the patch and were rebooting, is that if a QM that owns a queue with a ghost message in it is failed over from node1 in an MSCS cluster to node2, then the queue will come up damaged. The weird thing is that if you bring the QM with the ghost message down and up on the same node, there is no problem.
Anyway, if the fix prevents the ghosts, then by default it prevents the damaged queues caused by them. I would assume the fix I got to apply to CSD04 is the same one included in CSD06.
_________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
JasonE |
Posted: Tue Jun 22, 2004 4:42 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
The fix you had ended up as IC40975 - i.e. it is not in CSD06 (that was already out) nor in CSD07 (the cutoff date had passed)
http://www-1.ibm.com/support/search.wss?apar=include&q=IC40975
The 'problem' is that it is impossible to tie up the ghost message with corruption, and given the problem it was difficult to say with certainty either way.
Without doubt though, there should be no difference between stopping the qmgr and restarting it on the same node or on another node - after all, you are reading the same disk bytes and using the same registry keys (they are registered and synced between nodes)
But as is always true, theory and practice are two differing things... It could just be coincidence the problem occurred when swapping nodes, it could be something to do with the swap process (eg when swapping nodes a hw disk cache causes problems etc etc). |
|
|
 |
aboggis |
Posted: Wed Nov 03, 2004 1:37 pm Post subject: |
|
|
 Centurion
Joined: 18 Dec 2001 Posts: 105 Location: Auburn, California
|
|
|
 |
JasonE |
Posted: Fri Nov 05, 2004 8:30 am Post subject: |
|
|
Grand Master
Joined: 03 Nov 2003 Posts: 1220 Location: Hursley
|
Without sufficient evidence I would doubt it... I'll reply on that thread so as not to confuse topics |
|
|
 |
|