MQSeries.net :: View topic - (resolved) Damaged Objects...


(resolved) Damaged Objects...


sanvij

Posted: Wed Mar 24, 2004 7:34 pm Post subject: (resolved) Damaged Objects...

Novice

Joined: 21 Mar 2003
Posts: 24

We have 2 repository qmgrs running on Solaris and 20 partial-repository qmgrs on Win 2000, with anywhere between 50 and 500 cluster queues per partial qmgr.
We saw a couple of clustered queues damaged when one of the Win 2000 machines encountered a hardware problem that resulted in an abrupt reboot of the machine.

We deleted the damaged objects and recreated the cluster queues, and we are back to normal working conditions.
The MQ version is 5.3 with CSD05.

Now my questions:

1) I have seen this happen only on MQ 5.3 and never experienced this problem with 5.1 or 5.2. Has anyone had a similar problem?
2) Other than using MMC to find the damaged objects, is there a command?
_________________
Sanvij
IBM Certified MQSeries Specialist
IBM Certified WebSphere MQ Solution Designer

PeterPotkay

Posted: Wed Mar 24, 2004 8:12 pm Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

I am having the same problem with 5.3 CSD04 on Windows 2000.
This is the 4th time in 5 months this has happened. The queues that get damaged on our side are regular transmit queues, or MQSI broker queues.

Last week, the queue that was damaged ended up being the XMIT queue for the busiest channel in my company.

The only workaround is to delete the queue file and recreate the queue manually.

We have seen that if a transmit queue has a phantom message in it that never goes away and cannot be browsed, that is a sure sign the queue is damaged. The queue works fine in this state until the next time the QM is brought down and up.
http://www.mqseries.net/phpBB2/viewtopic.php?t=11822&highlight=

We never saw this until going to 5.3, and so far it only happens on Windows.

I have an open PMR on this with IBM, and right now they have no idea how to prevent it. I expect (but I hope not) their next post to my PMR will be to try applying CSD06 and see if that maybe fixes it. I will not be too confident if that's the answer. This will require a big outage, and there is no way to prove it worked short of waiting to see if it happens again. And if it does happen again, MQ in our customers' eyes will take another hit as far as its reliability is concerned.

This Damaged Queue Problem has caused a LOT of problems for our applications losing business.

The first time this happened, IBM said try CSD05. We opted to wait it out, since at that time it had happened only once. Now that I see it happened to you at CSD05, it makes me less hopeful if I get the "See if CSD06 will fix it" answer.
_________________
Peter Potkay
Keep Calm and MQ On

JasonE

Posted: Thu Mar 25, 2004 2:52 am Post subject:

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

Peter - Out of interest, have you tried CSD06?

(Sorry... couldn't resist it...)

PeterPotkay

Posted: Thu Mar 25, 2004 5:47 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

If I have to schedule a major outage to apply CSD06, I want it to be because IBM said "We see the problem code in CSD04 and CSD05, and we KNOW that CSD06 will fix it."

Scheduling a major outage on a hope and a prayer doesn't thrill me.

But, that may be our only option.

Right now, two of our queues have these ghost messages, including the one we just fixed Sunday! I expect that if the QM was bounced right now, we would have damaged queues again. And why in the world does MQ have to pick the 2 busiest XMIT queues in my company to damage?

Code:

The queue depth on this queue shows 1.
E:\>amqsbcg HIGIDGP1.XMITQ HIGHUBPA

AMQSBCG0 - starts here
**********************

MQOPEN - 'HIGIDGP1.XMITQ'

No more messages
MQCLOSE
MQDISC
E:\>dspmqtrn -m HIGHUBPA
There are no prepared transactions.

E:\>dspmqtrn -m HIGHUBPA -i
There are no prepared transactions.

E:\>dspmqtrn -m HIGHUBPA -e
There are no prepared transactions.

E:\>
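
The symptom shown above (reported depth of 1, yet the browse finds nothing) could be checked automatically along these lines. This is only a rough sketch, not anything from IBM or the posters: the function names are hypothetical, it assumes amqsbcg prints a line containing "MQGET of message number N" for each message it browses, and the depth and prepared-transaction count have to be supplied by the caller (for example from DISPLAY QLOCAL CURDEPTH and dspmqtrn).

```python
import re

# Assumption: amqsbcg prints "MQGET of message number N" once per browsed
# message. Adjust the pattern if your version formats this differently.
MESSAGE_LINE = re.compile(r"MQGET of message number\s+(\d+)")


def count_browsed_messages(amqsbcg_output: str) -> int:
    """Return how many messages the amqsbcg browse actually displayed."""
    return len(MESSAGE_LINE.findall(amqsbcg_output))


def looks_like_ghost(reported_depth: int, amqsbcg_output: str,
                     prepared_transactions: int = 0) -> bool:
    """Heuristic: the queue reports more messages than can be browsed.

    If prepared transactions exist, the mismatch may be legitimate, so the
    queue is not flagged. Ordinary uncommitted puts can also inflate the
    depth, so a True result is a hint to look closer, not proof of damage.
    """
    if prepared_transactions > 0:
        return False
    return reported_depth > count_browsed_messages(amqsbcg_output)


if __name__ == "__main__":
    # Output taken from the session above: depth 1, nothing browsable.
    sample = (
        "AMQSBCG0 - starts here\n"
        "MQOPEN - 'HIGIDGP1.XMITQ'\n"
        "No more messages\n"
        "MQCLOSE\n"
        "MQDISC\n"
    )
    print(looks_like_ghost(1, sample))  # True
```

Running the check again after a short interval would help rule out a message that was simply in flight during the first browse.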


JasonE

Posted: Thu Mar 25, 2004 7:32 am Post subject:

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

I don't know of anything inside fp6 which will explicitly solve this problem (that's not to say there isn't), but there are additional diagnostics which can apparently be turned on to do additional logging leading up to a failure (never used them, too new!), so these are only of use if you know how to hit the problem or hit it frequently enough to be worthwhile.

sanvij

Posted: Thu Mar 25, 2004 7:04 pm Post subject:

Novice

Joined: 21 Mar 2003
Posts: 24

Jason,

Is there a command to identify a damaged object?

Peter,

I agree with you: taking downtime to roll out a patch is painful if the problem is not fixed.

In fact, the first time we hit the problem we were using CSD01, and IBM gave us an APAR fix. We moved to CSD05 a couple of months back, based on IBM's suggestion, to overcome a few other small issues we had encountered.

I did verify, prior to applying CSD05, that the APAR fix was bundled with it.

JasonE

Posted: Fri Mar 26, 2004 2:13 am Post subject:

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

Not as such. MQ doesn't know an object is damaged until it comes to load it. Once flagged as damaged, it displays as such in the GUI and through runmqsc, I believe.

If you regularly get damaged objects, you should probably work through IBM service to identify why. However, finding the cause for these is not easy by a long way, and it is not something 'quick' to get through to resolution.
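
Since an object is only flagged once the queue manager tries to load it, one way to catch it after the fact is to scan the queue manager error logs. The sketch below is an illustration only: the function names are hypothetical, the log directory must be supplied by the caller, and it assumes the damage is reported as a line of the form "AMQ7472: Object <name>, type <type> damaged." in the AMQERR*.LOG files. Check the message text against your own logs before relying on it.

```python
import re
from pathlib import Path

# Assumption: damaged objects are logged as
#   AMQ7472: Object <name>, type <type> damaged.
DAMAGED = re.compile(
    r"AMQ7472: Object (?P<name>[^,\s]+), type (?P<type>\w+) damaged"
)


def find_damaged_objects(log_text: str) -> list[tuple[str, str]]:
    """Return unique (object name, object type) pairs, in first-seen order."""
    seen: list[tuple[str, str]] = []
    for match in DAMAGED.finditer(log_text):
        entry = (match.group("name"), match.group("type"))
        if entry not in seen:
            seen.append(entry)
    return seen


def scan_error_logs(errors_dir: str) -> list[tuple[str, str]]:
    """Scan every AMQERR*.LOG file in errors_dir (path chosen by the caller)."""
    found: list[tuple[str, str]] = []
    for log in sorted(Path(errors_dir).glob("AMQERR*.LOG")):
        text = log.read_text(errors="replace")
        for entry in find_damaged_objects(text):
            if entry not in found:
                found.append(entry)
    return found
```

Scheduling something like this after every queue manager restart would report damage soon after it is detected, though it cannot detect damage any earlier than MQ itself does.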

PeterPotkay

Posted: Fri Mar 26, 2004 5:21 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

I have a Severity 1 PMR open with IBM on this, and as soon as we have a proven fix, Jason or I will post the solution here.

jgalvin03

Posted: Mon Jun 21, 2004 11:31 am Post subject:

Newbie

Joined: 21 Jun 2004
Posts: 3

Has IBM replied with a fix yet?

We are currently deploying CSD06.

Just wondering if I need to start planning CSD07

PeterPotkay

Posted: Mon Jun 21, 2004 11:46 am Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

IBM sent me a fix to apply on top of CSD04 to prevent the ghost messages. It's been 2 weeks in production now, and no ghost messages.

IBM never said that ghost messages = damaged queue (eventually). My theory, which was confirmed again when we applied the patch and were rebooting, is that if a QM that owns a queue with a ghost message in it is failed over from node1 in an MSCS cluster to node2, then the queue will come up damaged. The weird thing is that if you bring the QM with the ghost message down and up on the same node, there is no problem.

Anyway, if the fix prevents the ghosts, then by default it prevents the damaged queues caused by them. I would assume the fix I got to apply to CSD04 is the same one included in CSD06.

JasonE

Posted: Tue Jun 22, 2004 4:42 am Post subject:

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

The fix you had ended up as IC40975, i.e. it is not in CSD06 (which was already out) nor in CSD07 (the cutoff date had passed).

http://www-1.ibm.com/support/search.wss?apar=include&q=IC40975

The 'problem' is that it is impossible to tie up the ghost message with corruption, and given the problem it was difficult to say with certainty either way.

Without doubt though, there should be no difference between stopping the qmgr and restarting it on the same node or on another node. After all, you are reading the same disk bytes and using the same registry keys (they are registered and synced between nodes).

But as is always true, theory and practice are two differing things... It could just be coincidence the problem occurred when swapping nodes, it could be something to do with the swap process (eg when swapping nodes a hw disk cache causes problems etc etc).

aboggis

Posted: Wed Nov 03, 2004 1:37 pm Post subject:

Centurion

Joined: 18 Dec 2001
Posts: 105
Location: Auburn, California

I wonder if this is what I am getting: http://www.mqseries.net/phpBB2/viewtopic.php?t=18680 ?

This is on Solaris 5.8 with WMQ 5.3 CSD08.

JasonE

Posted: Fri Nov 05, 2004 8:30 am Post subject:

Grand Master

Joined: 03 Nov 2003
Posts: 1220
Location: Hursley

Without sufficient evidence I would doubt it... I'll reply on that thread so as not to confuse topics
