MQSeries.net :: View topic - MQCluster Prob.


MQSeries.net Forum Index » Clustering » MQCluster Prob.

MQCluster Prob.


tomo

Posted: Fri Dec 07, 2001 12:31 am

Newbie

Joined: 06 Dec 2001
Posts: 3
Location: Japan

I have configured a fairly complex MQ cluster environment using 2 AIX machines (the full repositories) and 5 Solaris machines (partial repositories).

I tested the environment with an application and hit a problem. It happened when I halted one of the full-repository AIX machines while the application on a Sun box was holding a message.

After I rebooted the halted AIX machine, one of the CLUSSDR channels from that AIX machine to the Sun box went into in-doubt status, and all of the other CLUSSDR channels from the AIX machine to the other machines were left in RETRYING.

The only way I could get the channels running was to start the in-doubt channel.
After [START CHANNEL(in-doubt channel)], all of the retrying channels started running.

I think this is a defect in MQ.
All of the channels should recover on their own after a machine has been halted and restarted.

Could anyone help me?
Thank you.

bduncan

Posted: Fri Dec 07, 2001 4:55 pm

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

A few questions:
1) When you say you "halted" one of the full repositories, what mechanism did you use? Did you cycle the power, stop MQSeries, etc.?
2) You mention that the sender channel from the halted AIX machine to the Sun box holding the message was in doubt, while the rest of the sender channels were retrying. Does this include the sender channel going to the other full repository (the other AIX machine)?
3) What was the status of the channels on the other full repository? Were all the Sun boxes able to communicate with it?

_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator

tomo

Posted: Mon Dec 10, 2001 7:54 pm

Newbie

Joined: 06 Dec 2001
Posts: 3
Location: Japan

Thank you for your attention.

1) I halted it with the command "halt -q" on AIX.
All of the applications were shut down abruptly, as in an abnormal machine shutdown,
so MQ, Oracle and so on were killed without warning.

2) Yes. Only the channels going to the other full repository were retrying (this box belongs to 4 clusters, so 4 channels go to the other box), and the other sender channels that the cluster creates automatically were not visible.
All of the CLUSRCVR channels were running.

3) The channels on the other full repository (AIX) were all running.
Sorry, I did not check the connections between the healthy AIX machine and the Sun boxes.
I will check next time.

bduncan

Posted: Wed Dec 12, 2001 10:37 pm

Padawan

Joined: 11 Apr 2001
Posts: 1554
Location: Silicon Valley

I haven't heard of this specific type of behavior before. I'm curious: before issuing the start command to get the RETRYING channel running again, did you try sending a message across that channel? It could be that the channel had been in a RETRYING state for a while (while the system was in flux) and, because the short retries had been exhausted, it had entered the period of long retry intervals. This means the channel would appear as RETRYING even though there was no longer a problem with the connection; the thread that actually performs the retry simply hadn't gotten around to making another attempt to start the channel yet. Sending another message across this channel could force the queue manager to attempt another retry, at which point the channel would start just fine.
I think this is what happened to you. Normally, if a channel is retrying and the queue manager is actively trying to restart it to no avail, a simple START CHANNEL command won't bring it up. If, on the other hand, the queue manager is between retry attempts (and this interval can be quite long, depending on your configuration), a START CHANNEL command will force another attempt. Because there isn't actually a problem with the channel any more (there was at some point, which is why the channel went into retry mode, but it was transient and no longer exists), it should start up just fine.
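The long-retry gap described above can be sketched as a simple timeline model. This is a minimal sketch, not IBM's implementation: it assumes each automatic retry fires exactly SHORTTMR seconds (later LONGTMR seconds) after the previous one, counted from the moment the channel entered RETRYING, and it ignores the time each connection attempt takes. The default values used (SHORTRTY 10, SHORTTMR 60, LONGRTY 999999999, LONGTMR 1200) are assumptions to be checked against your own definitions with DISPLAY CHANNEL, and the names RetryPolicy and next_retry_at are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    """Channel retry attributes (hypothetical helper; defaults are assumed MQ defaults)."""
    short_retry: int = 10          # SHORTRTY: number of short retries
    short_timer: int = 60          # SHORTTMR: seconds between short retries
    long_retry: int = 999_999_999  # LONGRTY: number of long retries after the short ones
    long_timer: int = 1200         # LONGTMR: seconds between long retries


def next_retry_at(policy: RetryPolicy, elapsed: int) -> Optional[int]:
    """Return the time (seconds since the channel entered RETRYING) of the first
    automatic retry strictly after `elapsed`, or None if all retries are used up.

    Simplified model: attempts fire on a fixed grid and take no time themselves.
    """
    short_end = policy.short_retry * policy.short_timer
    if elapsed < short_end:
        # Still in the short-retry phase: the next multiple of SHORTTMR.
        return (elapsed // policy.short_timer + 1) * policy.short_timer
    # Long-retry phase: the k-th long retry fires at short_end + k * LONGTMR.
    k = (elapsed - short_end) // policy.long_timer + 1
    if k > policy.long_retry:
        return None  # retry counts exhausted in this model
    return short_end + k * policy.long_timer


if __name__ == "__main__":
    # SHORTRTY(20) and SHORTTMR(60) as posted later in this thread;
    # LONGTMR is left at the assumed default of 1200 seconds.
    policy = RetryPolicy(short_retry=20, short_timer=60)
    fault_cleared = 1201  # hypothetical: network recovers just after the last short retry
    nxt = next_retry_at(policy, fault_cleared)
    print(f"next automatic retry at t={nxt}s, "
          f"{nxt - fault_cleared}s after the fault cleared")
```

In this model, a transient fault that clears one second after the short retries run out leaves the channel showing RETRYING for almost 20 more minutes with nothing actually wrong, which is consistent with a manual START CHANNEL appearing to "fix" it by simply bringing the next attempt forward.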

_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator

DJGoodrich

Posted: Wed Dec 19, 2001 7:28 am

Apprentice

Joined: 12 Dec 2001
Posts: 30
Location: SW Florida

What are your channel attributes? Specifically the disconnect interval, heartbeat interval, batch size, and long and short retry counts and intervals?

Do you have KeepAlive specified for TCP and within your queue manager(s)?

tomo

Posted: Mon Jan 07, 2002 3:26 am

Newbie

Joined: 06 Dec 2001
Posts: 3
Location: Japan

Thanks DJGoodrich,

The attributes are as follows:

-----CLUSRCVR-----
BATCHSZ(1)
DISCINT(0)
SHORTRTY(20)
SHORTTMR(60)
HBINT(16)
BATCHINT(0)

-----CLUSSDR-----
BATCHSZ(1)
DISCINT(0)
SHORTRTY(20)
SHORTTMR(60)
HBINT(16)

Everything else is left at the default.
The KeepAlive settings for TCP and within the queue manager(s) are also at their defaults:
KeepAlive=yes

======================================
Today, I can close this issue!
I have received an e-fix (patch) from IBM for this problem.
It is a defect IBM already knew about.
(Why did it take so long???)

I haven't tested it yet, but I will tell you the result soon.

Thank you, everyone, for reading this.

[ This Message was edited by: tomo on 2002-01-07 03:29 ]
