MQCluster Problem

tomo
Posted: Fri Dec 07, 2001 12:31 am
Newbie
Joined: 06 Dec 2001 Posts: 3 Location: Japan

I configured a very complex MQCluster environment using 2 AIX machines (the full repositories) and 5 Solaris machines (partial repositories).
I tested the environment with an application and ran into a problem. It happened when I halted one of the full-repository AIX machines while the application on a Sun box was holding a message.
After the halted AIX machine was rebooted, one of the CLUSSDR channels from that AIX machine to the Sun box went into in-doubt status, and this left all of the CLUSSDR channels from the AIX machine to the other machines in RETRYING status.
The only way I could get the channels running was to start the in-doubt channel.
After I issued START CHANNEL for the in-doubt channel, all of the retrying channels started running.
I think this is a defect in MQ.
All of the channels should start running again on their own after a halted machine is restarted.
Could anyone help me?
Thank you.

bduncan
Posted: Fri Dec 07, 2001 4:55 pm
Padawan
Joined: 11 Apr 2001 Posts: 1554 Location: Silicon Valley

A few questions:
1) When you say you "halted" one of the full repositories, what mechanism did you use? Did you cycle the power, stop MQSeries, or something else?
2) You mention that the sender channel from the halted AIX machine to the Sun box holding the message was in doubt, while the rest of the sender channels were retrying. Does this include the sender channel going to the other full repository (the other AIX machine)?
3) What was the status of the channels on the other full repository? Were all the Sun boxes able to communicate with it?
_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator

tomo
Posted: Mon Dec 10, 2001 7:54 pm
Newbie
Joined: 06 Dec 2001 Posts: 3 Location: Japan

Thank you for looking into this.
1) I halted the machine with the command "halt -q" on AIX.
All of the applications were shut down abruptly by the abnormal machine shutdown.
So MQ, Oracle, and everything else were killed without warning.
2) Yes. Only the channels going to the other full repository were retrying (this box belongs to 4 clusters, so 4 channels go to the other box), and the other sender channels that the cluster creates automatically were not visible.
All of the CLUSRCVR channels were running.
3) The channels on the other full repository (AIX) were all running.
Sorry, I did not check the connection between the healthy AIX machine and the Sun boxes.
I will check it next time.

bduncan
Posted: Wed Dec 12, 2001 10:37 pm
Padawan
Joined: 11 Apr 2001 Posts: 1554 Location: Silicon Valley

I haven't heard of this specific behavior before. Out of curiosity: before issuing the START command to get the RETRYING channel running again, did you try sending a message across that channel? The channel may have been in RETRYING state for a while (while the system was in flux), and once the short retries had been exhausted, it entered the period of long retry intervals. In that case the channel would still show RETRYING even though there was no longer any problem with the connection; the thread that actually performs the retry simply hadn't made its next attempt at starting the channel yet. Sending another message across the channel could force the queue manager to attempt another retry, and at that point the channel would start normally.
I think this is what happened to you, because normally, if a channel is retrying and the queue manager is actively trying to restart it without success, a simple START CHANNEL command won't bring it up. If, on the other hand, the queue manager is between retry attempts (and this interval can be quite long, depending on your configuration), a START CHANNEL command will force another retry. Because there is no longer an actual problem with the channel (there was one at some point, which is why the channel went into retry mode, but it was transient and has cleared), the channel should start normally.
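The explanation above hinges on the gap between the short and long retry phases. The following is a simplified model, not MQ's actual scheduler: it assumes retries fire at fixed offsets (SHORTRTY attempts SHORTTMR seconds apart, then LONGRTY attempts LONGTMR seconds apart) and ignores the time each connection attempt takes. RetryPolicy and next_attempt are hypothetical names, and the LONGRTY/LONGTMR values in the example (999999999 and 1200 seconds) are assumed defaults that should be checked against the documentation for your MQ version.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RetryPolicy:
    """Hypothetical model of a channel's retry attributes."""
    short_retries: int   # SHORTRTY: number of short retries
    short_interval: int  # SHORTTMR: seconds between short retries
    long_retries: int    # LONGRTY: number of long retries
    long_interval: int   # LONGTMR: seconds between long retries


def next_attempt(policy: RetryPolicy, elapsed: int) -> Optional[int]:
    """Return the time (seconds after the failure) of the first retry that
    falls strictly after `elapsed`, or None once every retry is used up.

    Simplification (assumption): retries fire at fixed offsets, and the time
    taken by each connection attempt is ignored.
    """
    if elapsed < 0:
        raise ValueError("elapsed must be non-negative")
    short_end = policy.short_retries * policy.short_interval
    if elapsed < short_end:
        # Short phase: the next multiple of SHORTTMR.
        return (elapsed // policy.short_interval + 1) * policy.short_interval
    long_end = short_end + policy.long_retries * policy.long_interval
    if elapsed < long_end:
        # Long phase: the next multiple of LONGTMR, counted from the end
        # of the short phase.
        k = (elapsed - short_end) // policy.long_interval + 1
        return short_end + k * policy.long_interval
    return None


if __name__ == "__main__":
    # SHORTRTY/SHORTTMR as posted later in the thread; LONGRTY/LONGTMR are
    # ASSUMED defaults.
    policy = RetryPolicy(short_retries=20, short_interval=60,
                         long_retries=999_999_999, long_interval=1200)
    now = 1300  # partner reachable again 1300 s after the failure
    due = next_attempt(policy, now)
    print(due, due - now)  # 2400 1100
```

With the SHORTRTY(20) and SHORTTMR(60) values tomo reports later in the thread, the short phase ends 1200 seconds after the failure. If the partner comes back at 1300 seconds, the model keeps the channel in RETRYING for another 1100 seconds unless something (a new message or a START CHANNEL command) prompts an earlier attempt.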

DJGoodrich
Posted: Wed Dec 19, 2001 7:28 am
Apprentice
Joined: 12 Dec 2001 Posts: 30 Location: SW Florida

What are your channel attributes? Specifically the disconnect interval, heartbeat interval, batch size, and the long and short retry counts and intervals?
Do you have KeepAlive enabled for TCP and within your queue manager(s)?

tomo
Posted: Mon Jan 07, 2002 3:26 am
Newbie
Joined: 06 Dec 2001 Posts: 3 Location: Japan

Thanks, DJGoodrich.
The attributes are as follows:
-----CLUSRCVR-----
BATCHSZ(1)
DISCINT(0)
SHORTRTY(20)
SHORTTMR(60)
HBINT(16)
BATCHINT(0)
-----CLUSSDR-----
BATCHSZ(1)
DISCINT(0)
SHORTRTY(20)
SHORTTMR(60)
HBINT(16)
All other attributes are at their default values.
The KeepAlive settings for TCP and within the queue manager(s) are also at their defaults:
KeepAlive=yes
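Because every attribute not listed is at its default, one way to see what the channels actually run with is to overlay the listing on a table of defaults. This is a sketch with hypothetical names (effective_attributes, ASSUMED_DEFAULTS); the default values are assumptions based on the usual distributed-platform channel defaults and should be confirmed with the MQSC DISPLAY CHANNEL command for the MQ version in use.

```python
import re
from typing import Dict

# ASSUMPTION: typical channel defaults on distributed queue managers;
# confirm with DISPLAY CHANNEL for your own MQ version.
ASSUMED_DEFAULTS: Dict[str, int] = {
    "BATCHSZ": 50,
    "BATCHINT": 0,
    "DISCINT": 6000,
    "HBINT": 300,
    "SHORTRTY": 10,
    "SHORTTMR": 60,
    "LONGRTY": 999999999,
    "LONGTMR": 1200,
}

# Matches lines such as "BATCHSZ(1)"; separator lines such as
# "-----CLUSSDR-----" do not match and are skipped.
ATTRIBUTE_LINE = re.compile(r"^\s*([A-Z]+)\((\d+)\)\s*$")


def effective_attributes(listing: str) -> Dict[str, int]:
    """Hypothetical helper: overlay explicitly set attributes on the defaults."""
    values = dict(ASSUMED_DEFAULTS)  # copy so the defaults table is not modified
    for line in listing.splitlines():
        match = ATTRIBUTE_LINE.match(line)
        if match:
            values[match.group(1)] = int(match.group(2))
    return values


if __name__ == "__main__":
    clussdr = """-----CLUSSDR-----
BATCHSZ(1)
DISCINT(0)
SHORTRTY(20)
SHORTTMR(60)
HBINT(16)"""
    attrs = effective_attributes(clussdr)
    for name in sorted(attrs):
        print(f"{name}({attrs[name]})")
```

Run against the CLUSSDR listing, this reports LONGRTY(999999999) and LONGTMR(1200) as the effective values (under the assumed defaults), which is the long-retry phase described earlier in the thread.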
======================================
Today, I can close this issue!
I have received an e-fix (patch) from IBM for this problem.
It is a known defect that IBM had already identified.
(Why did it take so long?)
I haven't tested it yet, but I will post the result soon.
Thank you, everyone, for reading this.
[ This Message was edited by: tomo on 2002-01-07 03:29 ]