Author |
Message
|
jgooch |
Posted: Thu Jul 24, 2003 9:39 am Post subject: [Solved] Retrying cluster channel. |
|
|
 Acolyte
Joined: 29 May 2002 Posts: 63 Location: UK
|
Hi,
We have a problem with a cluster channel that goes to "retrying" and we cannot restart it. Messages are left in the SYSTEM.CLUSTER.TRANSMIT.QUEUE. There are cluster channels from the same source queue manager to other destination queue managers that work OK (even some to other queue managers that are on the same machine as the problem destination qmgr).
We had an issue with a Java source application that could not connect to cluster queues (the error was reason code 2085, MQRC_UNKNOWN_OBJECT_NAME), and our workaround was to remove the queues from the cluster and use remote queue definitions and explicit channels to transmit their data. It might be a red herring, but I wonder if the issue is caused by a queue that is defined both as a remote definition and as a cluster queue? I haven't found such a beast, but I can't think why else a connection to one qmgr works but a connection to another doesn't.
What else might it be?
I've searched the forum for similar problems to this and have tried:-
- stopping and resetting the channel;
- stopping and resolving the channel;
- a combination of reset and resolve;
- stopping and restarting the queue managers at both ends;
- deleting and recreating the queue manager on the source side.
All of this was to no avail. Any ideas/thoughts/etc would be gratefully received!
J. |
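For anyone retracing the stop/reset/resolve attempts listed above, the sequence can be written out as MQSC commands. The following is only a minimal sketch: the channel name `TO.QM2` is hypothetical, and whether `ACTION(BACKOUT)` or `ACTION(COMMIT)` is the right choice depends on whether the receiving end actually committed the in-doubt batch, so check the channel status on both sides first.

```python
def build_mqsc_script(channel: str, action: str = "BACKOUT") -> str:
    """Return MQSC text that stops, resolves, resets and restarts one channel.

    `channel` is a placeholder name supplied by the caller; `action` is the
    RESOLVE CHANNEL action and must be BACKOUT or COMMIT.
    """
    if action not in ("BACKOUT", "COMMIT"):
        raise ValueError("action must be BACKOUT or COMMIT")
    commands = [
        f"DISPLAY CHSTATUS({channel}) ALL",   # record the state before touching anything
        f"STOP CHANNEL({channel})",
        f"RESOLVE CHANNEL({channel}) ACTION({action})",
        f"RESET CHANNEL({channel}) SEQNUM(1)",
        f"START CHANNEL({channel})",
        f"DISPLAY CHSTATUS({channel}) ALL",   # confirm whether it is still retrying
    ]
    return "\n".join(commands) + "\n"


if __name__ == "__main__":
    # TO.QM2 is a hypothetical channel name used purely for illustration.
    print(build_mqsc_script("TO.QM2"), end="")
```

The generated text can be saved to a file and redirected into `runmqsc` for the source queue manager. Building the script as plain text keeps the sketch independent of any particular MQ client library.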
 |
jefflowrey |
Posted: Thu Jul 24, 2003 10:51 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Are you getting error messages in your system log about why the channel can't start?
Have you verified that the connection name for the channel is correct? |
 |
jgooch |
Posted: Fri Jul 25, 2003 12:18 am Post subject: |
|
|
 Acolyte
Joined: 29 May 2002 Posts: 63 Location: UK
|
OK I looked in /var/adm/messages and there's not a lot of detail. However, there are a couple of occasions when a file system error occurs right before an FFST record is created.
Quote: |
Jul 24 17:26:35 PROMETHEE unix: NOTICE: alloc: /hiport: file system full
Jul 24 17:26:36 PROMETHEE MQSeries: FFST record created in /var/mqm/errors/AMQ19097.0.FDC |
The header of the FFST record is:-
Quote: |
+-----------------------------------------------------------------------------+
| |
| MQSeries First Failure Symptom Report |
| ===================================== |
| |
| Date/Time :- Thursday July 24 16:43:34 MET DST 2003 |
| Host Name :- PROMETHEE (SunOS 5.6) |
| PIDS :- 5765B75 |
| LVLS :- 520 |
| Product Long Name :- MQSeries for Sun Solaris 2 (Sparc) |
| Vendor :- IBM |
| Probe Id :- HL077070 |
| Application Name :- MQM |
| Component :- mqlpgmrf |
| Build Date :- Oct 15 2001 |
| CMVC level :- p520-aux-CSD02G |
| Build Type :- IKAP - (Production) |
| UserID :- 00001047 (mqm) |
| Program Name :- amqhasmx |
| Process :- 00015906 |
| Thread :- 00000001 |
| QueueManager :- MHIPU01 |
| Major Errorcode :- hrcE_MQLO_DISK |
| Minor Errorcode :- OK |
| Probe Type :- INCORROUT |
| Probe Severity :- 2 |
| Probe Description :- AMQ6125: An internal MQSeries error has occurred. |
| |
+-----------------------------------------------------------------------------+ |
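When several FDC files pile up in /var/mqm/errors, it can help to pull the header fields (Probe Id, Component, Major Errorcode and so on) into a dictionary so they can be compared side by side. The sketch below assumes the header lines follow the `| Name :- Value |` layout shown in the quote above; the function name and the shortened sample text are illustrative only.

```python
import re

# One header field per line: "| Name :- Value |" (the trailing bar is optional).
FIELD = re.compile(r"^\|\s*(?P<key>[^|]+?)\s*:-\s*(?P<value>.*?)\s*\|?\s*$")


def parse_ffst_header(text: str) -> dict:
    """Collect 'Name :- Value' pairs from an FFST header into a dict."""
    fields = {}
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            fields[match.group("key")] = match.group("value")
    return fields


# A few lines copied from the header quoted above.
SAMPLE = """\
+-----------------------------------------------------------------------------+
| Probe Id :- HL077070 |
| Component :- mqlpgmrf |
| Major Errorcode :- hrcE_MQLO_DISK |
| Probe Description :- AMQ6125: An internal MQSeries error has occurred. |
+-----------------------------------------------------------------------------+
"""

if __name__ == "__main__":
    for name, value in parse_ffst_header(SAMPLE).items():
        print(f"{name}: {value}")
```

Border lines and blank box rows contain no `:-` separator, so they are skipped.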
 |
jgooch |
Posted: Fri Jul 25, 2003 6:40 am Post subject: Resolved! |
|
|
 Acolyte
Joined: 29 May 2002 Posts: 63 Location: UK
|
We've resolved the issue, with the help of IBM.
It's caused by running a version of MQSeries on our Solaris 2.6 (SunOS 5.6) machine that is not recommended for use in a cluster (it's v5.2 CSD02G; IBM recommend CSD05 or above).
IBM's reply included:-
Quote: |
The specific defect which deals with INDOUBT CLUSSDR channels
is 59593 which has the following title:
"Cluster channels should not check for other channels indoubt against the same transmission queue."
As this is an internal defect we don't have a lot of detail on the issue, but this is a brief overview of it provided by the change team:
"About the defect 59593, the synchronisation file was used to check the channel status. This seems to be a problem, and the solution here is to turn on the work structure bit. This is all we can give out about the defect; in short, some structure members are modified."
As discussed, the other alternative is to upgrade to v5.3. However, I have just checked the supported product website and v2.6 of Solaris is not supported with v5.3. Therefore, I'm afraid the only option is to apply CSD07, which is available from this website:
http://www-3.ibm.com/software/integration/mqfamily/support/summary/sun.html |
J. |
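The maintenance level appears in the FFST header as the "CMVC level" field (`p520-aux-CSD02G` in the record quoted earlier), so checking a machine against the recommended minimum can be scripted. This is a rough sketch that assumes the level string always contains `CSD` followed by digits; the function names are made up for illustration, and the default minimum of 5 comes from the CSD05 recommendation mentioned above.

```python
import re


def csd_number(cmvc_level: str):
    """Return the numeric CSD level found in a CMVC level string, or None."""
    match = re.search(r"CSD(\d+)", cmvc_level, re.IGNORECASE)
    return int(match.group(1)) if match else None


def meets_minimum_csd(cmvc_level: str, minimum: int = 5) -> bool:
    """True if the string carries a CSD level at or above `minimum`."""
    number = csd_number(cmvc_level)
    return number is not None and number >= minimum


if __name__ == "__main__":
    level = "p520-aux-CSD02G"  # value taken from the FFST header in this thread
    print(level, "meets minimum:", meets_minimum_csd(level))
```

A string with no recognisable CSD number is treated as not meeting the minimum, which errs on the side of flagging the machine for a manual check.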
 |
bduncan |
Posted: Fri Jul 25, 2003 12:30 pm Post subject: |
|
|
Padawan
Joined: 11 Apr 2001 Posts: 1554 Location: Silicon Valley
|
Thanks for the update...
It's always nice when people come back and tell us what the solution is. Hopefully the next guy who runs into this will find your thread.
_________________
Brandon Duncan
IBM Certified MQSeries Specialist
MQSeries.net forum moderator |