Author |
Message
|
sm138929 |
Posted: Wed Aug 29, 2007 8:14 pm Post subject: Channels getting dropped |
|
|
Apprentice
Joined: 29 Aug 2007 Posts: 25
|
Hi,
Our MQ 5.3 server on Solaris is connected to the MQ server of another organisation over a dedicated VPN link. The channels in our prod environment have been getting dropped since we migrated our datacenter to a new location. The QMgrs talk to each other with MQ IPT v1.3.2 in between.
We have QMgrs in a Sun cluster on both the source and the target side. The MQ IPTs are also clustered. The outbound traffic from our QMgrs reaches the target QMgrs through a load balancer and then MQ IPT. We are getting the error message "Channels ending abnormally" and are trying to find the root cause of the channel drops. The channels do come back up after a retry, but sometimes we need to start them manually. The network guys say the network is OK. Please help me work out how to get rid of this problem. I am very new to the MQ world, so I may have missed something basic.
Thanks
SM |
|
|
 |
sidharth_bora |
Posted: Wed Aug 29, 2007 9:04 pm Post subject: |
|
|
 Voyager
Joined: 24 Nov 2005 Posts: 87
|
Hi SM,
Check the error logs. Do you find something like "remote destination could not be reached"? What happens when you try to ping the channel while it is in retrying state?
My guess is the network is not OK. Can you check whether ICMP packets are being dropped?
Do an OS ping and check the reply time. If it is taking longer than normal, then there is some firewall issue.
Also telnet to the destination IP and port. Are you able to connect when the channel is in retrying state?
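If telnet is not available on the box, the same connect test can be scripted. Here is a minimal sketch in Python, assuming a Python 3 interpreter is installed on the host; `tcp_check` is just an illustrative helper name, and the host and port you pass are whatever your sender channel's CONNAME points at:

```python
import socket
import time


def tcp_check(host, port, timeout=5.0):
    """Try to open a TCP connection, like 'telnet host port'.

    Returns (ok, seconds_elapsed, error_text). error_text is empty on success.
    """
    start = time.monotonic()
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True, time.monotonic() - start, ""
    except OSError as exc:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False, time.monotonic() - start, str(exc)
```

For example, call `tcp_check(host, 1414)` with your real address and listener port (1414 is only the conventional MQ default, yours may differ). Run it while the channel is in retrying state; a refusal or timeout here points at the network path rather than the channel definition.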
Sid |
|
|
 |
sm138929 |
Posted: Thu Aug 30, 2007 3:11 pm Post subject: Channels getting dropped |
|
|
Apprentice
Joined: 29 Aug 2007 Posts: 25
|
Hi, thanks for the suggestion.
We are checking the network settings; however, we got the following errors again today.
EXPLANATION:
An error occurred receiving data from 172.X.X.X over TCP/IP. This may be due
to a communications failure.
ACTION:
The return code from the TCP/IP (read) call was 131 (X'83'). Record these
values and tell the systems administrator.
AMQ9999: Channel program ended abnormally.
EXPLANATION:
Channel program 'CHN.XXXX.TO.YYY' ended abnormally.
ACTION:
Look at previous error messages for channel program 'CHN.XXXX.TO.YYY' in the
error files to determine the cause of the failure.
AMQ9558: Remote Channel is not currently available.
EXPLANATION:
The channel program ended because the channel 'CHN.XXXX.TO.YYY' is not
currently available on the remote system. This could be because the channel is
disabled or that the remote system does not have sufficient resources to run a
further channel.
ACTION:
Check the remote system to ensure that the channel is available to run, and
retry the operation.
We have seen that the channels go into a retrying state and come up again after some time. We are keeping a watch on the proceedings and the connection settings.
My concern is whether we have some MQ-related issue.
Regards,
SM |
|
|
 |
fjb_saper |
Posted: Thu Aug 30, 2007 5:42 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Looks like you might have ... possibly on the target qmgr.
You have only a limited number of channels. Once those are running, any additional request for a channel will be denied, resulting in an error message like the one you displayed.
Are all apps closing their connections correctly?
Enjoy  _________________ MQ & Broker admin |
|
|
 |
sidharth_bora |
Posted: Thu Aug 30, 2007 6:42 pm Post subject: |
|
|
 Voyager
Joined: 24 Nov 2005 Posts: 87
|
Also, what does the error log say on the destination QM? |
|
|
 |
sm138929 |
Posted: Thu Aug 30, 2007 10:43 pm Post subject: MQ channels getting dropped |
|
|
Apprentice
Joined: 29 Aug 2007 Posts: 25
|
Hi,
Actually we are also trying to see whether the destination queue manager is having some issue.
Since I am new to all this, can you let me know how I can detect whether the applications are closing their connections properly?
Thanks,
SM |
|
|
 |
fjb_saper |
Posted: Fri Aug 31, 2007 12:50 pm Post subject: Re: MQ channels getting dropped |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
sm138929 wrote: |
Hi,
Actually we are also trying to see whether the destination queue manager is having some issue.
Since I am new to all this, can you let me know how I can detect whether the applications are closing their connections properly?
Thanks,
SM |
Channel status and the number of open instances of a server connection channel will clue you in to that...
Read the client manuals
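One rough way to count the open instances is to capture the output of `DISPLAY CHSTATUS(*)` from runmqsc into a file and tally the channel names. A small Python sketch, assuming each running instance appears in that output with its own `CHANNEL(name)` attribute; `count_instances` is an illustrative name, not part of any MQ tooling:

```python
import re
from collections import Counter

# Matches the CHANNEL(...) attribute but not CHLTYPE(...) or CONNAME(...).
CHANNEL_ATTR = re.compile(r"\bCHANNEL\(([^)]+)\)")


def count_instances(chstatus_output):
    """Count how many times each channel name appears in captured
    DISPLAY CHSTATUS output (assumed: one CHANNEL(...) per instance)."""
    names = (name.strip() for name in CHANNEL_ATTR.findall(chstatus_output))
    return Counter(names)
```

A server connection channel whose count keeps climbing while the applications are supposedly idle would suggest that connections are not being closed.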
Enjoy  _________________ MQ & Broker admin |
|
|
 |
sm138929 |
Posted: Fri Aug 31, 2007 1:05 pm Post subject: Channels getting dropped |
|
|
Apprentice
Joined: 29 Aug 2007 Posts: 25
|
Hi,
I guess the server connection channels are not getting dropped. It is the inter-MQ channels that are dropped. The channels go down for 10-15 secs, 5-8 times per day, and automatically reconnect after going into a retry state.
We got this error too:
AMQ9213: A communications error for TCP/IP occurred.
EXPLANATION:
An unexpected error occurred in communications.
ACTION:
The return code from the TCP/IP(select) [TIMEOUT] 360 seconds call was 11
(X'B'). Record these values and tell the systems administrator.
I will read the client manuals for sure. Anyway, thanks for the suggestion.
Regards,
SM |
|
|
 |
fjb_saper |
Posted: Fri Aug 31, 2007 1:13 pm Post subject: Re: Channels getting dropped |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
sm138929 wrote: |
Hi,
I guess the server connection channels are not getting dropped. It is the inter-MQ channels that are dropped. The channels go down for 10-15 secs, 5-8 times per day, and automatically reconnect after going into a retry state.
We got this error too:
AMQ9213: A communications error for TCP/IP occurred.
EXPLANATION:
An unexpected error occurred in communications.
ACTION:
The return code from the TCP/IP(select) [TIMEOUT] 360 seconds call was 11
(X'B'). Record these values and tell the systems administrator.
Regards,
SM |
For this you should read the intercommunication manual _________________ MQ & Broker admin |
|
|
 |
sm138929 |
Posted: Wed Sep 12, 2007 4:31 pm Post subject: |
|
|
Apprentice
Joined: 29 Aug 2007 Posts: 25
|
We have also checked the FDC files. They show the following error:
+-----------------------------------------------------------------------------+
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Sunday September 09 01:52:59 PDT 2007 |
| Host Name :- XXXXXXXX (SunOS 5.9) |
| PIDS :- 5724B4103 |
| LVLS :- 530.4 CSD04 |
| Product Long Name :- WebSphere MQ for Sun Solaris |
| Vendor :- IBM |
| Probe Id :- XC338001 |
| Application Name :- MQM |
| Component :- xehAsySignalHandler |
| Build Date :- Jun 18 2003 |
| CMVC level :- p530-04-030617 |
| Build Type :- IKAP - (Production) |
| UserID :- 00003718 (mqm) |
| Program Name :- runmqsc |
| Process :- 00016912 |
| Thread :- 00000002 |
| Major Errorcode :- xecE_W_UNEXPECTED_ASYNC_SIGNAL |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6209 |
| Probe Severity :- 3 |
| Probe Description :- AMQ6109: An internal WebSphere MQ error has occurred. |
| FDCSequenceNumber :- 0 |
| Arith1 :- 1 1 |
| Arith2 :- 16912 4210 |
| Comment1 :- SIGHUP |
| |
| |
+-----------------------------------------------------------------------------+
MQM Function Stack
xehAsySignalMonitor
xehHandleAsySignal
xcsFFST
MQM Trace History
{ xppInitialiseDestructorRegistrations
} xppInitialiseDestructorRegistrations rc=OK
{ xcsFreeMem
} xcsFreeMem rc=OK
{ xehAsySignalMonitor
-{ xppPostAsySigMonThread
-} xppPostAsySigMonThread rc=OK
-{ xehHandleAsySignal
--{ xcsRequestThreadMutexSem
--} xcsRequestThreadMutexSem rc=OK
--{ xcsFFST
Component Dumps (Thread 00000002)
-------------------------------------
Does it have any relation to the current channel drop issue? Do we need to set the AdoptMCA parameter? Please let me know if I am going wrong in my troubleshooting.
Thanks
SM |
|
|
 |
Nigelg |
Posted: Wed Sep 12, 2007 7:01 pm Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
You are seriously back-level. Did you know that your system will only be supported by IBM for another 15 days?
Anyway...
The FDC is nothing to do with the initial problem. It is reporting that the runmqsc process has received an asynchronous signal (SIGHUP) and ignored it.
The channel problem is caused by network glitches.
Quote: |
The return code from the TCP/IP (read) call was 131 (X'83'). |
Big clue there...
Quote: |
My concern is if we have some MQ related issue |
How did you suppose that a TCP error could be caused by WMQ?
131 is ECONNRESET, i.e. the connection has been ended somewhere.
Quote: |
The return code from the TCP/IP(select) [TIMEOUT] 360 seconds call was 11
|
This means that WMQ has waited 360 seconds for a reply from the partner connection, and it has not been returned. This is also caused by a network error. _________________ MQSeries.net helps those who help themselves.. |
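To pull these return codes out of the error logs in bulk, something like the following Python sketch could help. It assumes the message wording shown in this thread; the errno table only covers the two Solaris values seen here (errno numbering differs between platforms), and `parse_return_code` is an illustrative name:

```python
import re

# Solaris errno values for the two codes quoted in this thread only.
SOLARIS_ERRNO = {11: "EAGAIN", 131: "ECONNRESET"}

# Handles both "TCP/IP (read) call was 131 (X'83')" and
# "TCP/IP(select) [TIMEOUT] 360 seconds call was 11 (X'B')",
# including a line break before the hex value.
PATTERN = re.compile(
    r"return code from the TCP/IP\s*\((?P<call>\w+)\)"
    r"(?:\s*\[TIMEOUT\]\s*(?P<timeout>\d+)\s*seconds)?"
    r"\s*call was\s*(?P<code>\d+)\s*\(X'(?P<hex>[0-9A-Fa-f]+)'\)"
)


def parse_return_code(text):
    """Extract the TCP call, return code, and optional timeout from an
    AMQ error ACTION text. Returns None if the text does not match."""
    match = PATTERN.search(text)
    if match is None:
        return None
    code = int(match.group("code"))
    # Sanity check: the decimal and hex forms should agree.
    if code != int(match.group("hex"), 16):
        raise ValueError("decimal and hex return codes disagree")
    timeout = match.group("timeout")
    return {
        "call": match.group("call"),
        "code": code,
        "name": SOLARIS_ERRNO.get(code, "unknown"),
        "timeout": int(timeout) if timeout else None,
    }
```

Run over a whole error log, this makes it easy to see how often each call fails and with which code, which is useful evidence to hand to the network team.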
|
|
 |
sm138929 |
Posted: Fri Sep 14, 2007 1:14 pm Post subject: |
|
|
Apprentice
Joined: 29 Aug 2007 Posts: 25
|
Hi,
As per your suggestion we have asked our network guys to make a full investigation of the network and to use a sniffer to find where the connection is being dropped. However, let me ask one thing. In our channel properties we have disconnection interval set to 0 and heartbeat interval set to 300 secs. These settings were the same in our old datacentre, and at that time we had no such regular channel drops.
thanks,
SM |
|
|
 |
fjb_saper |
Posted: Fri Sep 14, 2007 8:01 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
sm138929 wrote: |
Hi,
As per your suggestion we have asked our network guys to make a full investigation of the network and to use a sniffer to find where the connection is being dropped. However, let me ask one thing. In our channel properties we have disconnection interval set to 0 and heartbeat interval set to 300 secs. These settings were the same in our old datacentre, and at that time we had no such regular channel drops.
thanks,
SM |
A channel running all the time is not the most healthy thing. An idle channel should go inactive and be triggered when the next message hits the xmitq.
Also don't look too far for your FDC. It says the program running was runmqsc. This could just have been generated because a Unix admin killed your runmqsc session (runaway session?)
Enjoy  _________________ MQ & Broker admin |
|
|
 |
Vitor |
Posted: Sat Sep 15, 2007 5:52 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
sm138929 wrote: |
However, let me ask one thing. In our channel properties we have disconnection interval set to 0 and heartbeat interval set to 300 secs. |
It does mean the channel is vulnerable to network problems. Ideally it should be triggered. _________________ Honesty is the best policy.
Insanity is the best defence. |
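As a side note on the numbers in this thread: the 360 seconds in the AMQ9213 message lines up with the 300 second heartbeat interval. As far as I understand the documented rule for distributed platforms, the receive timeout is twice HBINT when HBINT is under 60 seconds and HBINT plus 60 seconds otherwise; please verify that against the Intercommunication manual for your level. A tiny Python sketch of that rule, with `receive_timeout` as an illustrative name:

```python
def receive_timeout(hbint_seconds):
    """Receive timeout implied by a negotiated heartbeat interval.

    Assumed rule (check the manual for your MQ level):
      HBINT < 60  -> 2 * HBINT
      HBINT >= 60 -> HBINT + 60
    Returns None for HBINT 0 (heartbeats off), which this sketch does not model.
    """
    if hbint_seconds < 0:
        raise ValueError("HBINT cannot be negative")
    if hbint_seconds == 0:
        return None
    if hbint_seconds < 60:
        return 2 * hbint_seconds
    return hbint_seconds + 60
```

`receive_timeout(300)` gives 360, which matches the [TIMEOUT] value in the error above. So the channel waited a full heartbeat cycle plus the grace period without hearing from its partner before giving up.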
|
|
 |
|