MQ xmit channel failure from hard NW break...
hguapluas
Posted: Fri Aug 26, 2005 10:01 am    Post subject: MQ xmit channel failure from hard NW break...

Centurion

Joined: 05 Aug 2004
Posts: 105
Location: San Diego

Hi all,

Just wondering if anybody else has run into a situation similar to the one below. I have run into this a few times now, and the pattern for restoration seems to be the same in each case.

In a large multi-network environment running through two or more firewalls, when there is a firewall shutdown or other hard failure in network connectivity, the xmit channels on one or both ends will not reconnect and resume traffic flow once connections are restored. What is usually required is manual intervention on one or both sides, usually on the side where messages are getting queued up. The MQ guy has to bounce the xmit channel several times and usually send a fresh (new) message, and then the channel status goes active and traffic flow resumes. Just doing a simple channel start on one or both ends does not resume traffic flow. It always seems to take a series of channel starts on the sender channel.

In all cases, there is an associated complete network outage tied in with this between the sender/receiver. Other channels on either end that are on network paths that were not affected by the outage continue as normal. It is only the channels that were impacted by the hard network break.

The MQ version on the Windows side is 5.3, with CSDs ranging from 06 to 09. On the mainframe side they are using z/OS; I am not sure which version of MQ, but it is an older one. In almost all cases this was an issue of mainframe-to-Windows connectivity. I do not think this has happened on any Windows-to-Windows MQ connections, which would all have been at v5.3.

Some of the outages were caused by hardware failure. Others were caused by a decision to shut down firewall connections suddenly to prevent a recent worm attack from spreading between networks. The worm itself did not impact MQ traffic.

Curious minds want to know your experiences and what you've done to remedy the situation. Is there a more elegant way to restore service short of manually bouncing the sender channel multiple times?!?

Thanks.
wschutz
Posted: Fri Aug 26, 2005 10:09 am

Jedi Knight

Joined: 02 Jun 2005
Posts: 3316
Location: IBM (retired)

I assume you're saying the sender end of the channel goes into (or remains in) retry? Have the MQ guys looked at the MQ logs on Windows and z/OS (xxxxCHIN log)? Are there any interesting messages?
_________________
-wayne
jefflowrey
Posted: Fri Aug 26, 2005 10:11 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

In all cases that I have ever seen, when a channel has failed to start, an error message has been produced that described why that channel failed to start.

In all cases that I have ever seen, that error message was usually a very good hint if not a complete description of what needed to be done to solve the problem.

In no case that I know of, with one exception, would restarting a channel multiple times cause the channel to start working on its own.

The one exception to this is incomplete recovery from a network failure - and the error message would indicate that the network wasn't up yet.
_________________
I am *not* the model of the modern major general.
hguapluas
Posted: Fri Aug 26, 2005 10:35 am

Centurion

Joined: 05 Aug 2004
Posts: 105
Location: San Diego

I was third party to most of this except for one instance where it was my receiver channel on my side. In my logs, no errors were shown and I was told that the sender channels were going into a "bind" state on the other side and would not complete the connection. After they cycled their xmit channel a couple of times, the channel started successfully and all messages backlogged in the queue were transmitted.

Afraid I don't have any more info than that.
wschutz
Posted: Fri Aug 26, 2005 11:01 am

Jedi Knight

Joined: 02 Jun 2005
Posts: 3316
Location: IBM (retired)

So the senders went into bind and remained there until a manual stop / start of the channel? And are you saying there was nothing in the log on the sender's end (or you don't know)?
_________________
-wayne
fjb_saper
Posted: Fri Aug 26, 2005 11:41 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

We had a similar problem with MQ 5.3 on Unix and MQ 2.1 on MF.
After a network interruption we would see the channel in retry mode for way longer than the retry period.

The culprit was the MF end of the channel (receiver) which seemed to believe it was still connected.
Don't remember what the logs said.
The way we resolved it was:
shut down sender channel. (Unix)
shut down (force) receiver channel. (MF)
start receiver channel. (MF)
start sender channel. (Unix)
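
The order of the steps above can be captured in a small helper that emits the MQSC commands for each side. This is only a sketch: the channel name TO.MF and the host labels are hypothetical, and it assumes the standard MQSC STOP CHANNEL and START CHANNEL commands, with MODE(FORCE) on the receiver. How the commands get submitted on each platform (for example through runmqsc on Unix, or with the command prefix on the mainframe) is left out.

```python
def recovery_sequence(channel, sender_host="Unix", receiver_host="MF"):
    """Return (host, MQSC command) pairs in the order described above.

    The channel name and host labels are placeholders for illustration.
    """
    return [
        (sender_host, f"STOP CHANNEL('{channel}')"),
        (receiver_host, f"STOP CHANNEL('{channel}') MODE(FORCE)"),
        (receiver_host, f"START CHANNEL('{channel}')"),
        (sender_host, f"START CHANNEL('{channel}')"),
    ]


if __name__ == "__main__":
    # 'TO.MF' is a made-up channel name.
    for host, command in recovery_sequence("TO.MF"):
        print(f"[{host}] {command}")
```

The point of the ordering is that the receiver is forced down and restarted while the sender is stopped, so the sender's next start meets a receiver with no stale instance.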

Enjoy
sradiraju
Posted: Fri Aug 26, 2005 1:55 pm

Apprentice

Joined: 08 Sep 2002
Posts: 34
Location: Chicago,IL

Fjb_saper's solution is the right one and avoids having to restart the channels multiple times to get communication working. However, there is a much more elegant solution.

hguapluas, if you have observed carefully, the error usually occurs when the network breaks between MQ on distributed servers (Windows or UNIX) and the mainframe. The solution is to use the right combination of heartbeat and DISCINT intervals and, most importantly, to set a parameter on the mainframe called AdoptNewMCA = YES. This will enable the receiver to adopt the new incoming MCA from the sender channel after the DISCINT expires.
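
As a rough sketch of the interval part of this advice, the helper below builds an MQSC ALTER CHANNEL command that sets the heartbeat (HBINT) and disconnect (DISCINT) intervals on a sender channel. The channel name and the values 30 and 600 are illustrative assumptions, not recommendations from this thread. The adopt setting is not covered because where it is configured depends on the platform and MQ version.

```python
def alter_sender_intervals(channel, hbint, discint):
    """Build an ALTER CHANNEL command setting HBINT and DISCINT (seconds).

    Both attributes are checked against the 0..999999 range.
    """
    for name, value in (("HBINT", hbint), ("DISCINT", discint)):
        if not isinstance(value, int) or not 0 <= value <= 999999:
            raise ValueError(f"{name} must be an integer from 0 to 999999")
    return (
        f"ALTER CHANNEL('{channel}') CHLTYPE(SDR) "
        f"HBINT({hbint}) DISCINT({discint})"
    )


if __name__ == "__main__":
    # Hypothetical channel name and example values.
    print(alter_sender_intervals("TO.MF", hbint=30, discint=600))
```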

Hope this helps.

Somesh
PeterPotkay
Posted: Fri Aug 26, 2005 2:48 pm

Poobah

Joined: 15 May 2001
Posts: 7722

sradiraju wrote:
This will enable the receiver to adopt the new incoming MCA from the sender channel after the DISCINT expires.

AdoptMCA and AdoptNewMCA will kick in when required regardless of DISCINT. The RCVR does not have to wait for it to pass.
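
A toy model may help show the difference an adopt setting makes when a receiver still holds a stale instance after a network break. This is not MQ's implementation: the class, method names, and return strings are invented for illustration, and the real feature also applies checks (such as matching the remote queue manager name or network address) that are omitted here.

```python
class ToyReceiver:
    """Tracks at most one running instance per channel name."""

    def __init__(self, adopt=False):
        self.adopt = adopt
        self.running = {}  # channel name -> id of the running instance

    def bind(self, channel, conn_id):
        """Handle an inbound start request and report what happened."""
        old = self.running.get(channel)
        if old is None:
            self.running[channel] = conn_id
            return "started"
        if self.adopt:
            # The stale instance is replaced immediately, with no wait
            # for any interval to expire.
            self.running[channel] = conn_id
            return f"adopted (ended {old})"
        return "rejected (channel in use)"


if __name__ == "__main__":
    for adopt in (False, True):
        rcvr = ToyReceiver(adopt=adopt)
        rcvr.bind("TO.WIN", "conn-1")         # connection before the outage
        print(adopt, rcvr.bind("TO.WIN", "conn-2"))  # sender reconnects
```

Without adoption, the sender's new attempt is turned away for as long as the receiver believes the old instance is alive, which matches the stuck binding/retry symptom described earlier in the thread.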
_________________
Peter Potkay
Keep Calm and MQ On
hguapluas
Posted: Fri Aug 26, 2005 2:51 pm

Centurion

Joined: 05 Aug 2004
Posts: 105
Location: San Diego

Thanks for your feedback on this. Fjb_saper's description is about what I remember. I don't have any details on the mainframe side, since that is controlled by a different organization and I have no influence on how they decide to configure their channels. I can only respond on my end with best efforts to restore the connection, unless I am able to get hold of their support staff and work together on it. And they frequently refuse to admit a problem exists on their end since, according to their monitoring tools, the "queue" is connected. They don't always dig down into the problem to find the root cause. Their reliance on old troubleshooting scripts can be a royal pain sometimes.