MQSeries.net Forum Index » General IBM MQ Support » Channel in-doubt

Channel in-doubt
BBM
Posted: Tue Aug 16, 2011 7:16 am    Post subject: Channel in-doubt

Master

Joined: 10 Nov 2005
Posts: 217
Location: London, UK

Hi,

We are testing with an external party, and the sender/receiver channel pair between us intermittently passes no data.

We went to the lengths of deleting and re-creating the channel pair, which enabled the sender end to send us 4 messages. We also noticed that the channel intermittently goes into in-doubt status.

We have set BATCHHB on the sender channel to 30000, then 5000, but this makes no difference. Both ends show the channel as running, but the sequence numbers differ by exactly the number of messages that haven't been delivered. The channel status also shows that 0 messages have been sent.

I'm wondering what to try next. I'm guessing it must be network related, as we can send in the other direction with no issues.

Any ideas?

mqjeff
Posted: Tue Aug 16, 2011 7:32 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

I would confirm that the other end is not doing something like trying to use BigIP to load balance between qmgrs.

Mr Butcher
Posted: Tue Aug 16, 2011 11:11 pm

Padawan

Joined: 23 May 2005
Posts: 1716

If both channel ends are running, but the channel is in-doubt and no data is passing through (except the first batch of messages, which is in-doubt), then it could be that the receiving channel cannot deliver the messages and is in retry processing. So I would check the message-retry parameters of the receiving channel and remove them. Make sure a DLQ is defined for the receiving queue manager. Then try again and check whether anything hits the DLQ, and if so, find out why.
_________________
Regards, Butcher

gbaddeley
Posted: Wed Aug 17, 2011 1:19 am

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

Look at the MQ error logs and the output of DIS CHS(xx) ALL on both sides. This will give you more information about why there is an in-doubt situation and will determine what you need to do to correct it.
_________________
Glenn

BBM
Posted: Wed Aug 17, 2011 5:12 am

Master

Joined: 10 Nov 2005
Posts: 217
Location: London, UK

Thanks everyone. I changed the retry values as per Mr Butcher's suggestion, which did not correct the issue.

The sending end is not clustered or using BIGIP.

Here is the receiving end's CHS. Note that it says the channel is not in-doubt, but the sender's CHS states that it is.

CHANNEL(SENDQM.TO.RCVQM) CHLTYPE(RCVR)
BATCHES(0) BATCHSZ(50)
BUFSRCVD(3) BUFSSENT(3)
BYTSRCVD(356) BYTSSENT(340)
CHSTADA(2011-08-17) CHSTATI(14.02.43)
COMPHDR(NONE,NONE) COMPMSG(NONE,NONE)
COMPRATE(0,0) COMPTIME(0,0)
CONNAME(x.x.x.x) CURLUWID(4E3D201D10024901)
CURMSGS(0) CURRENT
CURSEQNO(39) EXITTIME(0,0)
HBINT(300) INDOUBT(NO)
JOBNAME(0000414B00075AC5) LOCLADDR(::ffff:x.x.x.x(1540))
LSTLUWID(4E3D201D10024901) LSTMSGDA( )
LSTMSGTI( ) LSTSEQNO(39)
MCASTAT(RUNNING) MCAUSER(mqm)
MONCHL(OFF) MSGS(0)
NPMSPEED(FAST) RQMNAME(SENDQM)
SSLCERTI( ) SSLKEYDA( )
SSLKEYTI( ) SSLPEER( )
SSLRKEYS(0) STATUS(RUNNING)
STOPREQ(NO) SUBSTATE(RECEIVE)
XBATCHSZ(0,0)


I am running out of things to try. We changed DISCINT to 300 to try to 'refresh' the connection every 5 minutes rather than leaving the channel up for an hour or more. This didn't make any difference: the channel still goes in-doubt after sending a few messages successfully.

There is nothing in the error logs - as far as MQ is concerned the channel is running normally.
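
One way to compare CHS output from both ends is to parse it into key/value pairs. The following is a minimal Python sketch, not an MQ tool: `parse_chstatus` is a hypothetical helper, the input is a subset of the receiver output quoted above, and the regex assumes values without nested parentheses (a field such as LOCLADDR would need extra handling).

```python
import re

# Subset of the receiver-side DIS CHS output quoted above.
CHS_OUTPUT = """\
CHANNEL(SENDQM.TO.RCVQM) CHLTYPE(RCVR)
BATCHES(0) BATCHSZ(50)
CURMSGS(0) CURRENT
CURSEQNO(39) EXITTIME(0,0)
HBINT(300) INDOUBT(NO)
LSTMSGTI( ) LSTSEQNO(39)
MONCHL(OFF) MSGS(0)
SSLRKEYS(0) STATUS(RUNNING)
STOPREQ(NO) SUBSTATE(RECEIVE)
"""

# KEYWORD(value) pairs; bare keywords such as CURRENT are ignored.
PAIR = re.compile(r"(\w+)\(([^()]*)\)")


def parse_chstatus(text):
    """Return a dict mapping each attribute name to its stripped value."""
    return {key: value.strip() for key, value in PAIR.findall(text)}


status = parse_chstatus(CHS_OUTPUT)

# The receiver looks healthy but has processed no messages on this instance.
looks_idle = (
    status["STATUS"] == "RUNNING"
    and status["INDOUBT"] == "NO"
    and status["MSGS"] == "0"
)
print(status["CURSEQNO"], status["LSTSEQNO"], looks_idle)
```

Running the same parser over the sender's output (not shown in this thread) would allow the two CURSEQNO values and the INDOUBT flags to be compared directly.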

mqjeff
Posted: Wed Aug 17, 2011 5:21 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

There has to be something in the sender's MQ error logs.

Mr Butcher
Posted: Wed Aug 17, 2011 5:37 am

Padawan

Joined: 23 May 2005
Posts: 1716

Just to make it clear: my suggestion was not meant to correct the problem but to find it, because while the channel is in retry, no error messages are written to the logs until all retries have been performed.

However, as already suggested, check the logs on both ends.

What MQ versions are the sending and receiving queue managers running?
_________________
Regards, Butcher

BBM
Posted: Wed Aug 17, 2011 6:12 am

Master

Joined: 10 Nov 2005
Posts: 217
Location: London, UK

Yep apologies Mr Butcher, we didn't get anything on the DLQ after the retry values were changed.

The versions are: sender 6.x (will find out exact version), and at our end 7.0.1.1.

I've just been speaking to the network team on our side and apparently they have managed to capture a TCP/IP anomaly in their trace.

I asked the sender side to re-examine their logs and they came up with the following error, which correlates with our network team's trace. I think we have found the issue. The strange thing is that on our side we had nothing in the logs and the channel is running happily.

Many thanks!

17/08/2011 13:37:52 - Process(21964.1) User(mqm) Program(runmqchl_nd)
AMQ9209: Connection to host 'host (x.x.x.x)' closed.

EXPLANATION:
An error occurred receiving data from 'host (x.x.x.x)' over TCP/IP.
The connection to the remote host has unexpectedly terminated.
ACTION:
Tell the systems administrator.
----- amqccita.c : 3094 -------------------------------------------------------
17/08/2011 13:37:52 - Process(21964.1) User(mqm) Program(runmqchl_nd)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'SENDQM.TO.RCVQM' ended abnormally.
ACTION:
Look at previous error messages for channel program 'SENDQM.TO.RCVQM' in
the error files to determine the cause of the failure.
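
The log excerpt above has a regular shape: a timestamp line followed by an AMQ message line. Below is a minimal Python sketch for pulling those pairs out of a log excerpt. It assumes the entry layout shown in this post (other MQ versions may format entries differently), and `ENTRY` and `summarize` are illustrative names, not part of any MQ tooling.

```python
import re

# Abbreviated copy of the sender-side log entries quoted above.
LOG_EXCERPT = """\
17/08/2011 13:37:52 - Process(21964.1) User(mqm) Program(runmqchl_nd)
AMQ9209: Connection to host 'host (x.x.x.x)' closed.

EXPLANATION:
An error occurred receiving data from 'host (x.x.x.x)' over TCP/IP.
----- amqccita.c : 3094 -------------------------------------------------------
17/08/2011 13:37:52 - Process(21964.1) User(mqm) Program(runmqchl_nd)
AMQ9999: Channel program ended abnormally.
"""

# A timestamp header line, then the "AMQnnnn: text" line that follows it.
ENTRY = re.compile(
    r"^(\d{2}/\d{2}/\d{4} \d{2}:\d{2}:\d{2}) - .*\n(AMQ\d{4}): (.*)$",
    re.MULTILINE,
)


def summarize(log_text):
    """Return a list of (timestamp, message id, message text) tuples."""
    return ENTRY.findall(log_text)


for timestamp, code, text in summarize(LOG_EXCERPT):
    print(timestamp, code, text)
```

Listing the message ids with their timestamps makes it easier to line the sender's log up against a network trace taken at the other end.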

fjb_saper
Posted: Wed Aug 17, 2011 8:22 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

You might want to upgrade your 7.0.1.1 to 7.0.1.6 (latest) or to at least 7.0.1.3 (GA)
_________________
MQ & Broker admin

bruce2359
Posted: Wed Aug 17, 2011 8:41 am

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

BBM wrote:
Yep apologies Mr Butcher, we didn't get anything on the DLQ after the retry values were changed.

Have you defined a DLQ on the destination qmgr? Have you altered the qmgr object to tell the qmgr the name of the DLQ?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

gbaddeley
Posted: Wed Aug 17, 2011 8:13 pm

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

The RCVR appears to be still in RUNNING status, in a RECEIVE state. It probably doesn't know that the SDR has died because it is sitting on a blocked TCP socket read.

I suggest that you STOP and then START the RCVR. It will either go into INACTIVE status (i.e. DIS CHS() CURRENT shows there is no status info), or it will go into RUNNING, RETRY or INDOUBT status.

At the far end, after starting the SDR, it may be in RETRY or INDOUBT status.

Check the latest messages in the error logs on both sides. Take appropriate action, such as RESOLVE ... BACKOUT, or RESET ... SEQNUM.

It's fairly safe to do a RESOLVE ... BACKOUT on the SDR side, and then RESET SEQNUM(1) on the SDR (the RCVR will then start at 1 again too).
_________________
Glenn
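
The sender-side recovery steps described above can be written out as full MQSC commands. This is a minimal Python sketch that only builds the command text. The function name `sender_recovery_script` is hypothetical, the channel name is taken from the CHS output earlier in the thread, and the ordering (stop, resolve, reset, start, display) is one reading of the suggestion, not a verified procedure.

```python
def sender_recovery_script(channel, seqnum=1):
    """Return MQSC commands, one per line, intended for the sender's queue manager."""
    commands = [
        f"STOP CHANNEL({channel})",
        # BACKOUT returns the in-doubt batch to the transmission queue.
        f"RESOLVE CHANNEL({channel}) ACTION(BACKOUT)",
        # The receiver picks up the new sequence number at the next start.
        f"RESET CHANNEL({channel}) SEQNUM({seqnum})",
        f"START CHANNEL({channel})",
        f"DISPLAY CHSTATUS({channel}) ALL",
    ]
    return "\n".join(commands) + "\n"


print(sender_recovery_script("SENDQM.TO.RCVQM"), end="")
```

The printed text could be fed to runmqsc on the sender's queue manager. BACKOUT is only appropriate if the receiver did not commit the in-doubt batch, otherwise the messages may be delivered twice. Comparing the LUWID values shown by DIS CHS on both sides is a reasonable check before choosing between BACKOUT and COMMIT.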

BBM
Posted: Thu Aug 18, 2011 1:14 am

Master

Joined: 10 Nov 2005
Posts: 217
Location: London, UK

Hi Bruce, yep DLQ was defined and qmgr was configured with name of the DLQ.

I've seen in-doubt channels in the past but this is the first time I hadn't immediately been able to pin it down to one thing or another.

When the issue occurred (about 50% of the time), we were able to resolve the channel and then try again, sometimes successfully and sometimes not.

The problem was that it was happening far too often and needed manual intervention every time, which is not acceptable. Stopping the channel at either end and resetting sequence numbers usually did not cure the issue.

After a lot of testing we found that the issue is less likely to occur with small messages. When we looked at the network trace, we discovered that the sending end was not responding to TCP SACKs (selective acknowledgements), which tell the sender which segments have arrived so that it can retransmit only the missing ones.

The sending end did not respond to these SACKs and carried on sending regardless, and this is what caused the channel to go in-doubt. We are now looking into why this is happening, and the evidence is that it's an issue with the NIC/NIC driver on the sending end.

Thanks again for all the suggestions.
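
For readers unfamiliar with SACK, the idea can be illustrated with a toy model. This is a simplified Python sketch, not a TCP implementation: `missing_ranges` is a made-up helper, byte offsets stand in for TCP sequence numbers, and the example values are invented. The receiver reports a cumulative ACK plus the non-contiguous blocks it holds, and the sender works out which gaps to retransmit.

```python
def missing_ranges(cumulative_ack, sack_blocks):
    """Return the byte ranges [start, end) that lie between the cumulative
    ACK and the SACKed blocks, i.e. the gaps a sender should retransmit."""
    holes = []
    cursor = cumulative_ack
    for start, end in sorted(sack_blocks):
        if start > cursor:
            holes.append((cursor, start))
        cursor = max(cursor, end)
    return holes


# Receiver has everything up to byte 1000, plus two later blocks.
print(missing_ranges(1000, [(2000, 3000), (4000, 5000)]))
```

A sender that ignores this information and keeps transmitting new data leaves the gaps unfilled, so the receiver cannot hand the stream to the application in order. That would be consistent with a channel that appears to be running at one end while nothing is delivered.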

fjb_saper
Posted: Thu Aug 18, 2011 1:34 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

BBM wrote:
The sending end did not respond to these SACKs and carried on sending regardless, and this is what caused the channel to go in-doubt. We are now looking into why this is happening, and the evidence is that it's an issue with the NIC/NIC driver on the sending end.

Wrong conclusion.
Again, check the version number of your MQ, and remember that for V7.0.1.x the GA version is V7.0.1.3. Your version precedes that. My suggestion is to upgrade to V7.0.1.6 and verify whether the problem still happens.

I would not be surprised if the problem has been fixed since.
_________________
MQ & Broker admin

BBM
Posted: Thu Aug 18, 2011 2:25 am

Master

Joined: 10 Nov 2005
Posts: 217
Location: London, UK

Hi fjb_saper

I'm willing to be wrong on the network thing, but we have 18 queue managers on this box all running fine. We also have 12 other external parties connecting to the queue manager that are not experiencing this issue.

Can you elaborate on why you think it might not be network infrastructure and an upgrade would fix it?

Thanks

BBM
Posted: Thu Aug 18, 2011 3:41 am

Master

Joined: 10 Nov 2005
Posts: 217
Location: London, UK

By the way, the sending end is on 6.0.2.2