Author |
Message
|
noles321 |
Posted: Mon Nov 15, 2004 6:22 am Post subject: Sender Channel Stays running |
|
|
Apprentice
Joined: 15 Nov 2004 Posts: 43
|
Recently my sender channel from an AIX box stays running when the remote system goes down. We do have TCP keepalive working. The remote system that I am connecting to is TPF and does not support HBINT. Any suggestions? We currently have an unlimited disconnect interval because our coverage staff doesn't want the channels to ever come down.
Thanks,
Jeramy |
|
vennela |
Posted: Mon Nov 15, 2004 7:12 am Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
I recently had the same problem. IBM sent me a fix. You should probably open a PMR. |
|
Nigelg |
Posted: Mon Nov 15, 2004 7:14 am Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
What is the problem with the channel staying in RUNNING?
If there are msgs to move, the channel will discover the remote end is not available and go into RETRYING. If the msgs are persistent (or NPMSPEED is NORMAL) they will not be lost, but will remain on the indoubt channel until the remote qmgr starts up again. |
|
vennela |
Posted: Mon Nov 15, 2004 7:28 am Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
Nigel:
I thought it was you who helped me recreate the problem when I opened the PMR |
|
noles321 |
Posted: Mon Nov 15, 2004 7:41 am Post subject: running |
|
|
Apprentice
Joined: 15 Nov 2004 Posts: 43
|
The sender channel stays running and does not go into a retrying state. The remote side is down, waiting for the connection to be re-established. MQ thinks all is well and just stays that way...
-- Jeramy
Quote: |
The system that the AIX box is connecting to is TPF, which is a loosely coupled IBM mainframe for high-volume transactions, like reservation systems. When one of the CPUs in the complex is IPL'd, the AIX box stays running. Once the remote mainframe comes back and the queue manager is started, the channel from AIX never reconnects because it thinks it is still running.
thanks,
jeramy |
|
|
vennela |
Posted: Mon Nov 15, 2004 8:13 am Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
What happens when you try to stop the running channel?
If it goes into STOPPING state, then that is the symptom of the problem that I am talking about.
Can you try stopping the channel and see what happens?
PS: I deleted your other post because it is the same thing. I have posted your message above. |
|
noles321 |
Posted: Mon Nov 15, 2004 8:19 am Post subject: When Stopping the Channel |
|
|
Apprentice
Joined: 15 Nov 2004 Posts: 43
|
When I stop the channel it comes down as it should and when I re-start it the channel comes up fine. |
|
vennela |
Posted: Mon Nov 15, 2004 8:22 am Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
Sorry:
Misunderstood your question.
Now, did you say your DISCINT is 0 or what is it? |
|
Nigelg |
Posted: Mon Nov 15, 2004 8:24 am Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
OK, so this is not the same as Vennela's fix.
So, what is the problem then?
When a msg has to be sent the chl will find out that the RCVR is not running, then stop and retry after SHORTTMR (60 seconds), and start a new SDR channel. The msg will move and the chl will run. |
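The retry behaviour described above can be sketched roughly as follows. This is only an illustration, not MQ code: the 60-second short retry interval comes from the post, while the short retry count of 10, the long retry interval of 1200 seconds, and the long retry count are assumed values that should be checked against the SHORTRTY, LONGTMR, and LONGRTY attributes on the actual channel definition. The function and parameter names are hypothetical.

```python
from itertools import islice


def retry_delays(short_retries=10, short_timer=60,
                 long_retries=999_999_999, long_timer=1200):
    """Yield the wait in seconds before each successive reconnect attempt.

    Only short_timer=60 is taken from the thread; the other defaults are
    assumptions for illustration and must be checked on the real channel.
    """
    # Short retry phase: frequent attempts right after the failure.
    for _ in range(short_retries):
        yield short_timer
    # Long retry phase: less frequent attempts once short retries run out.
    for _ in range(long_retries):
        yield long_timer


# Waits before the first twelve reconnect attempts.
first_waits = list(islice(retry_delays(), 12))
print(first_waits)
```

Note that this schedule only begins once the sender has actually detected the failure, which is the step that is not happening in this thread.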
|
noles321 |
Posted: Mon Nov 15, 2004 8:26 am Post subject: Discint |
|
|
Apprentice
Joined: 15 Nov 2004 Posts: 43
|
The DISCINT is 0; we need these channels to stay up all the time. We pass real-time transactional data.
-- jeramy |
|
noles321 |
Posted: Mon Nov 15, 2004 8:28 am Post subject: Nigel That is the problem |
|
|
Apprentice
Joined: 15 Nov 2004 Posts: 43
|
The channel never seems to find out that the receiver is not running, so it never stops and restarts. It just stays in RUNNING status and messages queue up on the XMITQ. |
|
vennela |
Posted: Mon Nov 15, 2004 8:29 am Post subject: |
|
|
 Jedi Knight
Joined: 11 Aug 2002 Posts: 4055 Location: Hyderabad, India
|
Do you have AdoptNewMCA in your qm.ini? |
|
noles321 |
Posted: Mon Nov 15, 2004 8:49 am Post subject: no |
|
|
Apprentice
Joined: 15 Nov 2004 Posts: 43
|
AdoptNewMCA is not yet supported on TPF... But how would that work if the sender channel is the one that thinks it is still working? Doesn't AdoptNewMCA only work when a new connection is requested by the sender channel?
thanks |
|
fjb_saper |
Posted: Mon Nov 15, 2004 11:31 am Post subject: Re: Discint |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
noles321 wrote: |
The DISCINT is 0; we need these channels to stay up all the time. We pass real-time transactional data.
-- jeramy |
Would it be acceptable (given the latency delays) to have a triggered channel? |
|
Nigelg |
Posted: Tue Nov 16, 2004 1:30 am Post subject: |
|
|
Grand Master
Joined: 02 Aug 2004 Posts: 1046
|
A SDR channel in RUNNING state is normally doing one of two things.
EITHER
It is waiting in MQGET for a msg to arrive on the xmitq
OR
It is blocked in a TCP API call waiting for the call to complete, or in select() waiting for the socket to become available for writing or reading.
This is not 100% true; there are some other abnormal circumstances.
In your case, the channel is not in MQGET because there are msgs on the xmitq. So, it is probably in a TCP call. You can check what the MCA is doing by sending the process a SIGUSR2, i.e. kill -USR2 PID_OF_MCA. This will cause an FDC to be dumped of all the threads in the process, with the call stack of each thread, so you will be able to see what each thread is doing.
TCP should have returned an error when the remote channel was ended and the socket closed. If it did not, then there is nothing that the sender can do except wait for the TCP call to complete or time out. The time out may be after about 360 seconds (the internal WMQ timeout for the select to complete), or after the KeepAlive interval when TCP will return ETIMEDOUT to the TCP call. |
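To make the blocked-in-select() situation above more concrete, here is a minimal Python sketch of the two mechanisms mentioned: a select() call bounded by a timeout, and TCP keepalive enabled on a socket. It is not how the MCA is implemented; the helper names are hypothetical, and the keepalive probe timing is assumed to come from operating-system settings that are not shown here.

```python
import select
import socket


def enable_keepalive(sock: socket.socket) -> None:
    """Turn on SO_KEEPALIVE so the OS probes an idle connection.

    How long it takes before a dead peer is reported depends on
    OS-level keepalive settings, which this sketch does not change.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)


def wait_readable(sock: socket.socket, timeout_seconds: float) -> bool:
    """Block in select() until the socket is readable or the timeout expires.

    Returns True if the socket became readable, False on timeout.
    """
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    return bool(readable)
```

A caller that gets False back after a long timeout (the post mentions roughly 360 seconds) has learned nothing from the peer in that time and could reasonably close the socket and reconnect, rather than continuing to report the connection as healthy.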
|