Author |
Message
|
gsbureau |
Posted: Mon Dec 03, 2018 3:58 am Post subject: RECEIVER Channel PAUSED State |
|
|
Newbie
Joined: 24 Oct 2014 Posts: 4
|
Hi ALL,
We had a situation where the SENDER channel was in RUNNING state but the RECEIVER channel was in PAUSED state.
We observed that one of the queues on the receiving queue manager was full, and once we increased the MAXDEPTH of that queue the RECEIVER channel started running fine.
Yes, if both the destination queue and the DLQ are full, the RECEIVER channel can go into PAUSED state.
But in our case only 184 messages moved to the DLQ, and later the channel moved to PAUSED state. Messages were piling up in the XMITQ on the source queue manager.
Below are our DLQ depth and message length settings.
DLQ - MAXDEPTH - 999999999
DLQ - MAXMSGL - 104857600
I have tried to find a similar situation in multiple forums, but couldn't find an answer. Please let me know what might have caused the RECEIVER channel to move to PAUSED state, and why messages did not move to the DLQ when the destination queue was full.
Many Thanks in Advance. Please let me know if you need any more details. |
|
exerk |
Posted: Mon Dec 03, 2018 4:12 am Post subject: |
|
|
 Jedi Council
Joined: 02 Nov 2006 Posts: 6339
|
Does THIS apply? Specifically the second bullet point. Also, THIS one may be of help to you. _________________ It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys. |
|
fjb_saper |
Posted: Mon Dec 03, 2018 6:08 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
exerk wrote: |
Does THIS apply? Specifically the second bullet point. Also, THIS one may be of help to you. |
The Paused state of the channel has to do with the short and long retry intervals on the receiver channel.
Before the message can be put to the DLQ the short and long retry must be exhausted.  _________________ MQ & Broker admin |
|
gsbureau |
Posted: Mon Dec 03, 2018 8:07 pm Post subject: retry settings |
|
|
Newbie
Joined: 24 Oct 2014 Posts: 4
|
fjb_saper wrote: |
exerk wrote: |
Does THIS apply? Specifically the second bullet point. Also, THIS one may be of help to you. |
Yes agree! The Paused state of the channel has to do something with the short and long retry intervals on the receiver channel.
Before the message can be put to the DLQ the short and long retry must be exhausted.  |
We have checked the RETRY interval on the channel too. Below are the settings on the RECEIVER channel.
Message Retry : 10
Message Retry interval : 1000 (1 second)
Batch Size : 50
Based on the above settings, the total time for a single batch to be retried and moved to the DLQ is 500 seconds:
50 messages * 1 second * 10 retries = 500 seconds.
But we could see the messages sitting in the XMITQ at the source end for more than 1 hour.
A really strange situation, with no errors or information in the MQ error log.
We are trying to find more clues on this and will share the root cause if we find it. |
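The arithmetic in this post can be sketched in a few lines of Python. This is only an illustration, assuming every message in the batch is destined for the full queue and goes through all of its retries one after another. The function and parameter names are made up for the example; only the values (MRRTY 10, MRTMR 1000 ms, batch size 50) come from the post.

```python
def batch_retry_seconds(batch_size: int, mrrty: int, mrtmr_ms: int) -> float:
    """Worst-case time for one batch when every message exhausts its retries.

    Assumption: each message is retried `mrrty` times, `mrtmr_ms` milliseconds
    apart, before it is put to the DLQ, and messages are handled in sequence.
    """
    per_message_seconds = mrrty * mrtmr_ms / 1000.0
    return batch_size * per_message_seconds


seconds = batch_retry_seconds(batch_size=50, mrrty=10, mrtmr_ms=1000)
print(f"{seconds:.0f} seconds, about {seconds / 60:.1f} minutes")
```

If only some of the messages in a batch target the full queue, the real figure would be lower than this worst case.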
|
gsbureau |
Posted: Mon Dec 03, 2018 8:47 pm Post subject: |
|
|
Newbie
Joined: 24 Oct 2014 Posts: 4
|
exerk wrote: |
Does THIS apply? Specifically the second bullet point. Also, THIS one may be of help to you. |
Thanks for the links, they have very good information. We are searching for more clues on why it happened like this. |
|
bruce2359 |
Posted: Tue Dec 04, 2018 4:55 am Post subject: Re: retry settings |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9472 Location: US: west coast, almost. Otherwise, enroute.
|
gsbureau wrote: |
We have checked the RETRY interval on the channel too. Below are the settings on the RECEIVER channel.
Message Retry : 10
Message Retry interval : 1000 (1 second)
Batch Size : 50
Based on the above settings, the total time for a single batch to be retried and moved to the DLQ is 500 seconds:
50 messages * 1 second * 10 retries = 500 seconds.
But we could see the messages sitting in the XMITQ at the source end for more than 1 hour. |
Which RETRY? There are two (2) pairs, namely: SHORT and LONG.
What error(s) did you find in the error log for this RECEIVER end queue manager?
What error(s) did you find in the error log for the SENDER end queue manager?
From https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.tro.doc/q114740_.htm
Quote: |
An error scenario may occur that is difficult to recognize. For example, the link and channel may be functioning perfectly, but some occurrence at the receiving end causes the receiver to stop. Another unforeseen situation could be that the receiver system has run out of memory and is unable to complete a transaction.
You need to be aware that such situations can arise, often characterized by a system that appears to be busy but is not actually moving messages. You need to work with your counterpart at the far end of the link to help detect the problem and correct it. |
_________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
Last edited by bruce2359 on Tue Dec 04, 2018 6:04 am; edited 1 time in total |
|
fjb_saper |
Posted: Tue Dec 04, 2018 5:42 am Post subject: Re: retry settings |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
gsbureau wrote: |
We have checked the RETRY interval on the channel too. Below are the settings on the RECEIVER channel.
Message Retry : 10
Message Retry interval : 1000 (1 second)
Batch Size : 50
Based on the above settings, the total time for a single batch to be retried and moved to the DLQ is 500 seconds:
50 messages * 1 second * 10 retries = 500 seconds.
But we could see the messages sitting in the XMITQ at the source end for more than 1 hour.
A really strange situation, with no errors or information in the MQ error log.
We are trying to find more clues on this and will share the root cause if we find it. |
You are confusing a lot of things here.
Assuming that your destination queue is full.
Each new message for the destination queue needs to go through the retry occurrences, both short and long.
That means that if you have 30 messages for that queue the message behind message 30 will have to wait 30 times the total wait time for each message.
This is why having a full queue introduces you to the nightmare of lost SLAs so fast... So having a message in the XMITQ hours old because of a full queue is nothing that needs to surprise you... With a full queue the channel throughput goes from thousands of messages in a second to a mere trickle...  _________________ MQ & Broker admin |
|
PeterPotkay |
Posted: Tue Dec 04, 2018 7:38 am Post subject: Re: retry settings |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
bruce2359 wrote: |
gsbureau wrote: |
We have checked the RETRY interval on the channel too. Below are the settings on the RECEIVER channel.
Message Retry : 10
Message Retry interval : 1000 (1 second)
Batch Size : 50
Based on the above settings, the total time for a single batch to be retried and moved to the DLQ is 500 seconds:
50 messages * 1 second * 10 retries = 500 seconds.
But we could see the messages sitting in the XMITQ at the source end for more than 1 hour. |
Which RETRY? There are two (2) pairs, namely: SHORT and LONG.
|
On a receiver channel, which is the end being discussed, there is only one retry pair (MRRTY and MRTMR). _________________ Peter Potkay
Keep Calm and MQ On |
|
PeterPotkay |
Posted: Tue Dec 04, 2018 7:40 am Post subject: Re: retry settings |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
fjb_saper wrote: |
gsbureau wrote: |
We have checked the RETRY interval on the channel too. Below are the settings on the RECEIVER channel.
Message Retry : 10
Message Retry interval : 1000 (1 second)
Batch Size : 50
Based on the above settings, the total time for a single batch to be retried and moved to the DLQ is 500 seconds:
50 messages * 1 second * 10 retries = 500 seconds.
But we could see the messages sitting in the XMITQ at the source end for more than 1 hour.
A really strange situation, with no errors or information in the MQ error log.
We are trying to find more clues on this and will share the root cause if we find it. |
You are confusing a lot of things here.
Assuming that your destination queue is full.
Each new message for the destination queue needs to go through the retry occurrences, both short and long.
|
The SNDR's Short and Long Retry have no bearing on how often the RCVR retries its deliveries because of a problem (e.g. q full) delivering on the receiving end.
gsbureau's math is correct. _________________ Peter Potkay
Keep Calm and MQ On |
|
PeterPotkay |
Posted: Tue Dec 04, 2018 7:44 am Post subject: Re: retry settings |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
gsbureau wrote: |
fjb_saper wrote: |
exerk wrote: |
Does THIS apply? Specifically the second bullet point. Also, THIS one may be of help to you. |
Yes agree! The Paused state of the channel has to do something with the short and long retry intervals on the receiver channel.
Before the message can be put to the DLQ the short and long retry must be exhausted.  |
We have checked the RETRY interval on the channel too. Below are the settings on the RECEIVER channel.
Message Retry : 10
Message Retry interval : 1000 (1 second)
Batch Size : 50
Based on the above settings, the total time for a single batch to be retried and moved to the DLQ is 500 seconds:
50 messages * 1 second * 10 retries = 500 seconds.
But we could see the messages sitting in the XMITQ at the source end for more than 1 hour.
|
500 seconds is a little over 8 minutes, and that is for one batch only.
The 50th message in the first batch waited that long.
Messages in subsequent batches have to wait for ALL other messages ahead of them, including messages in previous batches if the batches are queuing up in the XMITQ. Seven such batches in a row already add up to almost 1 hour.
Messages looking to go to queues unrelated to the full queue still have to wait. A message-retrying RCVR channel on a highly shared queue manager can really, really cause a lot of confusion as unrelated transactions are delayed; app support teams say "the MQ is slow" and for once they are right! _________________ Peter Potkay
Keep Calm and MQ On |
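A rough sketch of the queuing effect described in this post, assuming every message ahead in the XMITQ targets the full queue and exhausts its retries before the next one is processed. The helper names are hypothetical; the retry values are the ones quoted in the thread.

```python
import math

MRRTY = 10        # retries per message, as quoted in the thread
MRTMR_MS = 1000   # interval between retries in milliseconds, as quoted


def wait_seconds(position: int) -> float:
    """Time until the message at 1-based `position` has exhausted its retries."""
    return position * MRRTY * MRTMR_MS / 1000.0


def messages_to_reach(delay_seconds: float) -> int:
    """Smallest queue position whose wait is at least `delay_seconds`."""
    return math.ceil(delay_seconds * 1000.0 / (MRRTY * MRTMR_MS))


print(wait_seconds(50))         # last message of the first batch of 50
print(messages_to_reach(3600))  # position that ends up waiting a full hour
```

Under these assumptions it takes about 360 failing messages (a little over seven batches of 50) ahead of a message before it is delayed by an hour.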
|
gsbureau |
Posted: Wed Dec 05, 2018 1:44 am Post subject: Re: retry settings |
|
|
Newbie
Joined: 24 Oct 2014 Posts: 4
|
bruce2359 wrote: |
gsbureau wrote: |
We have checked the RETRY interval on the channel too. Below are the settings on the RECEIVER channel.
Message Retry : 10
Message Retry interval : 1000 (1 second)
Batch Size : 50
Based on the above settings, the total time for a single batch to be retried and moved to the DLQ is 500 seconds:
50 messages * 1 second * 10 retries = 500 seconds.
But we could see the messages sitting in the XMITQ at the source end for more than 1 hour. |
Which RETRY? There are two (2) pairs, namely: SHORT and LONG.
What error(s) did you find in the error log for this RECEIVER end queue manager?
What error(s) did you find in the error log for the SENDER end queue manager?
From https://www.ibm.com/support/knowledgecenter/en/SSFKSJ_7.5.0/com.ibm.mq.tro.doc/q114740_.htm
Quote: |
An error scenario may occur that is difficult to recognize. For example, the link and channel may be functioning perfectly, but some occurrence at the receiving end causes the receiver to stop. Another unforeseen situation could be that the receiver system has run out of memory and is unable to complete a transaction.
You need to be aware that such situations can arise, often characterized by a system that appears to be busy but is not actually moving messages. You need to work with your counterpart at the far end of the link to help detect the problem and correct it. |
|
Hi,
The message retry settings I shared are the RECEIVER channel settings.
CHLTYPE(RCVR) MRRTY(10) MRTMR(1000)
Below are the sender channel retry settings.
CHLTYPE(SDR) LONGRTY(999999999) LONGTMR(1200) SHORTRTY(180) SHORTTMR(60)
If this is due to the RETRY settings, why did 185 messages move to the DLQ on the destination queue manager and then the channel go into PAUSED state? That is the question I am trying to answer.
Below are the timings, which may help to understand the situation:
1) 3:05AM --> queue full
2) 3:05AM to 03:10AM --> 185 messages moved to DLQ.
3) From 03:10AM no messages flowed to the destination queue manager until the MAXDEPTH of the queue was increased.
4) In the logs I could see that from 03:16AM the sender channel was timing out every 6 minutes (360 seconds) and restarting automatically.
Below are the errors I could see at both queue managers:
Source Queue Manager:
AMQ9259: Connection timed out from host 'XXXXXX(53303)'.
EXPLANATION:
A connection from host 'XXXXXXX(53303)' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 360 seconds call timed out. Check to see why data was
not received in the expected time. Correct the problem. Reconnect the channel,
or wait for a retrying channel to reconnect itself.
----- amqccita.c : 4484 -------------------------------------------------------
Process(30942.1) User(mqm) Program(runmqchl)
Host(XXXXXXX) Installation(Installation1)
VRMF(7.5.0. QMgr(DEALS)
AMQ9999: Channel 'DEALS.GLOSSGW' to host 'XXXXX(53303)'
ended abnormally.
EXPLANATION:
The channel program running under process ID 30942 for channel 'DEALS.GLOSSGW'
ended abnormally. The host name is 'XXXXX(53303)'; in
some cases the host name cannot be determined and so is shown as '????'.
ACTION:
Look at previous error messages for the channel program in the error logs to
determine the cause of the failure. Note that this message can be excluded
completely or suppressed by tuning the "ExcludeMessage" or "SuppressMessage"
attributes under the "QMErrorLog" stanza in qm.ini. Further information can be
found in the System Administration Guide.
Destination Queue Manager Error Message:
----- amqrmrca.c : 1570 -------------------------------------------------------
11/30/2018 03:18:01 AM - Process(3953.5218) User(mqm) Program(amqrmppa)
Host(XXXXXXX) Installation(Installation1)
VRMF(7.5.0. QMgr(GLOSSGW)
AMQ9209: Connection to host 'XXXXXX (XXXXXX)' for channel
'DEALS.GLOSSGW' closed.
EXPLANATION:
An error occurred receiving data from 'XXXXX (XXXXXX)' over TCP/IP.
The connection to the remote host has unexpectedly terminated.
The channel name is 'DEALS.GLOSSGW'; in some cases it cannot be determined and
so is shown as '????'.
ACTION:
Tell the systems administrator.
----- amqccita.c : 4141 -------------------------------------------------------
11/30/2018 03:18:01 AM - Process(3953.5217) User(mqm) Program(amqrmppa)
Host(XXXXXX) Installation(Installation1)
VRMF(7.5.0. QMgr(GLOSSGW)
AMQ9528: User requested channel 'DEALS.GLOSSGW' to be stopped.
EXPLANATION:
The channel is stopping because of a request by the user.
ACTION:
None. |
|
bruce2359 |
Posted: Wed Dec 05, 2018 4:55 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9472 Location: US: west coast, almost. Otherwise, enroute.
|
When this symptom occurs, are any applications misbehaving in any way? Ending other than normally?
Please post the qm.ini file contents for both sender and receiver qmgrs. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
fjb_saper |
Posted: Wed Dec 05, 2018 6:20 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
At 3:10 what was the current depth and max depth of the DLQ?
Did you have enough storage to accommodate more messages in the DLQ?
The channel behaves like it is trying to close the receiver channel. This will happen if the DLQ is full and the message is trying to get put to the DLQ.
Have fun  _________________ MQ & Broker admin |
|
bruce2359 |
Posted: Wed Dec 05, 2018 6:27 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9472 Location: US: west coast, almost. Otherwise, enroute.
|
fjb_saper wrote: |
At 3:10 what was the current depth and max depth of the DLQ?
Did you have enough storage to accommodate more messages in the DLQ?
The channel behaves like it is trying to close the receiver channel. This will happen if the DLQ is full and the message is trying to get put to the DLQ.
Have fun  |
I've seen this symptom with insufficient available (circular) log space as a root cause.
Might try reducing batchsize to 1. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
gbaddeley |
Posted: Wed Dec 05, 2018 2:44 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
Quote: |
EXPLANATION:
A connection from host 'XXXXXXX(53303)' over TCP/IP timed out.
ACTION:
The select() [TIMEOUT] 360 seconds call timed out. Check to see why data was
not received in the expected time. Correct the problem. Reconnect the channel,
or wait for a retrying channel to reconnect itself. |
This can indicate an issue with the TCP stack or network. select() is a low-level TCP socket call. It should not time out under normal circumstances. _________________ Glenn |
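To illustrate what a select() timeout means, here is a minimal Python sketch. It is not MQ code: it simply waits on a local socket pair for data that never arrives, using a short hypothetical timeout in place of the 360 seconds reported in the log, and the helper name is made up for the example.

```python
import select
import socket


def wait_for_data(sock: socket.socket, timeout_seconds: float) -> bool:
    """Return True if `sock` becomes readable before the timeout expires."""
    readable, _, _ = select.select([sock], [], [], timeout_seconds)
    return bool(readable)


# A connected pair of local sockets stands in for the two channel ends.
receiver_end, sender_end = socket.socketpair()
try:
    # Nothing has been sent yet, so select() gives up after the timeout.
    print(wait_for_data(receiver_end, 0.2))

    # Once the peer sends something, select() reports the socket as readable.
    sender_end.sendall(b"data")
    print(wait_for_data(receiver_end, 0.2))
finally:
    receiver_end.close()
    sender_end.close()
```

The first call corresponds to the situation in the log: the connection is open, but nothing arrives from the other end within the allowed time.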
|