Author |
Message
|
tczielke |
Posted: Wed Mar 07, 2018 7:29 am Post subject: S.C.T.Q. not processing messages for S.I.Q.C. |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
We have two queue managers where messages are very slowly building up on the SYSTEM.CLUSTER.TRANSMIT.QUEUE (S.C.T.Q.) and are not being processed. These messages are destined for the SYSTEM.INTER.QMGR.CONTROL (S.I.Q.C.) queue on other queue managers, and the two affected queue managers do have connectivity to those destinations. For whatever reason, the messages just sit on the S.C.T.Q. On one queue manager, they have been there for a few months.
Has anyone seen this issue? Or does anyone have any ideas why this behavior is happening?
The two queue managers that are having this issue are at either 7.5.0.6 or 8.0.0.6.
I do have a PMR open, but was just curious if anyone else has seen this issue or has an idea why it is happening. _________________ Working with MQ since 2010. |
|
|
 |
tczielke |
Posted: Wed Mar 07, 2018 9:32 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
Here are some more details:
This is what is happening from a channel standpoint. Let’s say QM1 has the messages on the S.C.T.Q. that want to go to the S.I.Q.C. queue on QM2. QM1 and QM2 are both in cluster X. QM2 actually has two CLUSRCVR channels for cluster X: CLUSRCVR1 is for an internal network and CLUSRCVR2 is for an external network. QM1 is on an internal network and has no connectivity to QM2 over CLUSRCVR2, but it can talk to QM2 over CLUSRCVR1. When the issue started, QM1 tried to start CLUSRCVR2 to send these messages, which it could not use because QM1->QM2 over CLUSRCVR2 is not allowed. I then got an alert about the CLUSRCVR2 channel on QM1 being in a retrying state, so I stopped that channel. However, QM1 is still not sending the messages to QM2 (for months now on one of the queue managers), even though it has a valid route over CLUSRCVR1. This looks like a potential bug to me, but I am curious whether anyone else has seen this behavior. _________________ Working with MQ since 2010. |
|
|
 |
PeterPotkay |
Posted: Wed Mar 07, 2018 5:08 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Look at the Correlation ID of the messages waiting on the S.C.T.Q.
Is it the name of the channel you stopped, the one that cannot get from QM1 to QM2?
If yes, the messages will sit forever because the other channel that is running ignores those messages. It sees that the Correlation ID is NOT its own name, and thus knows those messages are not for it to move. _________________ Peter Potkay
Keep Calm and MQ On |
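To make the point above concrete, here is a minimal Python sketch of the check Peter describes. It is a toy model only, not an MQ API: the `XmitMessage` class, the `stranded_by_channel` helper, and the use of `CLUSRCVR1`/`CLUSRCVR2` as channel names are all assumptions made up for illustration. It only shows the idea that each message on the S.C.T.Q. carries a channel name in its Correlation ID, so messages naming a stopped channel are left behind by every other channel.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class XmitMessage:
    # Hypothetical stand-in for a message on the S.C.T.Q.; the thread says
    # the Correlation ID holds the name of the channel chosen to move it.
    correl_id: str
    target_queue: str


def stranded_by_channel(messages, running_channels):
    """Count messages whose Correlation ID names a channel that is not running."""
    return Counter(
        m.correl_id for m in messages if m.correl_id not in running_channels
    )


# Illustrative data: channel names borrowed from the example in this thread.
queue = [
    XmitMessage("CLUSRCVR1", "SYSTEM.INTER.QMGR.CONTROL"),
    XmitMessage("CLUSRCVR2", "SYSTEM.INTER.QMGR.CONTROL"),
    XmitMessage("CLUSRCVR2", "SYSTEM.INTER.QMGR.CONTROL"),
]
running = {"CLUSRCVR1"}  # CLUSRCVR2 was stopped by the administrator

stranded = stranded_by_channel(queue, running)
print(dict(stranded))
```

In this model the running CLUSRCVR1 channel never touches the two messages tagged for CLUSRCVR2, which is the behaviour reported in the thread.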
|
|
 |
tczielke |
Posted: Thu Mar 08, 2018 10:56 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
Thanks, Peter. Yes, this is the case. The CorrelId has the channel name of the stopped channel. I am surprised that MQ does not realize in this case that the one channel is stopped and there is another open cluster channel that it can use to send the messages from QM1 to QM2. _________________ Working with MQ since 2010. |
|
|
 |
Vitor |
Posted: Thu Mar 08, 2018 11:01 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
tczielke wrote: |
I am surprised that MQ does not realize in this case that the one channel is stopped |
Why? MQ assumes that channels are self healing and/or will be manually healed in a timely manner. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
tczielke |
Posted: Thu Mar 08, 2018 11:06 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
It does not seem intuitive to me. I would think that MQ would realize after a little bit that channel 1 is now no longer available (stopped) and channel 2 is open and going to the same destination so reroute the message through channel 2. _________________ Working with MQ since 2010. |
|
|
 |
Vitor |
Posted: Thu Mar 08, 2018 11:25 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
tczielke wrote: |
channel 2 is open and going to the same destination so reroute the message through channel 2. |
How would it know it's the same destination? The network routing (which is what the channel MCA knows) must be different as (in your example) it's pointing to an externally hosted network. In MQ terms you typically define 2 different channels over 2 different networks because you want to segregate traffic (high priority over fiber, batch over copper is the classic example) and it might actually cause SLA breach / extra costs if the system "fixes" one problem by dumping everything into one network.
Obviously this doesn't hold if you simply have 2 paths for resiliency (which you seem to have) and if you raised an RFE I'd vote for it. I'd personally settle for a convenient utility to push stranded messages back through the cluster workload algorithm again. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
tczielke |
Posted: Thu Mar 08, 2018 11:30 am Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
Should have said "and channel 2 is open and going to the same queue manager as channel 1, so reroute the message through channel 2" _________________ Working with MQ since 2010. |
|
|
 |
mvic |
Posted: Fri Mar 09, 2018 3:38 pm Post subject: |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
tczielke wrote: |
It does not seem intuitive to me. I would think that MQ would realize after a little bit that channel 1 is now no longer available (stopped) and channel 2 is open and going to the same destination so reroute the message through channel 2. |
Having identified channel1 as the route for the message, the local cluster code can change its mind, but only if the message was put via a bind-not-fixed object handle.
Still, I can't immediately say what causes the change of mind to be considered. I think the channel has to be in retry, for that to happen, but I could be wrong.
Anyway, if your message was put via bind-on-open, MQ will never reconsider its first routing decision.
By the way, this config where a CLUSRCVR is advertised by a QM but architecturally cannot be a valid route to that QM, is testing the boundaries of the design and intentions of clustering, IMHO. |
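The rule described in this post can be written down as a small Python sketch. It is a toy model of the poster's description only, not MQ's actual logic: the function name `may_reconsider_route` and the string values for the bind option and channel state are hypothetical, and the "channel must be in retry" condition is the poster's own unconfirmed guess.

```python
def may_reconsider_route(bind_option: str, channel_state: str) -> bool:
    """Toy model: can the first routing decision be revisited?"""
    if bind_option == "ON_OPEN":
        # Bind-on-open: per the post, the first routing decision is never
        # reconsidered.
        return False
    # Bind-not-fixed: the post says a change of route is possible. The
    # requirement that the channel be retrying is an assumption taken from
    # the poster's "I think ... but I could be wrong".
    return bind_option == "NOT_FIXED" and channel_state == "RETRYING"


for bind in ("ON_OPEN", "NOT_FIXED"):
    for state in ("RETRYING", "STOPPED"):
        print(bind, state, may_reconsider_route(bind, state))
```

Under these assumptions a manually stopped channel never leads to a reroute, whatever the bind option. That would be consistent with what was reported earlier in the thread, but only if the retry guess is right.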
|
|
 |
tczielke |
Posted: Fri Mar 09, 2018 7:48 pm Post subject: |
|
|
Guardian
Joined: 08 Jul 2010 Posts: 941 Location: Illinois, USA
|
mvic wrote: |
By the way, this config where a CLUSRCVR is advertised by a QM but architecturally cannot be a valid route to that QM, is testing the boundaries of the design and intentions of clustering, IMHO. |
Yes, I do now see that this configuration is flawed. I am working on changing it. Unfortunately, when I designed it, it did not test the boundaries of my preconceived notions or vivid imagination.  _________________ Working with MQ since 2010. |
|
|
 |
|