Sequence Number (Cluster to Multi Instance) Scripting Idea |
Author |
Message
|
RHeunes |
Posted: Wed Jan 11, 2023 11:52 pm Post subject: Sequence Number (Cluster to Multi Instance) Scripting Idea |
|
|
Novice
Joined: 13 Sep 2021 Posts: 16
|
Good day
I have a question about scripting a sequence number reset on the sender channel side between a clustered environment and a Multi Instance environment.
We have a clustered environment with 2 FRs and 6 PRs. One of the PRs has a sender/receiver channel pair to a Multi Instance environment.
The CONNAME is a DNS entry that resolves to an F5 on the Multi Instance side, which handles the failover. The problem is that the Multi Instance environment spans two DCs, so sequence number mismatches are going to occur.
The messages are time sensitive: we have at most 40 seconds for a specific query. We wrote a script that uses MQSC to reset the channel sequence number, and it works when we run it manually. We now want to automate it, so we are discussing how to detect a mismatch event and have a cron job run the script to perform the reset.
One option is to watch the MQ logs for the relevant error codes and run the script when they appear. The other is to monitor the MQ process for the sender channel. We have yet to test either from an automation perspective. Are there any other (simpler or better) ways we could do this?
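As a rough illustration of the log-watching option, a cron-driven script could remember how far it has read in the error log and scan only what was appended since the previous run. The Python sketch below assumes a mismatch is reported as `AMQ9526E: Message sequence number error for channel '<name>'`. The function name `scan_new_entries` and the offset-in, offset-out interface are hypothetical choices, not part of any MQ tooling, and the caller is assumed to persist the offset between runs.

```python
import re
from pathlib import Path
from typing import Set, Tuple

# Assumed wording of the mismatch message in the queue manager error log.
MISMATCH = re.compile(
    r"AMQ9526E: Message sequence number error for channel '([^']+)'"
)


def scan_new_entries(log_path: Path, offset: int) -> Tuple[Set[str], int]:
    """Return channels with a mismatch logged after `offset`, plus the new offset."""
    size = log_path.stat().st_size
    if size < offset:
        # The log is shorter than last time, so it was rotated or truncated.
        offset = 0
    with log_path.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()
    text = data.decode("utf-8", errors="replace")
    return set(MISMATCH.findall(text)), offset + len(data)
```

Each cron run would call `scan_new_entries` with the offset saved by the previous run and act only on the channel names returned, which avoids reacting twice to the same log entry.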
The environment details are as follows:
Clustered: MQ v9.1.0.5 running on AIX 7.2
Multi Instance: MQ v9.1.0.7 running on Oracle Linux 7
Thanks,
Rayn |
|
fjb_saper |
Posted: Thu Jan 12, 2023 7:28 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
RHeunes wrote: |
The CONNAME is a DNS entry, that goes to F5 on the Multi Instance side and that handles the failover, the problem we are having is the Multi Instance has two DCs, so sequence mismatch is going to occur. |
You need to explain this in much more detail. Why are you seeing sequence mismatch errors, and what is causing them? _________________ MQ & Broker admin |
|
PeterPotkay |
Posted: Thu Jan 12, 2023 4:19 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Rayn,
If channel sequence #s could safely be automatically reset, IBM would just make that happen in the base product. _________________ Peter Potkay
Keep Calm and MQ On |
|
fjb_saper |
Posted: Thu Jan 12, 2023 9:55 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Sorry Morag, had to delete your post because I edited it instead of quoting it...
Morag wrote: |
The F5 set up is not supported and will cause sequence number discrepancies |
I would generally agree. However, I believe the OP said that the F5 serves to present a single IP address for the Multi-Instance queue managers.
Is that still true if the F5 only serves as a proxy for the Multi-instance qmgr? |
|
RHeunes |
Posted: Fri Jan 13, 2023 12:25 am Post subject: Sequence Number (Cluster to Multi Instance) Scripting Idea |
|
|
Novice
Joined: 13 Sep 2021 Posts: 16
|
fjb_saper wrote: |
RHeunes wrote: |
The CONNAME is a DNS entry, that goes to F5 on the Multi Instance side and that handles the failover, the problem we are having is the Multi Instance has two DCs, so sequence mismatch is going to occur. |
You need to explain this much more in details. Why are you seeing sequence mismatch errors, and what is causing them?  |
Hi fjb_saper
The Multi Instance (MI) environment has two active/standby pairs, one in DC1 and one in DC2. The MQ data is therefore not shared across the DCs, and this is where our mismatch issues start, which makes sense. Each DC has two hosts, PRDESB01 and PRDESB02, in an active/standby configuration.
I should also explain that when I say "we" I mean the clustered environment. The MI environment belongs to another agency, and its design and architecture are outside our control. They consume and supply information from and to our front end offices. There are specification and development objections: we have tried to get them either to become part of our cluster (to some degree) or to allow us to connect directly to the 2 DCs via their IPs in CONNAME, so we cannot test whether that would solve the issue.
More on the MI:
We have a QMGR with a sender/receiver pair to the MI environment. Our CONNAME is a DNS entry that connects to their F5 (GTM/LTM), which is configured to provide failover to either DC. The local (PRDESB01/02) failovers are transparent in this case.
Why the sequence mismatch:
What we have found is that when there is a TCP/IP failure or a general MQ failure, we get the sequence mismatch. For example, our sender sends a message with sequence number 413783 and their side expects sequence number 1. Because the two ends do not agree on the sequence number, the sender channel (clustered env) goes into a retrying state and the logs show a mismatch error. When we run a manual reset on the sender, the channel connects and messages flow to the MI, as in the case below.
Example Error - Sender Side:
----- amqrccca.c : 439 --------------------------------------------------------
11/23/22 17:40:49 - Process(25231632.1) User(mqm) Program(runmqchl)
Host(xxx) Installation(Installation1)
VRMF(9.1.0.5) QMgr(QMGW)
Time(2022-11-23T15:40:49.618Z)
RemoteHost(1.1.1.1(1414))
ArithInsert1(413783) ArithInsert2(1)
CommentInsert1(QMGW.QM_MI)
CommentInsert2(1.1.1.1(1414))
CommentInsert3(xxx)
AMQ9526E: Message sequence number error for channel 'QMGW.QM_MI'.
EXPLANATION:
The local and remote queue managers do not agree on the next message sequence
number. A message with sequence number 413783 has been sent when sequence
number 1 was expected. The remote host is '1.1.1.1(1414)'.
ACTION:
Determine the cause of the inconsistency. It could be that the synchronization
information has become damaged, or has been backed out to a previous version.
If the situation cannot be resolved, the sequence number can be manually reset
at the sending end of the channel using the RESET CHANNEL command.
----- cmqxrfpt.c : 672 --------------------------------------------------------
11/23/22 17:40:49 - Process(25231632.1) User(mqm) Program(runmqchl)
Host(xxx) Installation(Installation1)
VRMF(9.1.0.5) QMgr(QMGW)
Time(2022-11-23T15:40:49.623Z)
RemoteHost(1.1.1.1(1414))
CommentInsert1(QMGW.QM_MI)
CommentInsert2(1.1.1.1(1414))
AMQ9506E: Message receipt confirmation failed.
EXPLANATION:
Channel 'QMGW.QM_MI' has ended because the remote queue manager on host
'1.1.1.1(1414)' did not accept the last batch of messages.
ACTION:
The error log for the channel at the remote site will contain an explanation of
the failure. Contact the remote Systems Administrator to resolve the problem.
----- amqrmrca.c : 884 --------------------------------------------------------
END.
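A script that reacts to this entry first has to pull the channel name and the two sequence numbers out of it. The Python sketch below does that for the AMQ9526E text exactly as pasted above. The `SequenceMismatch` class and `parse_mismatch` function are hypothetical names, and the pattern assumes the explanation wording does not differ between MQ versions.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class SequenceMismatch:
    channel: str   # channel named in the AMQ9526E line
    sent: int      # sequence number the sender used
    expected: int  # sequence number the receiver expected


# Applied after whitespace is collapsed, so log line wrapping does not matter.
_PATTERN = re.compile(
    r"AMQ9526E: Message sequence number error for channel '([^']+)'\."
    r".*?A message with sequence number (\d+) has been sent "
    r"when sequence number (\d+) was expected"
)


def parse_mismatch(entry: str) -> Optional[SequenceMismatch]:
    """Extract mismatch details from one error log entry, or return None."""
    flat = " ".join(entry.split())
    match = _PATTERN.search(flat)
    if match is None:
        return None
    channel, sent, expected = match.groups()
    return SequenceMismatch(channel, int(sent), int(expected))
```

Logging the sent and expected values alongside any automated action gives a record of how far apart the two ends were at the time of the reset.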
Morag mentioned a loss of messages earlier. The messages are stuck on the XMITQ because the channel is in a retrying state, and we can confirm that messages are not lost. Also, their backend uses Spectrum Scale and their DBs are set up to sync data. From that aspect their F5 does WLM to the backend system; from our end (MQ) it is purely for failover purposes. So a message that went through on DC1 could be serviced on DC2 by their backend logic. We always have both DCs connected to our receiver end.
Our goal is to keep our service to the front end as available and efficient as possible. This is a concern for us and we would like to see how best to address it. As I mentioned earlier, we have 40 seconds to get a request from the front end to the MI and return the response to the front end, and manual intervention does not cater well for that when a mismatch occurs.
NOTE: We are considering a script to try to automate this. It is a test scenario at the moment; please see my first post above for what we want to try. To add, we want to always reset the sequence number to 1, and we will do this on both ends of the wire. To my knowledge this should be fine; if not, please let me know.
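For illustration, the reset itself could be issued from a script roughly as follows. This is a Python sketch, not the script described above: `reset_channel_command` and `run_mqsc` are hypothetical helper names, and it assumes `runmqsc` is on the PATH and that the script runs as a user allowed to administer the queue manager. The channel name is validated before it is placed in the MQSC text so that unexpected input cannot inject extra commands.

```python
import re
import subprocess

# Characters and length permitted in an MQ channel name.
_CHANNEL_NAME = re.compile(r"[A-Za-z0-9._/%]{1,20}")


def reset_channel_command(channel: str, seqnum: int = 1) -> str:
    """Build the MQSC text that resets a channel's sequence number."""
    if not _CHANNEL_NAME.fullmatch(channel):
        raise ValueError(f"not a valid channel name: {channel!r}")
    if not 1 <= seqnum <= 999_999_999:
        raise ValueError("seqnum must be between 1 and 999999999")
    return f"RESET CHANNEL('{channel}') SEQNUM({seqnum})"


def run_mqsc(qmgr: str, command: str) -> str:
    """Feed one MQSC command to runmqsc on stdin and return its output."""
    result = subprocess.run(
        ["runmqsc", qmgr],
        input=command + "\n",
        capture_output=True,
        text=True,
        check=False,
    )
    return result.stdout
```

A call such as `run_mqsc("QMGW", reset_channel_command("QMGW.QM_MI"))` would then issue the reset on the sending queue manager; the output should still be checked for MQSC errors before the result is treated as a success.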
I hope this is sufficient info.
Regards,
Rayn |
|
fjb_saper |
Posted: Fri Jan 13, 2023 2:09 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Your problem arises because you are indiscriminately sending to 2 different qmgrs over the same channel. They seem to use the F5 not as a proxy for a single qmgr, but as a failover between 2 qmgrs.
There is one thing that could be done automatically:
- define 2 distinct channels using the same xmitq, one channel to each of the target queue managers
- when the channel is in retrying mode, switch the channel name in the xmitq trigger data field
This would keep the message flow going while giving you 2 separate channels, i.e. 2 separate sequence number counters. Switching from one channel to the other then also switches the sequence counter. Of course, you also need to raise an alert if the channel does not start after the switch... |
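The switch in the second bullet comes down to one MQSC command against the transmission queue. The Python sketch below builds that command for a pair of channels; the transmission queue name `QM_MI.XMITQ`, the channel names in the usage note, and the function names are hypothetical placeholders, and the generated text would still have to be passed to `runmqsc`.

```python
from typing import Sequence


def other_channel(current: str, channels: Sequence[str]) -> str:
    """Return the channel in a two-channel pair that is not `current`."""
    if len(channels) != 2 or current not in channels:
        raise ValueError("expected a pair of channels that contains the current one")
    return channels[1] if channels[0] == current else channels[0]


def switch_trigdata_command(xmitq: str, current: str, channels: Sequence[str]) -> str:
    """Build the MQSC text that points the xmitq trigger data at the other channel."""
    target = other_channel(current, channels)
    return f"ALTER QLOCAL('{xmitq}') TRIGDATA('{target}')"
```

For example, with a pair named `QMGW.QM_MIDC1` and `QMGW.QM_MIDC2`, calling the function with the first as the current channel yields an `ALTER QLOCAL` that sets TRIGDATA to the second.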
|
RHeunes |
Posted: Fri Jan 13, 2023 3:10 am Post subject: |
|
|
Novice
Joined: 13 Sep 2021 Posts: 16
|
fjb_saper wrote: |
There is one thing that could be done automatically:
|
You mentioned:
define 2 distinct channels using the same xmitq, one channel to each of the target queue managers
We do:
Channel 1, A sender named QMGW.QM_MIDC1 using same XMITQ and a DNS entry of mi-pre-prod.abc.com(1414)
Channel 2, A sender named QMGW.QM_MIDC2 using same XMITQ and a DNS entry of mi-pre-prod.abc.com(1415)
They do:
Create the corresponding receiver channels at their end to match our senders
when the channel is in retrying mode, switch the channel name in the xmitq trigger data field
Can you give me a bit more context here? The trigger data is QMGW.QM_MIDC1; if MIDC1 is unavailable, we need to change the value to QMGW.QM_MIDC2. This would be a manual procedure, correct?
What if I were to leave TRIGDATA blank? The info center states that the channel initiator will look for a channel definition that has the named XMITQ.
https://www.ibm.com/docs/en/ibm-mq/9.1?topic=management-triggering-channels
Thanks,
Rayn
Last edited by RHeunes on Fri Jan 13, 2023 4:32 am; edited 1 time in total |
|
bruce2359 |
Posted: Fri Jan 13, 2023 4:17 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
RHeunes wrote: |
What if I were to leave TRIGDATA blank, the info center states that the channel initiator will look for a channel definition that has the named XMITQ. |
To help us help you ...
Please post the URL of info center (or other website) to which you refer. Copy and paste it into your post, highlight (select) it, then click the URL button above. If you are quoting something, copy the text into your post, then highlight (select) the text, then click the Quote button above. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
RHeunes |
Posted: Fri Jan 13, 2023 4:34 am Post subject: |
|
|
Novice
Joined: 13 Sep 2021 Posts: 16
|
bruce2359 wrote: |
To help us help you ... |
Added the URL. |
|
bruce2359 |
Posted: Fri Jan 13, 2023 9:07 am Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
From the URL provided:
Quote: |
If you do not specify a channel name, the channel initiator searches the channel definition files until it finds a channel that is associated with the named transmission queue. |
Curious (odd) wording: "channel definition files". Which files on the qmgr would these be? Does this mean channel object definitions? |
|
PeterPotkay |
Posted: Sun Jan 15, 2023 9:59 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Yeah, a new MQ Cluster that contains just your sending QM and their 2 receiving QMs would be the ideal solution, but it looks like you can't implement that for, I assume, legitimate reasons that will not go away.
Flipping the SNDR channel name as needed on the XMITQ may work. But some things to consider...
The original SNDR channel will maintain an exclusive lock on the XMITQ unless/until you intervene. The queue will be get-inhibited or opened exclusively, depending on what state the first SNDR channel is in. It was given a job to do and can't have anything else opening the XMITQ.
You will eventually run into not just a sequence # issue but also a SNDR channel that is in doubt. How will you automatically handle that? Commit or rollback incorrectly and you will lose or duplicate messages, respectively.
So you will have to hard stop SNDR channel #1, get-enable the XMITQ, mop up any in-doubt situation, reset the sequence #, and change the XMITQ to SNDR channel #2. Are you going to have any information from the receiving side while all this is occurring? If not, you will be taking all these actions while assuming a lot. If it were a solid plan, I'll reiterate what I wrote previously: IBM would have just put this into the base product as default behavior. But to assure once-and-only-once delivery, IBM says the MQ Admin must be involved in these steps after first understanding (not assuming) what is happening on both ends of the channel.
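To make the order of those steps concrete, here is a Python sketch that turns them into an ordered list of MQSC commands. The function and parameter names and the queue name in the usage note are hypothetical. Which channel the reset applies to is not spelled out above, so resetting the channel being switched away from is an assumption. The in-doubt action has no default on purpose: the caller has to choose COMMIT or BACKOUT, because the wrong choice loses or duplicates messages.

```python
from typing import List


def failover_plan(
    xmitq: str, old_channel: str, new_channel: str, in_doubt_action: str
) -> List[str]:
    """Return the MQSC commands for the failover steps, in order."""
    action = in_doubt_action.upper()
    if action not in ("COMMIT", "BACKOUT"):
        raise ValueError("in_doubt_action must be COMMIT or BACKOUT")
    return [
        # 1. Hard stop the sender that currently owns the transmission queue.
        f"STOP CHANNEL('{old_channel}') MODE(FORCE)",
        # 2. Make sure messages can be read from the queue again.
        f"ALTER QLOCAL('{xmitq}') GET(ENABLED)",
        # 3. Resolve an in-doubt batch; only meaningful if the channel is in doubt.
        f"RESOLVE CHANNEL('{old_channel}') ACTION({action})",
        # 4. Reset the sequence number (assumption: on the old channel).
        f"RESET CHANNEL('{old_channel}') SEQNUM(1)",
        # 5. Point the transmission queue trigger data at the second sender.
        f"ALTER QLOCAL('{xmitq}') TRIGDATA('{new_channel}')",
    ]
```

Printing the plan for review before anything is executed, for instance `failover_plan("QM_MI.XMITQ", "QMGW.QM_MIDC1", "QMGW.QM_MIDC2", "BACKOUT")`, keeps a person in the loop for the in-doubt decision.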
Don't forget to plan for fail back. Maybe the receiving end is going to flip-flop back to the first QM a few seconds after your automation got things moving to their second QM. So this automation is always going to have to consider both channels and not assume the direction of the failover. If you automate fail back, you introduce the potential for an endless ping-pong scenario.
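One common way to limit a ping-pong loop is a cooldown: after an automated switch, refuse another one until a minimum interval has passed and alert a person instead. The Python sketch below shows the idea. The `FailoverGuard` class and the interval in the usage note are hypothetical, and this is a general rate-limiting technique rather than anything MQ provides.

```python
import time
from typing import Callable, Optional


class FailoverGuard:
    """Allow an automated switch only if the previous one is old enough."""

    def __init__(
        self, min_interval_s: float, clock: Callable[[], float] = time.monotonic
    ) -> None:
        self._min_interval = min_interval_s
        self._clock = clock
        self._last: Optional[float] = None

    def allow(self) -> bool:
        """Return True and record the time if a switch is permitted now."""
        now = self._clock()
        if self._last is not None and now - self._last < self._min_interval:
            # Too soon after the last switch: leave it to a human.
            return False
        self._last = now
        return True
```

With something like `FailoverGuard(300)`, the automation would perform at most one switch every five minutes, and any refused attempt becomes a push alert.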
There is no easy answer here given the constraints you have listed. If you are willing to accept duplicate or missing messages, and the bosses are as well (get it in writing), then this plan can work. Do have copious logging and push alerts so you know if/when/what your workaround is doing. |
|
RHeunes |
Posted: Mon Jan 16, 2023 12:29 am Post subject: |
|
|
Novice
Joined: 13 Sep 2021 Posts: 16
|
bruce2359 wrote: |
From the URL provided:
Quote: |
If you do not specify a channel name, the channel initiator searches the channel definition files until it finds a channel that is associated with the named transmission queue. |
Curious (odd) wording channel definition files. Which files on the qmgr would these be? Does this mean channel object definitions? |
My thinking is that these are the definition files under /var/mqm/qmgrs/QM01/channel. You can cat them and see the CONNAME value and the channel names, so I am assuming this is where it looks. |
|
RHeunes |
Posted: Mon Jan 16, 2023 12:31 am Post subject: |
|
|
Novice
Joined: 13 Sep 2021 Posts: 16
|
Thanks PeterPotkay,
I will give the options I have a go and report back here if it was successful or not, thanks for the feedback. |
|
RHeunes |
Posted: Fri Feb 17, 2023 12:25 am Post subject: Sequence Number (Cluster to Multi Instance) Scripting Idea |
|
|
Novice
Joined: 13 Sep 2021 Posts: 16
|
Good day,
So we have set up the scripting and tested it, and it works as intended. However, the MI ESB team now wants to explore using the IgnoreSeqNumberMismatch attribute in qm.ini.
I have never had to use this property, and I understand it is not used under normal circumstances. Personally, I don't think it is a good idea, but I could be wrong.
Does anyone have experience with this property?
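For anyone who wants to check whether the attribute is already set on a queue manager, the Python sketch below reads qm.ini-style text and returns the value of a key inside a named stanza. It assumes the attribute sits in the `Channels` stanza, which should be verified against the IBM documentation for the MQ version in use. The function name and the sample text in the usage note are hypothetical.

```python
from typing import Optional


def stanza_attribute(ini_text: str, stanza: str, key: str) -> Optional[str]:
    """Return the value of `key` in `stanza`, or None if it is not set."""
    current = None
    for raw in ini_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.endswith(":") and "=" not in line:
            current = line[:-1]  # a stanza header such as "Channels:"
            continue
        if current == stanza and "=" in line:
            name, value = line.split("=", 1)
            if name.strip() == key:
                return value.strip()
    return None
```

Given text containing a `Channels:` stanza with `IgnoreSeqNumberMismatch=YES`, `stanza_attribute(text, "Channels", "IgnoreSeqNumberMismatch")` returns `"YES"`, and it returns `None` when the attribute is absent.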
Regards,
Rayn |
|
fjb_saper |
Posted: Sat Feb 18, 2023 12:29 am Post subject: Re: Sequence Number (Cluster to Multi Instance) Scripting Id |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
I would only use this attribute if I knew the other side was running in containers, where every time the container is dehydrated / re-hydrated you are dealing with a new queue manager. This would only be feasible if the messages received by the other side are non-persistent and can be safely duplicated... (if in doubt rollback) |
|