Author |
Message
|
PeterPotkay |
Posted: Thu Feb 06, 2014 5:11 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
mqjeff wrote: |
I would solve this problem with a script that runs as part of qmgr startup on the DR systems. The only time you need to reset these channels is if the queue manager is starting up on the DR. It doesn't fundamentally *hurt* things, except for slowing down the startup time, if you reset the sender channels *every* time the DR qmgr starts up. |
He has SENDER channels needing sequence # resets on the client systems that are not in the DR site.
However, a RCVR channel can reset its sequence to match what the SNDR is expecting. So you could make the reset of the sequence # occur on the DR side as the DR QMs come up, looking to see what the client QMs are sending. That way the client systems don't need any change. But you can't blindly reset to 1; you have to look at each RCVR channel's error messages to see the specific # expected by the partner SNDR. _________________ Peter Potkay
Keep Calm and MQ On |
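The receiver-side approach described above can be sketched as a small log scraper. This is only an illustration: the sample log wording is an assumption modeled on a sequence-number error, the channel name is hypothetical, and the real text in your queue manager's error log should be checked before relying on the patterns. The sketch assumes the "has been sent" number is the one the partner sender is using, and emits an MQSC RESET CHANNEL command for the receiver.

```python
import re

# ASSUMPTION: illustrative wording and a hypothetical channel name;
# verify against the actual error log on your queue manager.
SAMPLE_LOG = (
    "AMQ9526: Message sequence number error for channel 'CLIENT1.TO.DC'.\n"
    "A message with sequence number 4821 has been sent when "
    "sequence number 17 was expected.\n"
)

CHANNEL_RE = re.compile(r"error for channel '([^']+)'")
SEQ_RE = re.compile(
    r"sequence number (\d+) has been sent when sequence number (\d+) was expected"
)


def receiver_reset_commands(log_text):
    """Return one MQSC RESET CHANNEL command per channel found in the log text."""
    commands = {}
    channel = None
    for line in log_text.splitlines():
        match = CHANNEL_RE.search(line)
        if match:
            channel = match.group(1)
            continue
        match = SEQ_RE.search(line)
        if match and channel:
            sent = int(match.group(1))  # number the sender is using
            # A later error for the same channel overrides an earlier one.
            commands[channel] = f"RESET CHANNEL('{channel}') SEQNUM({sent})"
            channel = None
    return list(commands.values())


if __name__ == "__main__":
    for command in receiver_reset_commands(SAMPLE_LOG):
        print(command)
```

The generated lines are plain MQSC text, so they can be reviewed by hand before being fed to runmqsc on the DR queue manager.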
|
|
 |
fjb_saper |
Posted: Thu Feb 06, 2014 7:10 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
PeterPotkay wrote: |
mqjeff wrote: |
I would solve this problem with a script that runs as part of qmgr startup on the DR systems. The only time you need to reset these channels is if the queue manager is starting up on the DR. It doesn't fundamentally *hurt* things, except for slowing down the startup time, if you reset the sender channels *every* time the DR qmgr starts up. |
He has SENDER channels needing sequence # resets on the client systems that are not in the DR site.
However, a RCVR channel can reset its sequence to match what the SNDR is expecting. So you could make the reset of the sequence # occur on the DR side as the DR QMs come up, looking to see what the client QMs are sending. That way the client systems don't need any change. But you can't blindly reset to 1; you have to look at each RCVR channel's error messages to see the specific # expected by the partner SNDR. |
Alternatively check the saved status of the stopped receiver channels before shutting down prod?  _________________ MQ & Broker admin |
|
|
 |
PeterPotkay |
Posted: Thu Feb 06, 2014 9:11 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
fjb_saper wrote: |
PeterPotkay wrote: |
mqjeff wrote: |
I would solve this problem with a script that runs as part of qmgr startup on the DR systems. The only time you need to reset these channels is if the queue manager is starting up on the DR. It doesn't fundamentally *hurt* things, except for slowing down the startup time, if you reset the sender channels *every* time the DR qmgr starts up. |
He has SENDER channels needing sequence # resets on the client systems that are not in the DR site.
However, a RCVR channel can reset its sequence to match what the SNDR is expecting. So you could make the reset of the sequence # occur on the DR side as the DR QMs come up, looking to see what the client QMs are sending. That way the client systems don't need any change. But you can't blindly reset to 1; you have to look at each RCVR channel's error messages to see the specific # expected by the partner SNDR. |
Alternatively check the saved status of the stopped receiver channels before shutting down prod?  |
You can predict when disaster is about to strike?
Yeah you can do it for DR tests that you plan for, but anytime you do something in a DR test that you wouldn't do in a real DR..... _________________ Peter Potkay
Keep Calm and MQ On |
|
|
 |
smdavies99 |
Posted: Thu Feb 06, 2014 10:54 am Post subject: |
|
|
 Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
Many thanks for all the discussion.
The 'D' in DR stands for disaster. These systems are located in a place where major power outages are not uncommon. Despite UPS and Generators the Main site could go down and .... well we have to make sure that the DR site is online and working within 15 minutes.
IMHO, the only practical solution is to use some remote shell scripts to reset ALL the channel sequence numbers. It should only take a minute provided the network is up and running. That is the biggest risk.
Thanks again. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
|
 |
bgrieb |
Posted: Wed Jun 11, 2014 6:25 am Post subject: |
|
|
Newbie
Joined: 11 Jun 2014 Posts: 2
|
I have a slight variation from the original post:
Quote: |
"Suppose I have TWO DataCentres A and B. There are a number of Client systems all running WMQ Servers with SDR/RCVR channels to the main DC. Everything is hunky dory and is working away until someone pulls the plug on the DC and it fails over to the DR site. A reconfig of the network re-routes the channel connections from the remote site to the DR host. " |
We have a similar setup here but with a single source and two identically configured target machines in separate facilities. Data always streams to the primary site unless there's a system outage which forces traffic to the secondary site. In a recent DR test it appears that the sequences were being reset automatically. Since we do not control the Sender side of the channel I could not be certain that the source party was not resetting it, though they indicated they had not. We have a lot of back end processes that need to be updated during a failover so this sequence behavior is not what I expected or desired. As a result, I attempted to create a sequence mismatch in a test environment and found that I was unable to do so.
Here's how I set up my environments:
* Created multiple installations of WebSphere MQ 7.0.1.8 on Win2K8 machines
* created a source Queue Manager with a single Sender channel on one installation
* created two identical Queue Managers with a Receiver on two separate installations.
* created a single DNS reference for the Receiver nodes
To simulate the failover:
* I connected the Sender to the Receiver on the first node, began loading a high volume of messages (that would continue throughout the exercise) and confirmed that messages were making it to the first node.
* I changed the DNS reference for the Receiver to point at the second node
* I interrupted the stream of data by making the first receiver node inaccessible.
* The Sender entered a retry state before reestablishing the connection, this time to the second node. Messages began flowing to the second node without resetting the sequences.
* I reversed the process the same way and began sending data back to the first node without resetting the sequences.
I don't understand how this test did not generate two sequence mismatches so I'm obviously missing something. I'd appreciate any input. Thank you. |
|
|
 |
fjb_saper |
Posted: Wed Jun 11, 2014 8:00 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Were any of your messages persistent?
What are you using as receiving qmgrs? The active instance of a multi-instance qmgr, or of a clustered qmgr?  _________________ MQ & Broker admin |
|
|
 |
bgrieb |
Posted: Wed Jun 11, 2014 1:26 pm Post subject: |
|
|
Newbie
Joined: 11 Jun 2014 Posts: 2
|
Quote: |
Were any of your messages persistent?
What are you using as receiving qmgrs? The active instance of a multi-instance qmgr, or of a clustered qmgr? |
I've made no changes from the defaults in regard to persistence when creating the objects. The messages are being put into the queue with a simple PowerShell script wrapped in a DO loop.
Code: |
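# Build a test message encoded as UTF-8 (CCSID 1208) in MQ string format.
$testmessage = New-WMQMessage
$testmessage.CharacterSet = 1208
$testmessage.Format = [IBM.WMQ.MQC]::MQFMT_STRING
$testmessage.WriteString("Test Message #$messageValue")
# Persistence is not set explicitly on the message.
# Put the message on $queue on queue manager $qmgr.
Send-WMQMessage $testmessage (Get-WMQQueue $queue -QmgrName $qmgr) |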
As for the receiving queue managers, they are completely independent and not part of a cluster. They were created on separate machines in separate facilities using the exact same name for all the objects. We send to the primary node 99.99% of the time, but in the event of an outage we redirect traffic to the secondary (or standby) instance. |
|
|
 |
PeterPotkay |
Posted: Wed Jun 11, 2014 2:14 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Repeat the test, but this time pay attention to the sequence #s. Determine if messages are flowing with mismatched sequence #s, or if somehow the sequence #s are being reset when the failover occurs so they do match and then the messages flow.
In either case, if this happens without human intervention, and everything is as you described, I would consider this unexpected behavior and worthy of a PMR to find out what the heck is going on. _________________ Peter Potkay
Keep Calm and MQ On |
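One way to "pay attention to the sequence #s" is to capture channel status on both ends during the test and compare the CURSEQNO values. The sketch below only parses KEY(VALUE) text of the kind runmqsc prints; the two status snippets and the channel name are illustrative assumptions, not captured output, and on a busy channel the two ends can differ briefly while a batch is in flight.

```python
import re

# Matches MQSC-style attributes such as CURSEQNO(5120).
ATTR_RE = re.compile(r"([A-Z]+)\(([^)]*)\)")


def parse_attrs(text):
    """Collect KEY(VALUE) pairs from one status record into a dict."""
    return {key: value.strip() for key, value in ATTR_RE.findall(text)}


def current_seq(status_text):
    """Return the CURSEQNO value of a status record as an int."""
    return int(parse_attrs(status_text)["CURSEQNO"])


def sequences_match(sender_text, receiver_text):
    """True when both ends report the same current sequence number."""
    return current_seq(sender_text) == current_seq(receiver_text)


# ASSUMPTION: hand-written examples with a hypothetical channel name.
SENDER_STATUS = "CHANNEL(SRC.TO.TGT) CHLTYPE(SDR) CURSEQNO(5120) STATUS(RUNNING)"
RECEIVER_STATUS = "CHANNEL(SRC.TO.TGT) CHLTYPE(RCVR) CURSEQNO(5120) STATUS(RUNNING)"

if __name__ == "__main__":
    print(sequences_match(SENDER_STATUS, RECEIVER_STATUS))
```

Taking a snapshot before the failover, right after reconnection, and again after failing back would show whether the numbers carried on or restarted.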
|
|
 |
fjb_saper |
Posted: Thu Jun 12, 2014 5:16 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
PeterPotkay wrote: |
Repeat the test, but this time pay attention to the sequence #s. Determine if messages are flowing with mismatched sequence #s, or if somehow the sequence #s are being reset when the failover occurs so they do match and then the messages flow.
In either case, if this happens without human intervention, and everything is as you described, I would consider this unexpected behavior. |
Like a monitoring app resetting the sequence number on its own (according to a rule) in case of a mismatch > (n * batch size) ?  _________________ MQ & Broker admin |
|
|
 |
gbaddeley |
Posted: Thu Jun 12, 2014 8:20 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
Channel sequence errors can all be handled on the DR qmgr.
Sender channels: Reset to 1, the rcvr on the remote qmgr will comply.
Receiver channels: Reset to the number expected by the sender on the remote qmgr. This can be found in the qmgr's error log.
If the channels are doing "long" retry (highly likely) it may take a while for them to go into normal running status (eg. default is 20 minutes, but check settings on your channels). Stopping and Starting the sender channels will avoid a potentially long wait.
The same principles apply to the other types of distributed channels. _________________ Glenn |
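As a rough sketch of the procedure above, the MQSC text for the DR queue manager could be generated from a channel inventory. The channel names are hypothetical, and the STOP / RESET / START ordering for sender channels is an assumption about a sensible sequence; STOP CHANNEL is asynchronous, so a real script may need to wait for the channel to reach stopped status before the reset.

```python
def dr_reset_script(sender_channels, receiver_expected):
    """Build MQSC text to run on the DR queue manager.

    sender_channels: iterable of sender channel names (reset to 1).
    receiver_expected: dict mapping receiver channel name to the sequence
        number the remote sender expects (taken from the error log).
    """
    lines = []
    for name in sender_channels:
        # Stop/start avoids waiting out the long retry interval.
        lines.append(f"STOP CHANNEL('{name}')")
        lines.append(f"RESET CHANNEL('{name}') SEQNUM(1)")
        lines.append(f"START CHANNEL('{name}')")
    for name, seq in receiver_expected.items():
        # Receivers are not started; the remote sender reconnects to them.
        lines.append(f"RESET CHANNEL('{name}') SEQNUM({seq})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # ASSUMPTION: hypothetical channel names and sequence number.
    print(dr_reset_script(["DC.TO.CLIENT1"], {"CLIENT1.TO.DC": 4821}), end="")
```

The output is plain text, so it can be saved and piped into runmqsc for the DR queue manager as part of its startup script.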
|
|
 |
pmeekin |
Posted: Thu Jun 19, 2014 2:01 pm Post subject: |
|
|
Novice
Joined: 13 Jan 2003 Posts: 10 Location: UK
|
bgrieb wrote: |
I don't understand how this test did not generate two sequence mismatches so I'm obviously missing something. I'd appreciate any input. Thank you. |
If you don't send persistent messages the channels' saved statuses never get updated. Do dis chs(*) saved all and see if you get anything.
If not then when the channels restart they will always start at 1 again. |
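The explanation above can be illustrated with a toy model. It is not how MQ is implemented; it only encodes the two statements in the post: saved status is updated only when persistent messages flow, and a channel end with no saved status starts at 1. All class and function names are made up for the illustration.

```python
class ChannelEnd:
    """One end of a channel, holding only its saved next sequence number."""

    def __init__(self):
        self.saved_next = None  # None models "no saved status"

    def next_at_start(self):
        # With nothing saved, the channel is modeled as starting at 1.
        return 1 if self.saved_next is None else self.saved_next


def connect_and_send(sender, receiver, count, persistent):
    """Return False on a sequence mismatch, else send `count` messages."""
    if sender.next_at_start() != receiver.next_at_start():
        return False  # the channel would fail with a sequence error
    if persistent:
        # Only persistent traffic updates the saved status on both ends.
        nxt = sender.next_at_start() + count
        sender.saved_next = nxt
        receiver.saved_next = nxt
    return True


def failover_test(persistent):
    """Send to the primary, then fail over to an untouched standby."""
    sender, primary, standby = ChannelEnd(), ChannelEnd(), ChannelEnd()
    first = connect_and_send(sender, primary, 100, persistent)
    second = connect_and_send(sender, standby, 100, persistent)
    return first, second


if __name__ == "__main__":
    print("non-persistent:", failover_test(False))
    print("persistent:", failover_test(True))
```

In this model the non-persistent run connects cleanly to both nodes, which matches the test result reported earlier in the thread, while the persistent run fails on the standby.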
|
|
 |
|