MQSeries.net Forum Index » IBM MQ Installation/Configuration Support » DR host failover and channel sequence numbers

DR host failover and channel sequence numbers
PeterPotkay
Posted: Thu Feb 06, 2014 5:11 am

Poobah

Joined: 15 May 2001
Posts: 7722

mqjeff wrote:
I would solve this problem with a script that runs as part of qmgr startup on the DR systems. The only time you need to reset these channels is if the queue manager is starting up on the DR. It doesn't fundamentally *hurt* things, except for slowing down the startup time, if you reset the sender channels *every* time the DR qmgr starts up.


He has SENDER channels needing sequence # resets on the client systems that are not in the DR site.

However, a RCVR channel can reset its sequence to match what the SNDR is expecting. So you could make the reset of the sequence # occur on the DR side as the DR QMs come up, looking to see what the client QMs are sending. That way the client systems don't need any change. But you can't blindly reset to 1, you have to look at each RCVR channel's error messages to see the specific # expected by the partner SNDR.
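
A rough sketch of that idea in Python, assuming the receiver's error log reports a mismatch as AMQ9526 with wording like the sample below. The sample text, the channel name CLIENT1.TO.DC and the number 4711 are made up for illustration, so check the pattern against your own error log. It pulls out the number the sender actually sent and prints the matching RESET CHANNEL command for the receiver.

```python
import re

# Assumed wording of an AMQ9526 entry; the channel name and numbers are hypothetical.
SAMPLE_LOG = """\
AMQ9526: Message sequence number error for channel 'CLIENT1.TO.DC'.
EXPLANATION:
The local and remote queue managers do not agree on the next message sequence
number.  A message with sequence number 4711 has been sent when sequence number
1 was expected.
"""

# \s+ between words lets the match survive the line wrapping used in the log.
PATTERN = re.compile(
    r"AMQ9526:.*?channel '(?P<channel>[^']+)'.*?"
    r"sequence\s+number\s+(?P<sent>\d+)\s+has\s+been\s+sent\s+when\s+"
    r"sequence\s+number\s+(?P<expected>\d+)\s+was\s+expected",
    re.DOTALL,
)


def reset_commands(log_text):
    """Return one RESET CHANNEL command per channel with an AMQ9526 entry."""
    latest = {}
    for m in PATTERN.finditer(log_text):
        # Later entries overwrite earlier ones, so the newest mismatch wins.
        latest[m["channel"]] = int(m["sent"])
    # Reset the receiver to the number the sender sent, not blindly to 1.
    return [f"RESET CHANNEL({name}) SEQNUM({seq})" for name, seq in latest.items()]


if __name__ == "__main__":
    for command in reset_commands(SAMPLE_LOG):
        print(command)
```

The printed commands could then be reviewed and fed to runmqsc on the DR queue manager.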
_________________
Peter Potkay
Keep Calm and MQ On
fjb_saper
Posted: Thu Feb 06, 2014 7:10 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

PeterPotkay wrote:
mqjeff wrote:
I would solve this problem with a script that runs as part of qmgr startup on the DR systems. The only time you need to reset these channels is if the queue manager is starting up on the DR. It doesn't fundamentally *hurt* things, except for slowing down the startup time, if you reset the sender channels *every* time the DR qmgr starts up.


He has SENDER channels needing sequence # resets on the client systems that are not in the DR site.

However, a RCVR channel can reset its sequence to match what the SNDR is expecting. So you could make the reset of the sequence # occur on the DR side as the DR QMs come up, looking to see what the client QMs are sending. That way the client systems don't need any change. But you can't blindly reset to 1, you have to look at each RCVR channel's error messages to see the specific # expected by the partner SNDR.


Alternatively check the saved status of the stopped receiver channels before shutting down prod?
_________________
MQ & Broker admin
PeterPotkay
Posted: Thu Feb 06, 2014 9:11 am

Poobah

Joined: 15 May 2001
Posts: 7722

fjb_saper wrote:
PeterPotkay wrote:
mqjeff wrote:
I would solve this problem with a script that runs as part of qmgr startup on the DR systems. The only time you need to reset these channels is if the queue manager is starting up on the DR. It doesn't fundamentally *hurt* things, except for slowing down the startup time, if you reset the sender channels *every* time the DR qmgr starts up.


He has SENDER channels needing sequence # resets on the client systems that are not in the DR site.

However, a RCVR channel can reset its sequence to match what the SNDR is expecting. So you could make the reset of the sequence # occur on the DR side as the DR QMs come up, looking to see what the client QMs are sending. That way the client systems don't need any change. But you can't blindly reset to 1, you have to look at each RCVR channel's error messages to see the specific # expected by the partner SNDR.


Alternatively check the saved status of the stopped receiver channels before shutting down prod?


You can predict when disaster is about to strike?

Yeah you can do it for DR tests that you plan for, but anytime you do something in a DR test that you wouldn't do in a real DR.....
_________________
Peter Potkay
Keep Calm and MQ On
smdavies99
Posted: Thu Feb 06, 2014 10:54 am

Jedi Council

Joined: 10 Feb 2003
Posts: 6076
Location: Somewhere over the Rainbow this side of Never-never land.

Many thanks for all the discussion.
The 'D' in DR stands for disaster. These systems are located in a place where major power outages are not uncommon. Despite UPS and generators, the main site could go down and .... well, we have to make sure that the DR site is online and working within 15 minutes.

IMHO, the only practical solution is to use some remote shell scripts to reset ALL the channel sequence numbers. It should only take a minute provided the network is up and running. That is the biggest risk.

Thanks again.
_________________
WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995

Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions.
bgrieb
Posted: Wed Jun 11, 2014 6:25 am

Newbie

Joined: 11 Jun 2014
Posts: 2

I have a slight variation from the original post:

Quote:
"Suppose I have TWO DataCentres A and B. There are a number of Client systems all running WMQ Servers with SDR/RCVR channels to the main DC. Everything is hunky dory and is working away until someone pulls the plug on the DC and it fails over to the DR site. A reconfig of the network re-routes the channel connections from the remote site to the DR host. "


We have a similar setup here but with a single source and two identically configured target machines in separate facilities. Data always streams to the primary site unless there's a system outage which forces traffic to the secondary site. In a recent DR test it appears that the sequences were being reset automatically. Since we do not control the Sender side of the channel I could not be certain that the source party was not resetting it, though they indicated they had not. We have a lot of back end processes that need to be updated during a failover so this sequence behavior is not what I expected or desired. As a result, I attempted to create a sequence mismatch in a test environment and found that I was unable to do so.

Here's how I set up my environments:
* Created multiple installations of WebSphere MQ 7.0.1.8 on Win2K8 machines
* Created a source Queue Manager with a single Sender channel on one installation
* Created two identical Queue Managers with a Receiver channel on two separate installations
* Created a single DNS reference for the Receiver nodes

To simulate the failover:
* I connected the Sender to the Receiver on the first node, began loading a high volume of messages (that would continue throughout the exercise) and confirmed that messages were making it to the first node.
* I changed the DNS reference for the Receiver to point at the second node
* I interrupted the stream of data by making the first receiver node inaccessible.
* The Sender entered a retry state before reestablishing the connection, this time to the second node. Messages began flowing to the second node without resetting the sequences.
* I reversed the process the same way and began sending data back to the first node without resetting the sequences.

I don't understand how this test did not generate two sequence mismatches so I'm obviously missing something. I'd appreciate any input. Thank you.
fjb_saper
Posted: Wed Jun 11, 2014 8:00 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Were any of your messages persistent?
What are you using as receiving qmgrs? The active instance of a multi-instance qmgr, or of a clustered qmgr?
_________________
MQ & Broker admin
bgrieb
Posted: Wed Jun 11, 2014 1:26 pm

Newbie

Joined: 11 Jun 2014
Posts: 2

Quote:
Were any of your messages persistent?
What are you using as receiving qmgrs? The active instance of a multi-instance qmgr, or of a clustered qmgr?


I've made no changes from the defaults in regard to persistence when creating the objects. The messages are being put into the queue with a simple PowerShell script wrapped in a DO loop.

Code:
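# $messageValue, $queue and $qmgr are assumed to be set by the surrounding DO loop (not shown)
$testmessage = New-WMQMessage
$testmessage.CharacterSet = 1208    # CCSID 1208 = UTF-8
$testmessage.Format = [IBM.WMQ.MQC]::MQFMT_STRING
$testmessage.WriteString("Test Message #$messageValue")
# Persistence is not set here, so whatever the defaults give is what gets sent
Send-WMQMessage $testmessage (Get-WMQQueue $queue -QmgrName $qmgr)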


As for the receiving queue managers, they are completely independent and not part of a cluster. They were created on separate machines in separate facilities using the exact same name for all the objects. We send to the primary node 99.99% of the time, but in the event of an outage we redirect traffic to the secondary (or standby) instance.
PeterPotkay
Posted: Wed Jun 11, 2014 2:14 pm

Poobah

Joined: 15 May 2001
Posts: 7722

Repeat the test, but this time pay attention to the sequence #s. Determine if messages are flowing with mismatched sequence #s, or if somehow the sequence #s are being reset when the failover occurs so they do match and then the messages flow.

In either case, if this happens without human intervention, and everything is as you described, I would consider this unexpected behavior and worthy of a PMR to find out what the heck is going on.
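
One way to watch the sequence #s during the repeat test is to capture DISPLAY CHSTATUS output before and after the failover and compare CURSEQNO. The Python below is only a sketch: the two captures are invented samples in runmqsc-style ATTRIBUTE(value) layout, the channel name is hypothetical, and it ignores sequence wrap (SEQWRAP) and multiple status records for the same channel.

```python
import re

# Matches runmqsc-style ATTRIBUTE(value) pairs.
ATTR = re.compile(r"(\w+)\(([^)]*)\)")

# Invented sample captures; the channel name and numbers are hypothetical.
BEFORE = """
AMQ8417: Display Channel Status details.
   CHANNEL(CLIENT1.TO.DC)                  CHLTYPE(RCVR)
   CURSEQNO(1042)                          STATUS(RUNNING)
"""
AFTER = """
AMQ8417: Display Channel Status details.
   CHANNEL(CLIENT1.TO.DC)                  CHLTYPE(RCVR)
   CURSEQNO(17)                            STATUS(RUNNING)
"""


def parse_chstatus(text):
    """Map channel name -> {attribute: value} from the captured output."""
    channels = {}
    current = None
    for name, value in ATTR.findall(text):
        if name == "CHANNEL":
            # A CHANNEL(...) pair starts a new record.
            current = channels.setdefault(value.strip(), {})
        elif current is not None:
            current[name] = value.strip()
    return channels


def classify(before, after, channel):
    """Say whether the sequence carried on or started again after failover."""
    seq_before = int(before[channel]["CURSEQNO"])
    seq_after = int(after[channel]["CURSEQNO"])
    return "continued" if seq_after >= seq_before else "restarted"


if __name__ == "__main__":
    result = classify(parse_chstatus(BEFORE), parse_chstatus(AFTER), "CLIENT1.TO.DC")
    print(result)
```

A "restarted" result would point to the numbers being reset at failover, while "continued" would mean messages are flowing on the old sequence.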
_________________
Peter Potkay
Keep Calm and MQ On
fjb_saper
Posted: Thu Jun 12, 2014 5:16 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

PeterPotkay wrote:
Repeat the test, but this time pay attention to the sequence #s. Determine if messages are flowing with mismatched sequence #s, or if somehow the sequence #s are being reset when the failover occurs so they do match and then the messages flow.

In either case, if this happens without human intervention, and everything is as you described, I would consider this unexpected behavior.


Like a monitoring app resetting the sequence number on its own (according to a rule) in case of a mismatch > (n * batch size) ?
_________________
MQ & Broker admin
gbaddeley
Posted: Thu Jun 12, 2014 8:20 pm

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

Channel sequence errors can all be handled on the DR qmgr.

Sender channels: Reset to 1, the rcvr on the remote qmgr will comply.

Receiver channels: Reset to the number expected by the sender on the remote qmgr. This can be found in the qmgr's error log.

If the channels are doing "long" retry (highly likely) it may take a while for them to go into normal running status (eg. default is 20 minutes, but check settings on your channels). Stopping and Starting the sender channels will avoid a potentially long wait.

The same principles apply to the other types of distributed channels.
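
The steps above could be scripted along these lines. This is a minimal Python sketch that only builds the MQSC text; the channel names and the receiver number are hypothetical, and the receiver values would have to come from the error log as described. Sender channels get a stop, a reset to 1 and a start; receiver channels get a reset to the number supplied.

```python
def dr_reset_script(sender_channels, receiver_expected):
    """Build MQSC text that resets channel sequence numbers on the DR qmgr.

    sender_channels: names of sender channels defined on the DR qmgr.
    receiver_expected: receiver channel name -> number the remote sender expects.
    """
    lines = []
    for name in sender_channels:
        # Stop and start around the reset so the channel does not sit in long retry.
        lines.append(f"STOP CHANNEL({name})")
        lines.append(f"RESET CHANNEL({name}) SEQNUM(1)")
        lines.append(f"START CHANNEL({name})")
    for name, seq in sorted(receiver_expected.items()):
        # A receiver is reset to whatever the partner sender is using, never blindly to 1.
        lines.append(f"RESET CHANNEL({name}) SEQNUM({seq})")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical channel names and sequence number, for illustration only.
    print(dr_reset_script(["DC.TO.CLIENT1"], {"CLIENT1.TO.DC": 4711}), end="")
```

The output is plain MQSC, so it can be saved to a file and piped into runmqsc for the DR queue manager after a quick review.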
_________________
Glenn
pmeekin
Posted: Thu Jun 19, 2014 2:01 pm

Novice

Joined: 13 Jan 2003
Posts: 10
Location: UK

bgrieb wrote:
I don't understand how this test did not generate two sequence mismatches so I'm obviously missing something. I'd appreciate any input. Thank you.


If you don't send persistent messages the channels' saved statuses never get updated. Do dis chs(*) saved all and see if you get anything.

If not then when the channels restart they will always start at 1 again.
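
To make that explanation concrete, here is a toy Python model. It is not how MQ is implemented, only an illustration of the rule described above: a channel end resumes from its saved status if it has one, otherwise from 1, and the saved status is only written for persistent messages. All class and function names are made up.

```python
class ChannelEnd:
    """One end of a channel with an optional saved sequence number."""

    def __init__(self):
        self.saved_seq = None  # nothing saved until a persistent message is moved
        self.next_seq = 1

    def start(self):
        # On (re)start, resume after the saved number if there is one, else from 1.
        self.next_seq = 1 if self.saved_seq is None else self.saved_seq + 1


def transfer(sender, receiver, count, persistent):
    """Start both ends and move `count` messages; return False on a mismatch."""
    sender.start()
    receiver.start()
    if sender.next_seq != receiver.next_seq:
        return False  # the two ends disagree: sequence error
    for _ in range(count):
        seq = sender.next_seq
        sender.next_seq += 1
        receiver.next_seq += 1
        if persistent:
            # Only persistent traffic updates the saved status in this model.
            sender.saved_seq = seq
            receiver.saved_seq = seq
    return True


def failover_ok(persistent):
    """Send to a primary receiver, then fail over to a fresh standby receiver."""
    sender, primary, standby = ChannelEnd(), ChannelEnd(), ChannelEnd()
    transfer(sender, primary, 100, persistent)
    return transfer(sender, standby, 100, persistent)


if __name__ == "__main__":
    print("non-persistent failover ok:", failover_ok(False))
    print("persistent failover ok:", failover_ok(True))
```

In this model the non-persistent run fails over cleanly because neither end has anything saved, while the persistent run hits a mismatch on the standby, which would fit the test result reported earlier in the thread.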