jordanfang
Posted: Tue Nov 23, 2010 7:21 pm Post subject: Issue with message sequence number after HA switched
Newbie
Joined: 11 Jun 2010 Posts: 7

Hi all,
We are using MQ 5.3 on HP-UX.
We have a qmgr with hardware failover that communicates with an external qmgr. After an HA switchover, our receiver channel ended abnormally with the following error:
AMQ9526: Message sequence number error for channel 'XXX'.
EXPLANATION:
The local and remote queue managers do not agree on the next message sequence number. A message with sequence number 1273498 has been sent when sequence number 1 was expected.
It seems that the external qmgr's sender channel didn't start a new instance. I want to know the cause of this issue. If anyone has seen this before or has any advice on how it can be resolved, please advise.

manoj798
Posted: Tue Nov 23, 2010 8:17 pm
Apprentice
Joined: 17 Nov 2009 Posts: 30

Find out the seqno for both the sender channel and the rcvr channel; you will see a difference in seqno. Reset both the sender channel and the rcvr channel to 1 and start the channel.
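To compare the two sides, one option is to look at CURSEQNO in the channel status on each queue manager. The sketch below assumes the output of runmqsc `DISPLAY CHSTATUS('XXX') ALL` has already been saved to a text file on each side; `curseqno` and `compare_seqno` are hypothetical helper names, not MQ commands.

```shell
#!/bin/sh
# Hypothetical helper: extract the first CURSEQNO(...) value from captured
# runmqsc DISPLAY CHSTATUS output supplied on stdin.
# Capture step (not run here):
#   echo "DISPLAY CHSTATUS('XXX') ALL" | runmqsc QMGR > status.txt
curseqno() {
  sed -n 's/.*CURSEQNO(\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Compare the values captured on the sender side ($1) and receiver side ($2).
compare_seqno() {
  sdr_seq=$(curseqno < "$1")
  rcv_seq=$(curseqno < "$2")
  if [ "$sdr_seq" = "$rcv_seq" ]; then
    echo "in sync at $sdr_seq"
  else
    echo "mismatch: sender $sdr_seq, receiver $rcv_seq"
  fi
}
```

Parsing saved output rather than calling runmqsc inside the function keeps the comparison usable when the two queue managers are administered from different hosts.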

Mr Butcher
Posted: Tue Nov 23, 2010 11:28 pm
Padawan
Joined: 23 May 2005 Posts: 1716

Quote:
There is a qmgr with hardware failover
How does your failover work? Is it the same queue manager on the "other" hardware (do you also switch the disks that hold the queue manager), or is it a second queue manager installation on the "other" hardware (which is what it looks like)?
In the second case, you have to reset the channels after every hardware failover, because there is a new receiver at your end with a different sequence number.
Only if the failover also includes the queue manager should the sequence numbers be fine.
It is also sufficient to reset only the sender, or, if you cannot get your hands on the sender, to reset the receiver to the number the sender is using.
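A minimal sketch of the reset step, assuming you pipe the printed MQSC into runmqsc yourself. Per the RESET CHANNEL documentation, a reset issued at the sending end is also applied at the receiving end the next time the channel starts, whereas a reset issued at the receiving end affects only that end. `reset_mqsc` is a hypothetical helper name, and the exact number to give a receiver should be checked against the documentation for your MQ version.

```shell
#!/bin/sh
# Hypothetical helper: print the MQSC that resets a channel's message
# sequence number. Pipe it into runmqsc on the queue manager that owns
# the channel, e.g.:  reset_mqsc 'XXX' | runmqsc QMGR
reset_mqsc() {
  # $1 = channel name, $2 = new sequence number (MQ's default is 1)
  printf "RESET CHANNEL('%s') SEQNUM(%s)\n" "$1" "${2:-1}"
}
```

Resetting at the sender is the simpler choice because one command realigns both ends; resetting the receiver is the fallback when the sending queue manager belongs to someone else.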

jordanfang
Posted: Wed Nov 24, 2010 12:01 am
Newbie
Joined: 11 Jun 2010 Posts: 7

Thanks for your reply.
Yes, the qmgr is the same on the second server; the failover switches the disks that hold the qmgr data.
I know how to resolve the seqno mismatch by resetting the channel. I want to find out the cause of the mismatch and how to avoid it.

Mr Butcher
Posted: Wed Nov 24, 2010 1:54 am
Padawan
Joined: 23 May 2005 Posts: 1716

If the queue manager is the same, then you should not be out of sequence after the failover ... unless there is message loss somewhere.
But from the message you posted:
Quote:
A message with sequence number 1273498 has been sent when sequence number 1 was expected.
I assume somebody (or the failover scripts) reset the receiver to 1 during the failover.
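The inference can be illustrated with a simplified model (an assumption, not MQ's actual implementation): the receiver remembers the last committed sequence number and expects the next batch to start one higher. Under that model, "1 was expected" means the receiver's stored value was back at 0, as it would be for a new or reset channel. `check_batch` is a hypothetical name.

```shell
#!/bin/sh
# Simplified model, not MQ internals: the receiver stores the last
# committed sequence number and expects the next message to be +1.
check_batch() {
  # $1 = receiver's last committed seqno, $2 = seqno of incoming message
  expected=$(( $1 + 1 ))
  if [ "$2" -eq "$expected" ]; then
    echo "accepted"
  else
    echo "mismatch: $2 sent when $expected was expected"
  fi
}
```

For example, `check_batch 0 1273498` reproduces the shape of the AMQ9526 text in the original post, while `check_batch 1273497 1273498` is accepted.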

jordanfang
Posted: Wed Nov 24, 2010 5:25 pm
Newbie
Joined: 11 Jun 2010 Posts: 7

Nobody reset the receiver channel. The HA switch script is as follows:
endmqlsr -m qmgr-name
endmqm -w qmgr-name
When the disks have switched to the other node, it runs:
strmqm qmgr-name
runmqlsr -m qmgr-name -t TCP &
runmqsc qmgr-name <<EOF
start channel(sender-channel)
start channel(receiver-channel)
end
EOF
I am surprised by this.

exerk
Posted: Thu Nov 25, 2010 12:05 am
Jedi Council
Joined: 02 Nov 2006 Posts: 6339

jordanfang wrote:
Nobody reset the receiver channel...
Really? Has the other end of your SDR also reported out-of-sequence errors? Does this...
jordanfang wrote:
.
runmqsc qmgr-name <<EOF
start channel(sender-channel)
start channel(receiver-channel)
.
...only run those lines? Is there something else, another script somewhere, that may redefine the objects on start?
Does this happen each time a failover occurs?
Oh, and the usual advice about getting off V5.3, etc.

nheng
Posted: Thu Nov 25, 2010 2:26 am Post subject: Try it.
Apprentice
Joined: 07 Dec 2007 Posts: 39

reset chl(SDR) seqnum(NUMBERA)
reset chl(RCV) seqnum(NUMBERA)
Try it.

Mr Butcher
Posted: Thu Nov 25, 2010 4:28 am
Padawan
Joined: 23 May 2005 Posts: 1716

The problem we are discussing here is not HOW to reset channels, but WHY they were out of sync after an HA failover.

SAFraser
Posted: Thu Nov 25, 2010 6:48 am
Shaman
Joined: 22 Oct 2003 Posts: 742 Location: Austin, Texas, USA

I'm curious why the OP would start the receiver channel on his failed-over queue manager. We only start receiver channels when we are forced to do so as a step in troubleshooting. Presumably, the remote sender would be in retrying status and the receiver would start as soon as the new instance of the listener is up.
Also, I am curious why the sender would need a restart. Perhaps the OP does not trigger his transmit queue?
Of course, MQ 5.3 was not quite as robust at recovering channels... but still, even with 5.3, we seldom issued start channel commands under most circumstances. In any event, the OP is absolutely right to try to determine the root cause.
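The triggering point can be sketched as the MQSC that enables channel triggering on a transmission queue, so the sender starts when messages arrive rather than from the failover script. This assumes a channel initiator is monitoring SYSTEM.CHANNEL.INITQ (started with runmqchi if your setup does not start one automatically); `trigger_xmitq_mqsc` and the queue and channel names in the test are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print MQSC that enables channel triggering on a
# transmission queue. Pipe the output into runmqsc for the local qmgr.
trigger_xmitq_mqsc() {
  # $1 = transmission queue name, $2 = sender channel that serves it
  printf "ALTER QLOCAL('%s') TRIGGER TRIGTYPE(FIRST) INITQ('SYSTEM.CHANNEL.INITQ') TRIGDATA('%s')\n" "$1" "$2"
}
```

TRIGTYPE(FIRST) fires when the queue goes from empty to non-empty, which matches how a sender channel is normally brought up on demand.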

jordanfang
Posted: Thu Nov 25, 2010 6:20 pm
Newbie
Joined: 11 Jun 2010 Posts: 7

The SDR channel isn't out of sync. Failover from node A to node B is fine, but on failover from node B back to node A, only the RCV channel reports an out-of-sequence error. The failover scripts are the same on both nodes.
After the HA failover, the RCV channel sets up a new instance whose seq num is 1. I'm curious why the peer SDR channel keeps running an old instance whose seq num is 1273498 instead of setting up a new instance.

zonko
Posted: Thu Nov 25, 2010 11:07 pm
Voyager
Joined: 04 Nov 2009 Posts: 78

Quote:
After the HA failover, the RCV channel sets up a new instance whose seq num is 1. I'm curious why the peer SDR channel keeps running an old instance whose seq num is 1273498 instead of setting up a new instance.
There is your answer.
When you create a new RCVR channel, the sequence number is set to 1.
Why would you expect a qmgr hosting an SDR channel to recreate the channel?
How would it know that the qmgr hosting the RCVR has done so?
Conclusion: user error.

Mr Butcher
Posted: Thu Nov 25, 2010 11:11 pm
Padawan
Joined: 23 May 2005 Posts: 1716

Quote:
After the HA failover, the RCV channel sets up a new instance whose seq num is 1
You don't mean delete/define, do you? You just mean stop/start? Or what do you mean by "new instance"?
The sequence number is "persistent": if a channel is stopped and started again, the sequence number does not restart from 1 but continues from where it was before.
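The difference between stop/start and delete/define can be shown with a toy model (an assumption for illustration, not how MQ stores its data): the sequence number lives in a persistent store, standing in for the queue manager's channel synchronization data, rather than in the running channel process. All function names here are hypothetical.

```shell
#!/bin/sh
# Toy model, not MQ internals: one file per channel holds the last
# committed sequence number, so it survives a channel stop/start.
STORE=$(mktemp -d)

define_channel()  { echo 0 > "$STORE/$1"; }     # new object: next expected is 1
commit_batch()    { echo "$2" > "$STORE/$1"; }  # record last committed seqno
restart_channel() { :; }                        # stop/start leaves the store alone
next_expected()   { echo $(( $(cat "$STORE/$1") + 1 )); }
```

In this model only `define_channel` (or losing the store itself) sends the receiver back to expecting 1, which is why the thread turns to what could have lost or replaced the stored value.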

jordanfang
Posted: Fri Nov 26, 2010 12:52 am
Newbie
Joined: 11 Jun 2010 Posts: 7

Yes, I don't mean deleting and redefining the recv channel object; I mean stopping and restarting the recv channel.
The sequence number is "persistent", is it?
Then why did the recv channel lose its previous seq num and start from 1 when the qmgr was stopped and restarted?

fjb_saper
Posted: Fri Nov 26, 2010 2:15 am
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY

jordanfang wrote:
Yes, I don't mean deleting and redefining the recv channel object; I mean stopping and restarting the recv channel.
The sequence number is "persistent", is it?
Then why did the recv channel lose its previous seq num and start from 1 when the qmgr was stopped and restarted?
Did somebody tamper with the SYSTEM.CHANNEL.SYNC.QUEUE?
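One way to follow up this question is to record the depth of SYSTEM.CHANNEL.SYNC.QUEUE on the active node before a failover and again after strmqm on the other node, then compare. The sketch assumes the runmqsc output has been captured to text; `sync_queue_depth` is a hypothetical helper, and a changed depth is only a hint to investigate, since the queue's contents can also change legitimately.

```shell
#!/bin/sh
# Hypothetical check: reads captured output of
#   echo "DISPLAY QLOCAL('SYSTEM.CHANNEL.SYNC.QUEUE') CURDEPTH" | runmqsc QMGR
# from stdin and reports the depth of the channel synchronization queue.
sync_queue_depth() {
  depth=$(sed -n 's/.*CURDEPTH(\([0-9][0-9]*\)).*/\1/p' | head -n 1)
  if [ -z "$depth" ]; then
    echo "CURDEPTH not found"
  elif [ "$depth" -eq 0 ]; then
    echo "empty"
  else
    echo "$depth"
  fi
}
```

Running it on both nodes around a B-to-A failover, the direction that fails in this thread, would show whether the synchronization data arrives intact.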