Issue with message sequence number after HA switched
jordanfang
Posted: Tue Nov 23, 2010 7:21 pm    Post subject: Issue with message sequence number after HA switched

Newbie

Joined: 11 Jun 2010
Posts: 7

Hi, all

We are using MQ 5.3 on HP-UX.
There is a qmgr with hardware failover, which communicates with an external qmgr. After the HA switch, our receiver channel ended abnormally with the following error message:
AMQ9526: Message sequence number error for channel 'XXX'.

EXPLANATION:
The local and remote queue managers do not agree on the next message sequence number. A message with sequence number 1273498 has been sent when sequence number 1 was expected.

It seems that the external qmgr's sender channel didn't start a new instance. I want to know the cause of this issue. If anyone has seen this before, or has any advice on how it can be resolved, please advise.
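
For anyone who wants to pull the two numbers out of the error log in a script, here is a minimal sketch in shell. It assumes the explanation sentence sits on a single line, as quoted above; in a real AMQERR log it may be wrapped across lines. The function name parse_seq_mismatch is made up for this illustration.

```shell
# Read AMQ9526 explanation text on stdin and print "<sent> <expected>".
# Assumption: the sentence is on one line (real logs may wrap it).
parse_seq_mismatch() {
    sed -n 's/.*sequence number \([0-9][0-9]*\) has been sent when sequence number \([0-9][0-9]*\) was expected.*/\1 \2/p'
}

# The explanation text exactly as quoted in this post.
msg='A message with sequence number 1273498 has been sent when sequence number 1 was expected.'
echo "$msg" | parse_seq_mismatch
```

The first number is what the remote sender used, the second is what the local receiver expected.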
manoj798
Posted: Tue Nov 23, 2010 8:17 pm

Apprentice

Joined: 17 Nov 2009
Posts: 30

Find out the seqno for both the sender channel and the rcvr channel; you will see a difference between them. Reset both the sender channel and the rcvr channel to 1 and start the channel.
Mr Butcher
Posted: Tue Nov 23, 2010 11:28 pm

Padawan

Joined: 23 May 2005
Posts: 1716

Quote:
There is a qmgr with hardware failover


How is your failover set up? Is it the same queue manager on the "other" hardware (do you also switch the disks that hold the queue manager), or is it a second queue manager installation on the "other" hardware (looks like it)?

In the second case, after a hardware failover you always have to reset the channels, because there is a new receiver at your end with a different sequence number.
Only if the failover also includes the queue manager should the sequence number be fine.

It's also sufficient to reset only the sender, or to reset the receiver to the sequence number the sender is using if you cannot get your hands on the sender.
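
As a rough illustration of the two reset options, the sketch below only prints the MQSC text; nothing is sent to a queue manager until the output is piped into runmqsc. The helper name reset_channel_mqsc is hypothetical, and 'XXX' and 1273498 are the placeholder channel name and sequence number from the error message earlier in the thread.

```shell
# Print the MQSC command that resets a channel's sequence number.
# $1 = channel name, $2 = sequence number to set
reset_channel_mqsc() {
    printf "RESET CHANNEL('%s') SEQNUM(%s)\n" "$1" "$2"
}

# Option 1, issued on the sender's queue manager: restart numbering at 1.
reset_channel_mqsc XXX 1
# Option 2, issued on the receiver's queue manager: expect the number the sender is using.
reset_channel_mqsc XXX 1273498

# To apply one of them, pipe it into runmqsc on the relevant queue manager:
#   reset_channel_mqsc XXX 1273498 | runmqsc qmgr-name
```

Printing first and piping second makes it easy to review the command before it touches a live channel.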
_________________
Regards, Butcher
jordanfang
Posted: Wed Nov 24, 2010 12:01 am

Newbie

Joined: 11 Jun 2010
Posts: 7

Thanks for your reply.
Yes, the qmgr is the same on the second server; that is, the failover switches the disks that hold the qmgr data.
I know how to resolve a seqno mismatch with reset channel. I want to find out the cause of the seqno mismatch and how to avoid this issue.
Mr Butcher
Posted: Wed Nov 24, 2010 1:54 am

Padawan

Joined: 23 May 2005
Posts: 1716

If the queue manager is the same, then you should not be out of sequence after the failover ... unless there is message loss somewhere.

But from the message you posted

Quote:
A message with sequence number 1273498 has been sent when sequence number 1 was expected.


I assume somebody (or the failover scripts) reset the receiver to 1 during the failover.
_________________
Regards, Butcher
jordanfang
Posted: Wed Nov 24, 2010 5:25 pm

Newbie

Joined: 11 Jun 2010
Posts: 7

Nobody reset the receiver channel, and the HA switch script is as follows:
endmqlsr -m qmgr-name
endmqm -w qmgr-name

When the disk switches to the other node, this script runs:
strmqm qmgr-name
runmqlsr -m qmgr-name -t TCP
runmqsc qmgr-name <<EOF
start channel(sender-channel)
start channel(receiver-channel)
end
EOF

I am surprised at this.
exerk
Posted: Thu Nov 25, 2010 12:05 am

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

jordanfang wrote:
Nobody reset the receiver channel...


Really? Has the other end of your SDR also reported out-of-sequence errors? Does this...

jordanfang wrote:
.
runmqsc qmgr-name <<EOF
start channel(sender-channel)
start channel(receiver-channel)
.


...only run those lines? Is there something else, another script somewhere, that may redefine the objects on start?

Does this happen each time a fail-over occurs?

Oh, and the usual advice about getting off V5.3 etc.
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
nheng
Posted: Thu Nov 25, 2010 2:26 am    Post subject: Try it.

Apprentice

Joined: 07 Dec 2007
Posts: 39

reset chl(SDR) seqnum( NUMBERA)
reset chl(RCV) seqnum( NUMBERA)

Try it.
Mr Butcher
Posted: Thu Nov 25, 2010 4:28 am

Padawan

Joined: 23 May 2005
Posts: 1716

The problem we discuss here is not HOW to reset channels, but WHY they have been out of sync after a HA failover.
_________________
Regards, Butcher
SAFraser
Posted: Thu Nov 25, 2010 6:48 am

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

I'm curious why the OP would start the receiver channel on his failed-over queue manager. We only start receiver channels when we are forced to do so as a step in troubleshooting. Presumably, the remote sender would be in retrying status and the receiver would start as soon as the new instance of the listener is up.

Also, I am curious why the sender would need a restart. Perhaps the OP does not trigger his transmit queue?

Of course, MQ 5.3 was not quite as robust in recovering channels.... but still, even with 5.3, we seldom issued start channel commands under most circumstances. In any event, the OP is absolutely correct in trying to determine root cause.
jordanfang
Posted: Thu Nov 25, 2010 6:20 pm

Newbie

Joined: 11 Jun 2010
Posts: 7

The SDR channel isn't out of sync. Failover from node A to node B is fine. But on failover from node B to node A, only the RCV channel reports an out-of-sequence error. The failover scripts are the same on both nodes.

After HA failover, the RCV channel would set up a new instance whose seq num is 1. I'm curious why the peer SDR channel ran an old instance whose seq num is 1273498 and didn't set up a new instance.
zonko
Posted: Thu Nov 25, 2010 11:07 pm

Voyager

Joined: 04 Nov 2009
Posts: 78

Quote:
After HA failover, the RCV channel would set up a new instance whose seq num is 1. I'm curious why the peer SDR channel ran an old instance whose seq num is 1273498 and didn't set up a new instance.


There is your answer.

When you create a new RCVR channel, the sequence number is set to 1.
Why would you expect a qmgr hosting a SDR channel to recreate the channel?
How would it know that the qmgr hosting the RCVR has done so?

Conclusion: user error.
Mr Butcher
Posted: Thu Nov 25, 2010 11:11 pm

Padawan

Joined: 23 May 2005
Posts: 1716

Quote:
After HA failover, the RCV channel would set up a new instance whose seq num is 1


you don't mean delete / define, do you? you just mean stop / start ?!? or what do you mean by "new instance" ?!?

the sequence number is "persistent". if a channel is stopped and started again, the sequence number does not start from 1 but from what it was before.
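
One way to check this on your own system is to record the saved sequence number before endmqm and again after strmqm, then compare. The sketch below is only an assumption-laden helper: extract_curseqno is a made-up name, the sample text is illustrative rather than captured from a real queue manager, and actual runmqsc output may be laid out differently.

```shell
# Pull the first CURSEQNO value out of DISPLAY CHSTATUS output read from stdin.
extract_curseqno() {
    sed -n 's/.*CURSEQNO(\([0-9][0-9]*\)).*/\1/p' | head -n 1
}

# Intended use, run once before endmqm and once after strmqm:
#   echo "DISPLAY CHSTATUS('XXX') SAVED" | runmqsc qmgr-name | extract_curseqno

# Illustrative input only (values invented for the example).
sample="   CHANNEL(XXX)                            CHLTYPE(RCVR)
   CURLUWID(0000000000000000)              CURSEQNO(1273497)"
printf '%s\n' "$sample" | extract_curseqno
```

If the number after the restart differs from the one recorded before the stop, something other than a plain stop / start touched the channel state.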
_________________
Regards, Butcher
jordanfang
Posted: Fri Nov 26, 2010 12:52 am

Newbie

Joined: 11 Jun 2010
Posts: 7

Yes, I don't mean deleting and redefining a new recv channel object; I mean stopping and restarting the recv channel.

The sequence number is "persistent", is it?
Why did the recv channel lose its previous seq num and start from 1 when the qmgr was stopped and restarted?
fjb_saper
Posted: Fri Nov 26, 2010 2:15 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

jordanfang wrote:
Yes, I don't mean deleting and redefining a new recv channel object; I mean stopping and restarting the recv channel.

The sequence number is "persistent", is it?
Why did the recv channel lose its previous seq num and start from 1 when the qmgr was stopped and restarted?

Did somebody tamper with the SYSTEM.CHANNEL.SYNCQ?
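
A quick, read-only way to look at that queue is sketched below, assuming the queue meant here is SYSTEM.CHANNEL.SYNCQ, the standard name of the queue where channel synchronization data is kept. The helper name sync_queue_check_mqsc is hypothetical, and it only prints the MQSC text.

```shell
# Print an MQSC command that shows how many messages the channel sync queue holds.
sync_queue_check_mqsc() {
    printf '%s\n' 'DISPLAY QLOCAL(SYSTEM.CHANNEL.SYNCQ) CURDEPTH'
}

sync_queue_check_mqsc
# To run it against a queue manager:
#   sync_queue_check_mqsc | runmqsc qmgr-name
```

Comparing the depth on both nodes' view of the shared disk before and after a failover might show whether the queue was cleared or replaced.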
_________________
MQ & Broker admin