  MQSeries.net
Message loss with failover in MQ V7 Cluster
Wally
Posted: Wed Sep 22, 2010 5:19 am    Post subject: Message loss with failover in MQ V7 Cluster

Novice

Joined: 22 Sep 2010
Posts: 15

Hello out there,

- I run a small test MQ V7 cluster on a single Windows box
- I have 3 QMGRs named T1, T2 and S1, listening on ports 1501, 1502 and 1503
- T1 and T2 are full repositories and S1 is a partial repository
- T1 and T2 each have a local queue named TARGETQ, shared in the cluster named CLUST1 (bind type: Not fixed; persistence: persistent)
- S1 has an alias queue SENDQ that resolves to TARGETQ
- The setup was created in MQ Explorer: first the T1/T2 cluster, then S1 was added

The plan is to send messages to S1 and process them on T1 or T2.

When I send messages to S1.SENDQ, they are happily distributed across T1 and T2. But when I stop T1 or T2 and send messages again, the first message that round-robin assigns to the offline QMGR is always lost, and all subsequent messages arrive on the online QMGR. So if I send messages M1, M2 and M3, M1 (the one routed to the now-offline member) disappears completely, without any error or warning.
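A toy model may help picture the symptom. It is a sketch under stated assumptions, not the real WebSphere MQ workload algorithm (which also weighs channel status, priorities, ranks and weights): S1 rotates over the TARGETQ instances it currently believes are usable. The class ToyClusterRouter is hypothetical, and the channel name TO.T1 is assumed by analogy with the TO.T2 channel in the error log further down. What it shows is that the first message after T1 goes down can still be assigned to T1 if S1 has not yet noticed the failure.

```java
import java.util.Arrays;

// Toy model only: NOT the real MQ cluster workload algorithm.
final class ToyClusterRouter {
    enum ChannelState { RUNNING, INACTIVE, RETRYING, STOPPED }

    private final String[] instances;
    private final ChannelState[] known; // channel state as currently known by the sender (S1)
    private int next = 0;               // round-robin position

    ToyClusterRouter(String... instances) {
        this.instances = instances.clone();
        this.known = new ChannelState[instances.length];
        Arrays.fill(known, ChannelState.RUNNING);
    }

    void setKnownState(String instance, ChannelState state) {
        for (int i = 0; i < instances.length; i++) {
            if (instances[i].equals(instance)) {
                known[i] = state;
            }
        }
    }

    private static boolean usable(ChannelState s) {
        return s == ChannelState.RUNNING || s == ChannelState.INACTIVE;
    }

    /** Picks the next instance in rotation, skipping ones whose channel looks unusable. */
    String choose() {
        int n = instances.length;
        for (int tried = 0; tried < n; tried++) {
            int i = (next + tried) % n;
            if (usable(known[i])) {
                next = (i + 1) % n;
                return instances[i];
            }
        }
        // Nothing looks usable: pick one anyway; the message waits on the SCTQ.
        String fallback = instances[next];
        next = (next + 1) % n;
        return fallback;
    }

    public static void main(String[] args) {
        ToyClusterRouter router = new ToyClusterRouter("T1", "T2");
        System.out.println("M1 -> " + router.choose());   // M1 -> T1 (failure not yet noticed)
        router.setKnownState("T1", ChannelState.RETRYING); // TO.T1 fails and goes into retry
        System.out.println("M2 -> " + router.choose());   // M2 -> T2
        System.out.println("M3 -> " + router.choose());   // M3 -> T2
    }
}
```

In this model M1 is not lost; it is queued for the dead channel, which is the "stuck message" case discussed below.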

Even if the stopped QMGR is started again, the message stays lost. I tried sending the message with MQ Explorer as well as with a simple JMS client program that explicitly sets message persistence.

According to the documentation this should work. Has anyone had the same strange issue, or can anyone provide an example config I can compare mine against? Any help appreciated!

Vitor
Posted: Wed Sep 22, 2010 5:50 am    Post subject: Re: Message loss with failover in MQ V7 Cluster

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

Wally wrote:
when I stop T1 or T2 and send messages again, the first message that round-robin assigns to the offline QMGR is always lost, and all subsequent messages arrive on the online QMGR. So if I send messages M1, M2 and M3, M1 (the one routed to the now-offline member) disappears completely, without any error or warning.


I'd expect the message to be in the SCTQ (SYSTEM.CLUSTER.TRANSMIT.QUEUE) rather than lost. Does the message have expiry set?
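Some background on the expiry question: MQMD.Expiry is expressed in tenths of a second, and MQEI_UNLIMITED (-1) means the message never expires. A message that expires while it waits on the SCTQ is not delivered; the queue manager discards it (sending an expiration report only if the sender requested one), which from the sender's side looks like silent loss. A minimal sketch of that check, assuming the elapsed time since the put is known; ExpirySketch and isExpired are hypothetical names, not MQ API, and the boundary handling is approximate:

```java
// Hypothetical helper, not an MQ API: models the MQMD.Expiry rule only.
final class ExpirySketch {
    // Same value as the MQ constant MQEI_UNLIMITED.
    static final int MQEI_UNLIMITED = -1;

    /**
     * @param expiryTenths  MQMD.Expiry at put time, in tenths of a second
     * @param elapsedMillis time the message has spent in the network so far
     * @return true if the message has (roughly) run out of lifetime
     */
    static boolean isExpired(int expiryTenths, long elapsedMillis) {
        if (expiryTenths == MQEI_UNLIMITED) {
            return false; // unlimited expiry never runs out
        }
        return elapsedMillis >= expiryTenths * 100L; // tenths of a second -> milliseconds
    }

    public static void main(String[] args) {
        // A 10-second expiry on a message that waits 30 seconds on the SCTQ.
        System.out.println(isExpired(100, 30_000));            // true
        System.out.println(isExpired(MQEI_UNLIMITED, 30_000)); // false
    }
}
```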

Wally wrote:
According to the documentation this should work.


It should work, within limits. There have been a number of discussions here on using a WMQ cluster for this kind of HA solution and why it doesn't work all that well: the "stuck message" problem.

(Your M1 should be stuck in the SCTQ rather than lost).
_________________
Honesty is the best policy.
Insanity is the best defence.
Wally
Posted: Wed Sep 22, 2010 6:01 am    Post subject: Re: Message loss with failover in MQ V7 Cluster

Novice

Joined: 22 Sep 2010
Posts: 15

Quote:

I'd expect the message to be in the SCTQ rather than lost. Does the message have expiry set?

It should work, within limits. There have been a number of discussions here on using a WMQ cluster for this kind of HA solution and why it doesn't work all that well: the "stuck message" problem.

(Your M1 should be stuck in the SCTQ rather than lost).


I would also expect to see the message in the TX or a DL queue, but I can't see any message there. Can you please point me to some material on the HA discussion?

The only thing I observe is this in the error log of S1:
Code:

-------------------------------------------------------------------------------
9/22/2010 15:00:43 - Process(30492.15) User(_xyz) Program(amqrmppa.exe)
                    Host(abc)
AMQ9202: Remote host 'abc(xx.xxx.xxx.xx) (1502)' not available, retry
later.

EXPLANATION:
The attempt to allocate a conversation using TCP/IP to host 'abc
(xx.xxx.xxx.xx) (1502)' was not successful.  However the error may be a
transitory one and it may be possible to successfully allocate a TCP/IP
conversation later.
ACTION:
Try the connection again later. If the failure persists, record the error
values and contact your systems administrator. The return code from TCP/IP is
10061 (X'274D'). The reason for the failure may be that this host cannot reach
the destination host. It may also be possible that the listening program at
host 'abc (xx.xxx.xxx.xx) (1502)' was not running.  If this is the case,
perform the relevant operations to start the TCP/IP listening program, and try
again.
----- amqccita.c : 1289 -------------------------------------------------------
9/22/2010 15:00:43 - Process(30492.15) User(_xyz) Program(amqrmppa.exe)
                    Host(abc)
AMQ9999: Channel program ended abnormally.

EXPLANATION:
Channel program 'TO.T2' ended abnormally.
ACTION:
Look at previous error messages for channel program 'TO.T2' in the error files
to determine the cause of the failure.
----- amqrccca.c : 921 --------------------------------------------------------
exerk
Posted: Wed Sep 22, 2010 6:07 am    Post subject: Re: Message loss with failover in MQ V7 Cluster

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Wally wrote:
...I would also expect to see the message in the TX or a DL queue, but I can't see any message there...


Stop all the channels on the queue manager from which you 'sent' the message, then check the S.C.T.Q.
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
mqjeff
Posted: Wed Sep 22, 2010 6:08 am

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If the message is not persistent, this is not unexpected.

If the channel is not fully stopped, then the message could be in transaction.
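mqjeff's two points fit together through the channel's NPMSPEED attribute, which defaults to FAST. With NPMSPEED(FAST), nonpersistent messages are sent outside the channel's batch unit of work, so one that is in transit when the channel fails can be lost; persistent messages always travel inside the batch and are backed out to the transmission queue or held in doubt instead. The sketch below encodes only that rule; InFlightLossSketch is a hypothetical helper and ignores other cases, such as nonpersistent messages being discarded when a queue manager restarts.

```java
// Hypothetical helper, not an MQ API.
final class InFlightLossSketch {
    enum Persistence { PERSISTENT, NOT_PERSISTENT }
    enum NpmSpeed { FAST, NORMAL } // FAST is the channel default

    /** True if a message in transit may be lost when the channel fails mid-transfer. */
    static boolean mayBeLostOnChannelFailure(Persistence persistence, NpmSpeed npmSpeed) {
        if (persistence == Persistence.PERSISTENT) {
            // Sent inside the batch: backed out to the transmission queue or held in doubt.
            return false;
        }
        // Nonpersistent: only NPMSPEED(FAST) sends it outside the batch.
        return npmSpeed == NpmSpeed.FAST;
    }

    public static void main(String[] args) {
        for (Persistence p : Persistence.values()) {
            for (NpmSpeed s : NpmSpeed.values()) {
                System.out.println(p + " / NPMSPEED(" + s + "): may be lost = "
                        + mayBeLostOnChannelFailure(p, s));
            }
        }
    }
}
```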
bruce2359
Posted: Wed Sep 22, 2010 6:09 am

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

Quote:
I would also expect to see the message in the TX or a DL queue, but I can't see any message there.

Exactly which queues did you look into? The SCTQ is not named TX.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
Wally
Posted: Wed Sep 22, 2010 6:17 am

Novice

Joined: 22 Sep 2010
Posts: 15

So I had a look into the S1.SYSTEM.CLUSTER.TRANSMIT.QUEUE, but having said that, it looks like it is non-persistent by default - I will modify it to persistent and run my test again. The message itself should be persistent, as I use this code to send it:

Code:

jmsTemplate.send(new MessageCreator() {
    public Message createMessage(Session session) throws JMSException {

        TextMessage msg = session.createTextMessage(message);
        msg.setJMSDeliveryMode(DeliveryMode.PERSISTENT);

        return msg;
    }
});
Vitor
Posted: Wed Sep 22, 2010 6:23 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

Wally wrote:
it looks like it is non-persistent by default - I will modify it to persistent


That's only a default setting, as has been said many, many times here.

Wally wrote:
Code:

jmsTemplate.send(new MessageCreator() {
    public Message createMessage(Session session) throws JMSException {

        TextMessage msg = session.createTextMessage(message);
        msg.setJMSDeliveryMode(DeliveryMode.PERSISTENT);

        return msg;
    }
});


This should (my JMS is weak) override that setting
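On the JMS side, the specification has send() set the JMSDeliveryMode header from the producer's delivery mode, overwriting whatever the client stored there beforehand, so calling setJMSDeliveryMode on the message before sending does not control persistence. The producer default is DeliveryMode.PERSISTENT, and with Spring's JmsTemplate the setting belongs on the template (setDeliveryPersistent(true) together with setExplicitQosEnabled(true)). To keep the example runnable without a JMS jar, the sketch below uses hypothetical stand-in classes (ToyMessage, ToyProducer; not javax.jms types) that model only this header behaviour.

```java
// Hypothetical stand-ins, NOT javax.jms types: they model only how JMSDeliveryMode is set.
final class JmsDeliveryModeSketch {
    // Same numeric values as javax.jms.DeliveryMode.
    static final int NON_PERSISTENT = 1;
    static final int PERSISTENT = 2;

    static final class ToyMessage {
        int jmsDeliveryMode; // header field, written by the provider on send
    }

    static final class ToyProducer {
        int deliveryMode = PERSISTENT; // JMS default for a producer

        void send(ToyMessage message) {
            // Like a JMS provider, stamp the header from the producer setting at send time,
            // discarding any value the client set on the message beforehand.
            message.jmsDeliveryMode = deliveryMode;
        }
    }

    public static void main(String[] args) {
        ToyProducer producer = new ToyProducer();
        producer.deliveryMode = NON_PERSISTENT;

        ToyMessage msg = new ToyMessage();
        msg.jmsDeliveryMode = PERSISTENT; // what the snippet in this thread does
        producer.send(msg);
        System.out.println(msg.jmsDeliveryMode == NON_PERSISTENT); // true: producer setting wins
    }
}
```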
Vitor
Posted: Wed Sep 22, 2010 6:24 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

mqjeff wrote:
If the channel is not fully stopped, then the message could be in transaction.




Check the count on the queue
Wally
Posted: Wed Sep 22, 2010 6:33 am

Novice

Joined: 22 Sep 2010
Posts: 15

So when I run the "failover test", I stop the complete qmgr T1, wait a while and then try to send messages again.

Changing the SYSTEM.CLUSTER.TRANSMIT.QUEUE to persistent doesn't help either.

So when I stop both sender channels to the rest of the cluster, the messages sit in the SYSTEM.CLUSTER.TRANSMIT.QUEUE and wait. Starting the channel to the offline member doesn't change the situation, and starting the second channel again transfers ALL of the messages to the online member.

But a failure of one node won't stop the sender channels beforehand, will it?
bruce2359
Posted: Wed Sep 22, 2010 6:35 am

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

Quote:
So I had a look into the S1.SYSTEM.CLUSTER.TRANSMIT.QUEUE, but having said that, it looks like it is non-persistent by default - I will modify it to persistent and run my test again.


First, queues are neither persistent nor non-persistent - messages are.
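To illustrate the point: persistence is a property of each message, fixed when it is put. MQMD.Persistence is MQPER_PERSISTENT, MQPER_NOT_PERSISTENT or MQPER_PERSISTENCE_AS_Q_DEF (the MQMD default), and only the last one consults a queue's DEFPSIST attribute. Changing DEFPSIST on the SCTQ afterwards does not alter messages already sitting on it. A minimal sketch; which queue definition supplies the default (alias, cluster queue) is simplified here to a single boolean, so treat that part as an assumption and check the MQMD Persistence documentation for the exact resolution rules. PutPersistenceSketch is a hypothetical name.

```java
// Hypothetical helper, not an MQ API. Constant values match the MQ MQPER_* constants.
final class PutPersistenceSketch {
    static final int MQPER_NOT_PERSISTENT = 0;
    static final int MQPER_PERSISTENT = 1;
    static final int MQPER_PERSISTENCE_AS_Q_DEF = 2;

    /**
     * Resolves a message's persistence once, at put time.
     * @param mdPersistence MQMD.Persistence supplied by the application
     * @param queueDefPsist DEFPSIST of the queue definition used for resolution (simplified)
     */
    static boolean isPersistent(int mdPersistence, boolean queueDefPsist) {
        switch (mdPersistence) {
            case MQPER_PERSISTENT:
                return true;
            case MQPER_NOT_PERSISTENT:
                return false;
            case MQPER_PERSISTENCE_AS_Q_DEF:
                return queueDefPsist; // the only case where a queue attribute matters
            default:
                throw new IllegalArgumentException("Unknown Persistence value: " + mdPersistence);
        }
    }

    public static void main(String[] args) {
        // An explicitly persistent put stays persistent whatever any queue's DEFPSIST says.
        System.out.println(isPersistent(MQPER_PERSISTENT, false));           // true
        System.out.println(isPersistent(MQPER_PERSISTENCE_AS_Q_DEF, false)); // false
    }
}
```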

Is the SCTQ really named S1.SYSTEM.CLUSTER.TRANSMIT.QUEUE?

What queue does your application open?
Wally
Posted: Wed Sep 22, 2010 6:41 am

Novice

Joined: 22 Sep 2010
Posts: 15

bruce2359 wrote:

First, queues are neither persistent nor non-persistent - messages are.

Is the SCTQ really named S1.SYSTEM.CLUSTER.TRANSMIT.QUEUE?

What queue does your application open?


Sorry for the confusion: I meant SYSTEM.CLUSTER.TRANSMIT.QUEUE on my qmgr S1.

So I have 2 clustered queues named TARGETQ, on T1 and T2, and my little sample app or MQ Explorer sends its test message to TARGETQ at S1.
Vitor
Posted: Wed Sep 22, 2010 6:46 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

Wally wrote:
So when I run the "failover test", I stop the complete qmgr T1, wait a while and then try to send messages again.


How long a time? It will take a discussed and documented period for the channels to notice the failure.
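Once a failure has been detected, the channel's retry attributes drive what happens next. The documented defaults are SHORTRTY(10), SHORTTMR(60), LONGRTY(999999999) and LONGTMR(1200), with the timers in seconds. The sketch below lists approximately when each retry would be attempted under those values; it is a simplification that does not model how long detection itself takes (for example a refused connect versus waiting on heartbeats), and RetryScheduleSketch is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: approximate retry timeline for a sender or cluster-sender channel.
final class RetryScheduleSketch {
    /**
     * @return seconds after the failure at which each of the first maxAttempts retries
     *         would be attempted; fewer entries if the retry counts run out first
     */
    static List<Long> retryOffsets(int shortRty, int shortTmr, long longRty, int longTmr, int maxAttempts) {
        List<Long> offsets = new ArrayList<>();
        long t = 0;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (attempt <= shortRty) {
                t += shortTmr;  // short retries first
            } else if (attempt - shortRty <= longRty) {
                t += longTmr;   // then long retries
            } else {
                break;          // counts exhausted: the channel would end up STOPPED
            }
            offsets.add(t);
        }
        return offsets;
    }

    public static void main(String[] args) {
        // Defaults: SHORTRTY(10) SHORTTMR(60) LONGRTY(999999999) LONGTMR(1200)
        System.out.println(retryOffsets(10, 60, 999_999_999L, 1200, 12));
        // [60, 120, 180, 240, 300, 360, 420, 480, 540, 600, 1800, 3000]
    }
}
```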

Wally wrote:
Changing the SYSTEM.CLUSTER.TRANSMIT.QUEUE to persistent doesn't help either.

Wally wrote:
So when I stop both sender channels to the rest of the cluster, the messages sit in the SYSTEM.CLUSTER.TRANSMIT.QUEUE and wait. Starting the channel to the offline member doesn't change the situation


By which you mean the 1st message (which you've seen by browsing the SCTQ) does not arrive on the on-line queue manager but the others do?

Wally wrote:
and starting the second channel again transfers ALL of the messages to the online member.


So in one scenario you get all the messages, in the other you get all-1?

Wally wrote:
But a failure of one node won't stop the sender channels beforehand, will it?


It will put the channel to the downed queue manager (don't call it a node - this isn't an HA situation) into retry after a period.
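As we understand the documented cluster workload behaviour, when a cluster-sender channel goes into retry, messages waiting for it on the SCTQ that were put with MQOO_BIND_NOT_FIXED can be reallocated to another instance, while messages bound on open stay queued for the original channel. A toy sketch of that reallocation step; QueuedMessage and reallocate are hypothetical names, the channel name TO.T1 is assumed, and the real queue manager does this internally rather than through any such API.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model only: not how the queue manager stores or moves messages.
final class ReallocationSketch {
    static final class QueuedMessage {
        final String id;
        final boolean bindNotFixed; // opened with MQOO_BIND_NOT_FIXED?
        String channel;             // cluster-sender channel it is waiting for

        QueuedMessage(String id, boolean bindNotFixed, String channel) {
            this.id = id;
            this.bindNotFixed = bindNotFixed;
            this.channel = channel;
        }
    }

    /** Moves not-fixed messages waiting for failedChannel onto alternative; returns how many moved. */
    static int reallocate(List<QueuedMessage> sctq, String failedChannel, String alternative) {
        int moved = 0;
        for (QueuedMessage m : sctq) {
            if (m.bindNotFixed && m.channel.equals(failedChannel)) {
                m.channel = alternative;
                moved++;
            }
        }
        return moved;
    }

    public static void main(String[] args) {
        List<QueuedMessage> sctq = new ArrayList<>();
        sctq.add(new QueuedMessage("M1", true, "TO.T1"));  // not fixed: can move
        sctq.add(new QueuedMessage("M2", false, "TO.T1")); // bound on open: stays
        System.out.println(reallocate(sctq, "TO.T1", "TO.T2")); // 1
        for (QueuedMessage m : sctq) {
            System.out.println(m.id + " -> " + m.channel); // M1 -> TO.T2, then M2 -> TO.T1
        }
    }
}
```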
Wally
Posted: Wed Sep 22, 2010 6:57 am

Novice

Joined: 22 Sep 2010
Posts: 15

Vitor wrote:

How long a time? It will take a discussed and documented period for the channels to notice the failure.


I wait about 30 seconds, or at least until the Explorer comes back with the "stopped" message.

Vitor wrote:

Changing the SYSTEM.CLUSTER.TRANSMIT.QUEUE to persistent doesn't help either.


I have already changed it to persistent, but I still see the same behaviour.

Vitor wrote:

By which you mean the 1st message (which you've seen by browsing the SCTQ) does not arrive on the on-line queue manager but the others do?

So in one scenario you get all the messages, in the other you get all-1?


When I have stopped the channels before sending messages again, I can see all my messages in the S.C.T.Q on S1, and when I start the channels again, all messages are transferred to the online qmgr.

Whereas in the scenario where I only take one qmgr offline and send messages, the first message targeted to the offline qmgr (according to the round-robin algorithm) is lost (like me now).
Vitor
Posted: Wed Sep 22, 2010 7:23 am

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

Wally wrote:
Whereas in the scenario where I only take one qmgr offline and send messages, the first message targeted to the offline qmgr (according to the round-robin algorithm) is lost (like me now).


How do you mean "targeted"? If a message is addressed to a given queue manager it bypasses the cluster workload distribution.
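The distinction in code form: if the application names a queue manager in the object descriptor (MQOD.ObjectQMgrName), the message goes to that instance and waits for it even when it is unavailable; only a blank name leaves the choice to workload balancing. TargetingSketch is a hypothetical helper, with the round-robin position passed in to keep it small.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not an MQ API.
final class TargetingSketch {
    /**
     * @param objectQMgrName MQOD.ObjectQMgrName as supplied by the application (blank = not set)
     * @param usable         instances the workload algorithm currently considers usable
     * @param rrPosition     current round-robin position
     */
    static String destination(String objectQMgrName, List<String> usable, int rrPosition) {
        String named = objectQMgrName == null ? "" : objectQMgrName.trim(); // MQ names are blank-padded
        if (!named.isEmpty()) {
            return named; // explicitly addressed: no workload balancing, waits if that QM is down
        }
        return usable.get(rrPosition % usable.size());
    }

    public static void main(String[] args) {
        List<String> usable = Arrays.asList("T2"); // T1 is down
        System.out.println(destination("T1", usable, 0)); // T1: addressed, so it waits for T1
        System.out.println(destination("", usable, 0));   // T2
    }
}
```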

So if you have 3 messages browsable in the SCTQ and bring one of the queue managers on line what happens?

If a message (M1) isn't sent to the on-line queue manager but M2 & M3 are, what happens if you then bring the other queue manager on-line?

Are you certain expiry isn't in use?

Copyright © MQSeries.net. All rights reserved.