Author |
Message
|
George Carey |
Posted: Tue Oct 13, 2009 4:25 pm Post subject: MQ v 7.0.1 |
|
|
Knight
Joined: 29 Jan 2007 Posts: 500 Location: DC
|
I set up an MQ v7.0.1 multi-instance qmgr failover test as per the documentation in the latest Info Center:
Verifying the multi-instance queue manager on Linux
It works fine when I do
endmqm -s QMname for the switchover.
The amqsghac, amqsphac and amqsmhac samples all reconnect and continue to run fine.
I thought I would simulate a bit more of a real-world crash and did a kill -9 shutdown of the QMGR processes (still orderly, using the appropriate kill order, also from the InfoCenter).
Failover of the QMGR works fine and 2 of the 3 amqs?hac client programs reconnect just fine, but amqsphac (the put) fails each time with a 2549 error.
2549 is MQRC_CALL_INTERRUPTED!!
I just restart it manually and it all works fine.
Don't even know what that error code is telling me.
Anybody got some ideas ??
TIA _________________ "Truth is ... grasping the virtually unconditioned",
Bernard F. Lonergan S.J.
(from book titled "Insight" subtitled "A Study of Human Understanding") |
fjb_saper |
Posted: Wed Oct 14, 2009 4:07 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Sounds like you need to open a PMR...  _________________ MQ & Broker admin |
George Carey |
Posted: Wed Oct 14, 2009 7:42 am Post subject: PmR |
|
|
Knight
Joined: 29 Jan 2007 Posts: 500 Location: DC
|
I was hoping that wasn't the case !
But perhaps so. |
bobbee |
Posted: Tue May 04, 2010 9:44 am Post subject: |
|
|
 Knight
Joined: 20 Sep 2001 Posts: 545 Location: Tampa
|
did you ever resolve your amqsphac error? |
George Carey |
Posted: Tue May 04, 2010 12:49 pm Post subject: multi-instance qmgr testing |
|
|
Knight
Joined: 29 Jan 2007 Posts: 500 Location: DC
|
No, not really. I tried a lot of things with multi-instance qmgrs, and on the issues I encountered, Hursley, local support and local SEs were saying conflicting things. For example, I was first told I can't use NFS-mounted filesystems for the data/queues (e.g. /var/mqm) at all, which was later clarified to: only if it is mounted as an NFSv4 filesystem.
I modified the C code to see if I could pin down exactly where the failure was on the 'put' with failover. I always seemed to miss one message when sending msg1, msg2 .... msgN ... msgN+m ... A failover would be executed during these sends, and when restarting the applications my modified code made the application not take the error exit but instead continue and re-read the queue. When this was done, at some point in time (failover time) when I expected msgN+1 I got msgN+2 instead, no matter what I did.
I asked: why does it work with endmqm -s QMGR and not when I kill the processes? Clearly -s was saving some key state information!!
I sent the output to support (level 2 or 3, not sure which was working with me at the time). The bottom line was some mumbo jumbo about it not being able to recover where it was last because of ????(unclear explanation)???, and that it was working as designed.
I am recalling things from months ago now, so I might be missing some facts/points, but this was the gist of it.
Last edited by George Carey on Tue May 04, 2010 1:17 pm; edited 1 time in total |
bobbee |
Posted: Tue May 04, 2010 12:51 pm Post subject: |
|
|
 Knight
Joined: 20 Sep 2001 Posts: 545 Location: Tampa
|
bruce2359 |
Posted: Tue May 04, 2010 1:00 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
I thought I would simulate a bit more of a real-world crash and did a kill -9 shutdown of the QMGR processes (still orderly, using the appropriate kill order, also from the InfoCenter). |
I'm not sure if I'd consider plinking qmgr processes a real world test, even if IBM suggests it.
What other scenarios did you try? Did any succeed? Did any (other than kill -9) fail? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
mvic |
Posted: Tue May 04, 2010 1:09 pm Post subject: Re: multi-instance qmgr testing |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
George Carey wrote: |
I always seemed to miss one message when sending msg1, msg2 .... msgN ... msgN+m ... |
amqsphac.c does MQPUT outside syncpoint (not sure why that is, but it's only a sample program, so you can change it and rebuild). So it can get MQRC_CALL_INTERRUPTED.
But even if you use syncpoint and MQCMIT, you can theoretically get a failure in which the client doesn't know whether the message got committed. If the client doesn't know, this is probably why you get MQRC_CALL_INTERRUPTED.
MQRC_CALL_INTERRUPTED manual page: http://publib.boulder.ibm.com/infocenter/wmqv7/v7r0/topic/com.ibm.mq.amqzao.doc/fm22549_.htm |
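To make the "client doesn't know" case concrete, here is a minimal Python simulation. It is not real MQI code: FakeQueueManager, CallInterrupted and send_all are hypothetical names made up for this sketch, and only the 2549 value comes from the thread. The connection is broken either before or after the message is stored, and the sender sees the same error either way.

```python
MQRC_CALL_INTERRUPTED = 2549  # reason code quoted in this thread


class CallInterrupted(Exception):
    """Raised when the reply to a put never reaches the client."""
    reason = MQRC_CALL_INTERRUPTED


class FakeQueueManager:
    """Toy stand-in for a queue manager; an assumption, not the real MQI."""

    def __init__(self):
        self.queue = []
        self.fail_mode = None  # None, "before_store" or "after_store"

    def put(self, msg):
        if self.fail_mode == "before_store":
            self.fail_mode = None
            raise CallInterrupted()  # message was never stored
        self.queue.append(msg)
        if self.fail_mode == "after_store":
            self.fail_mode = None
            raise CallInterrupted()  # message stored, reply lost


def send_all(qmgr, messages, fail_at, fail_mode, retry):
    """Put each message; on an interrupted put either skip it or resend it."""
    for i, msg in enumerate(messages):
        if i == fail_at:
            qmgr.fail_mode = fail_mode
        try:
            qmgr.put(msg)
        except CallInterrupted:
            if retry:
                qmgr.put(msg)  # failure flag was cleared, so this succeeds
    return qmgr.queue
```

With fail_mode="before_store" and retry=False the queue ends up holding msg1 and msg3 only, which resembles the missing-message symptom described earlier in the thread. With fail_mode="after_store" and retry=True it holds msg2 twice. The sender cannot tell the two failure modes apart from the exception alone.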
George Carey |
Posted: Tue May 04, 2010 1:37 pm Post subject: transactional |
|
|
Knight
Joined: 29 Jan 2007 Posts: 500 Location: DC
|
Yes, I recall now that I tried that as well, and it did not work either, which concerned me.
And to your point: the client doesn't know, and thus the 2549 call interrupted.
That was a discussion point. I asked what it is doing differently with endmqm -s, and why it doesn't know. I believe someone from Hursley responded with something to the effect that the client knowing the committed state of the last transaction could, he thought, be determined (he seemed to be doing a little mental brainstorming/coding on how it might be done as he said this; I think this was during a conference call), but currently wasn't, and that's the way it has been from day one. And so that was the way it was left.
Again, take these with a grain of salt as recollections, but I believe they are at minimum generically correct.
Also, I read that linked section multiple times back then; I don't recall it clarifying the issue for me.
GTC |
mvic |
Posted: Tue May 04, 2010 1:53 pm Post subject: Re: transactional |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
George Carey wrote: |
I believe someone from Hursley responded with something to the effect that the client knowing the committed state of the last transaction could, he thought, be determined |
This is a bit unclear. How could the client determine the "committed" state of a transaction if it lost contact with instance 1 of the queue manager without receiving a reply from the commit?
If the MQI had the facility to tell you a transaction ID for your current transaction, then I might believe it. But I am unaware of any such facility. So the client has no "token" it can go to the queue manager with, in order to ask about the success of the transaction identified by that "token".
I think XA offers a facility to return after a crash and make sure everything is solid. But I seem to recall that client reconnects are not supported with XA so that's maybe a non-starter. |
bruce2359 |
Posted: Tue May 04, 2010 2:14 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Quote: |
This is a bit unclear. How could the client determine the "committed" state of a transaction if it lost contact with instance 1 of the queue manager without receiving a reply from the commit? |
Some rc/cc's come from the client layer - like 2059, for example. It is possible that the rc/cc from the qmgr from a client mq call doesn't make it back to the client layer.
Here, I found the following. Yet another issue with the client.
http://publib.boulder.ibm.com/infocenter/wmqv7/v7r0/index.jsp
2549 (09F5) (RC2549): MQRC_CALL_INTERRUPTED
Explanation
MQPUT, MQPUT1, or MQCMIT was interrupted and reconnection processing cannot reestablish a definite outcome.
The 2549 reason code is returned to a client that is using a reconnectable connection if the connection is broken between sending the request to the queue manager and receiving the response, and if the outcome is not certain. For example, an interrupted MQPUT of a persistent message outside sync point might or might not have stored the message. Alternatively an interrupted MQPUT1 of a persistent message or message with default persistence (which could be persistent) outside sync point might or might not have stored the message. The timing of the failure affects whether the message remains on the queue or not. If MQCMIT was interrupted the transaction might or might not have been committed.
Completion Code
MQCC_FAILED
Programmer response
Repeat the call following reconnection, but be aware that in some cases, repeating the call might be misleading.
The application design determines the appropriate recovery action. In many cases, getting and putting persistent messages inside sync point resolves indeterminate outcomes. Where persistent messages need to be processed outside sync point, it might be necessary to establish whether the interrupted operation succeeded before the interruption and repeating it if it did not. |
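One possible reading of the "Programmer response" text, sketched in Python as a simulation rather than MQI code: the sender resends after a 2549 and tags every message with its own sequence number, and the receiver drops any sequence number it has already seen. The function names, the (seq, body) tuple layout and the plain list standing in for the queue are all assumptions for illustration.

```python
def resend_after_interrupt(queue, seq, body, stored_before_break):
    """Simulate one interrupted put followed by a blind resend.

    queue is a plain list standing in for the target queue.
    stored_before_break says whether the first attempt reached the queue.
    """
    if stored_before_break:
        queue.append((seq, body))  # first attempt landed, reply was lost
    queue.append((seq, body))      # resend after reconnection


def consume_unique(queue):
    """Return message bodies in order, dropping repeated sequence numbers."""
    seen = set()
    bodies = []
    for seq, body in queue:
        if seq in seen:
            continue  # duplicate created by the resend
        seen.add(seq)
        bodies.append(body)
    return bodies
```

The point of this design is that the sender never has to find out whether the interrupted put succeeded: a lost message is covered by the resend, and a duplicate is filtered on the receiving side.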
George Carey |
Posted: Tue May 04, 2010 2:15 pm Post subject: plinking qmgr processes |
|
|
Knight
Joined: 29 Jan 2007 Posts: 500 Location: DC
|
Bruce if by
Quote: |
plinking qmgr processes |
you mean kill -9 in the documented kill order, then no, I did not try any other type of test.
I thought this was still a much more orderly shutdown than hitting a power switch, but much more realistic and less contrived than running an endmqm -s command as a 'failure' test!!!
I was trying to give my failover test at least a fighting chance of working 100%.
The endmqm -s command could be used as part of a planned maintenance activity, but certainly not as part of a disaster recovery scenario, which is what I was trying to simulate.
Bottom line: be my guest. Try exactly what I am describing yourself and see what your results are. Set up the multi-instance qmgr test scenario for MQ 7.0.1, do the failover the endmqm -s way, then do it again with a more abrupt qmgr failure such as I did (kill -9s in a for-next loop), and let us know if your mileage differs!
GTC |
bruce2359 |
Posted: Tue May 04, 2010 2:23 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
I haven't had much opportunity to work with MI...yet. |
George Carey |
Posted: Tue May 04, 2010 2:39 pm Post subject: token |
|
|
Knight
Joined: 29 Jan 2007 Posts: 500 Location: DC
|
mvic said:
Quote: |
... But I am unaware of any such facility. So the client has no "token" it can go to the queue manager with .... |
Ok, sounds correct but how/why does the endmqm -s failover work?
Is this saving a transaction Id or token to be used on restart.
I don't believe any messages are missed/lost on restart when I did the endmqm -s failover.
If so could a token be cached in client for relevant parts of MQI ? |
mvic |
Posted: Tue May 04, 2010 2:55 pm Post subject: Re: token |
|
|
 Jedi
Joined: 09 Mar 2004 Posts: 2080
|
George Carey wrote: |
Ok, sounds correct but how/why does the endmqm -s failover work?
Is this saving a transaction Id or token to be used on restart.
I don't believe any messages are missed/lost on restart when I did the endmqm -s failover. |
Probably (but I'm guessing!) it is because the endmqm -s gave a more controlled shutdown than simply ending all qmgr processes, and so the client got all the return codes it needed in the endmqm -s case.
Quote: |
If so could a token be cached in client for relevant parts of MQI ? |
I have not heard of any such token. I don't think the cleaner behaviour on endmqm -s really needs much explaining - it's probably just cleaner and more controlled and so the client gets everything it needs, so doesn't need to return a CALL_INTERRUPTED rc. |