offshore
Posted: Mon Jun 25, 2007 9:43 am    Post subject: High Availability/Failover
Master
Joined: 20 Jun 2002    Posts: 222
All,
I'm not sure where or even how to ask the question, but here goes my best shot. (I figure more details will be needed, but I'll start here.)
Is there a way to avoid having messages stuck in the xmitq during an unanticipated failover/outage?
Currently, if we do a planned shutdown, everything goes smoothly. End-users have no idea we switched systems.
The next step in the cycle of things is to make everything look just as smooth to the end-user when there is an unplanned loss of service (communication outage, IP stack goes down, a z/OS QMGR gets messed up...make up your own scenario).
I was testing with a script that was putting several messages a second on the distributed side, and I cancelled one of the z/OS QMGRs' address spaces. By the time the distributed side noticed that QMZ1 was no longer available, there were messages in the xmitq that couldn't get sent.
Configuration:
LPAR1: QMZ1
LPAR2: QMZ2
W2K_Svr(s) (multiple distributed QMGRs): QMWIN1, QMWIN2, QMWIN3
*The z/OS QMGRs are in a QSG and all QMGRs are clustered, with the 2 z/OS QMGRs being full repositories.
I've been reading through the "WebSphere MQ in a z/OS Parallel Sysplex Environment" redbook. In Chapter 11 (High Availability Setup) there seems to be only 1 option that is truly failover friendly, by which I mean no messages remaining in the xmitq. That would be scenario 1, where you move a gateway qmgr from 1 lpar to another, along with its associated IPs. But of course the downside is that there is some downtime while everything gets moved to the good lpar. That just isn't an option for our shop...our current setup works better.
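To be a bit more concrete about that setup (typed from memory, so don't hold me to the exact attributes), the z/OS side is roughly:
Code:
* On each of QMZ1 and QMZ2 (the two full repositories):
ALTER QMGR REPOS(MQCLUST1)
* Each z/OS qmgr advertises its own cluster-receiver channel to the cluster (QMZ1 shown):
DEFINE CHANNEL(TO.QMZ1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
       CONNAME('10.10.10.50(1414)') CLUSTER(MQCLUST1)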
fjb_saper
Posted: Mon Jun 25, 2007 1:59 pm    Post subject:
Grand High Poobah
Joined: 18 Nov 2003    Posts: 20756    Location: LI, NY
In that case you are truly doing load distribution and not HA... but then you already knew that...
_________________
MQ & Broker admin
offshore
Posted: Tue Jun 26, 2007 2:39 am    Post subject:
Master
Joined: 20 Jun 2002    Posts: 222
I guess what I'm getting at is that it doesn't seem you can do seamless HA between 2 different platforms.
Maybe it is because our shop is using MQ synchronously? But still, it seems to me that there should be a way to deal with this problem.
All the messages have to get to the backend (i.e. the z/OS side) to get processed. If they do not get processed within a certain time, the end-user will get an error message. To me it isn't that big of a deal, because the chances are slim that an unplanned outage will happen on the zSeries side, but.......
The thing is, if a user happens to be purchasing something and the message gets "stuck", the user will get an error. They may try again, or quit and come back later. The downside is that with a message still "waiting" to be sent, once that zSeries QMGR comes back online the message will be sent and the purchase processed, and by that time the user may already have come back and bought what they wanted, resulting in a duplicate purchase. You can't have that...
I'm looking for help with solutions/recommendations from people in the field that work with the product day in and day out....I know clustering isn't for HA, it's for reduced administration...blah blah blah. Anyone that has been on this board long enough should know how much that gets beat to death.
Vitor
Posted: Tue Jun 26, 2007 2:48 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
offshore wrote:
All the messages have to get to the backend (i.e. the z/OS side) to get processed. If they do not get processed within a certain time, the end-user will get an error message. To me it isn't that big of a deal, because the chances are slim that an unplanned outage will happen on the zSeries side, but.......
If the backend is z/OS, could you use shared queues to achieve this? IIRC if one queue manager crashes the messages on the queue are available to the shared partners including (again with the IIRC) the one in-flight at the time of the crash.
_________________
Honesty is the best policy.
Insanity is the best defence.
offshore
Posted: Tue Jun 26, 2007 4:04 am    Post subject:
Master
Joined: 20 Jun 2002    Posts: 222
Vitor,
Yes, the z/OS side works great with shared queues. I don't seem to have an issue there.
The problem is the messages getting stuck on the xmitq on the distributed side. When those messages were "in-flight", the distributed qmgr thought the backend was still available, when in reality it wasn't.
So when you look at the message header (in the xmitq), it still names as the destination the z/OS backend queue manager that is unavailable (that failed). So that message will stay there (assuming you don't have an expiration on it) until that specific queue manager is available again. Then the message will be sent and the transaction processed on the back end. Which could be a bad thing for an end-user (customer)....if it was a purchase of an item, as opposed to a request just to get information back.
With the QSG (shared queues) on the backend, if the message made it there, then any available queue manager in the QSG will process it.
That's what I'm getting at: is it "truly" possible to have seamless failover/high availability between 2 platforms? As I mentioned, with a controlled shutdown of the back end all messages get rerouted, because the backend (z/OS) qmgr notifies everyone that it is stopping and not taking any more requests. With a "failure" there really isn't a notification, so it's up to the sending qmgr to detect that the qmgr isn't there anymore...and by the time it does, several messages have already been placed on the xmitq destined for a qmgr that is no longer up...Does that make sense?????
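If anyone wants to see what I mean, this is roughly what I run in runmqsc on the Windows qmgr while the z/OS qmgr is down (nothing clever, just the standard displays; TO.QMZ1 is the cluster-sender to the failed qmgr in our shop):
Code:
* Depth of the cluster transmit queue (the "stuck" messages sit here):
DIS QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH
* Status of the cluster-sender to the failed qmgr (it sits in RETRYING once the failure is finally noticed):
DIS CHSTATUS(TO.QMZ1) STATUS XMITQ
* Browsing the xmitq (e.g. with the amqsbcg sample) shows the transmission header still naming the dead qmgr.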
Vitor
Posted: Tue Jun 26, 2007 4:17 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
offshore wrote:
With a "failure" there really isn't a notification, so it's up to the sending qmgr to detect that the qmgr isn't there anymore...and by the time it does, several messages have already been placed on the xmitq destined for a qmgr that is no longer up...Does that make sense?????
What you need is for the backend managers to be clustered, so that cluster technology ensures messages are not routed to the downed qmgr, and a QSG to ensure that its remaining messages are processed. According to the Clusters manual:
Quote:
You can define a cluster queue that is also a shared queue. For example on z/OS you can define:
DEFINE QLOCAL(MYQUEUE) CLUSTER(MYCLUSTER) QSGDISP(SHARED) CFSTRUCT(STRUCTURE)
so it ought to work. Never tried it myself (old school MVS type, I refuse to believe mainframe stuff fails!), but it ought to.
I call upon other, wiser minds to comment. Or at least minds with more recent z/OS experience!
_________________
Honesty is the best policy.
Insanity is the best defence.
offshore
Posted: Tue Jun 26, 2007 6:31 am    Post subject:
Master
Joined: 20 Jun 2002    Posts: 222
Vitor,
Quote:
so it ought to work. Never tried it myself (old school MVS type, I refuse to believe mainframe stuff fails!), but it ought to.
I'm with you on that!!!
I do have the backend (z/OS) qmgrs clustered and in a QSG. I have also defined the cluster queues as shared. My distributed qmgrs are also clustered. I guess, like you, I thought this would be the solution to my problem.
So that is why I sort of changed the question to "Is it possible to have HA between 2 different platforms?" And of course it's sometimes difficult to explain what your environment is, and that in turn makes it harder for others to answer the question(s).
Below, I'll try to describe (to the best of my ability) what's going on. The internet needs a whiteboard or paint program to put into message boards.
CLUSTERNAME: MQCLUST1
QMGRS/PLATFORM/CLUSTER
QMZ1/ZOS/MQCLUST1
QMZ2/ZOS/MQCLUST1
-both are on separate lpars, and in a QSG named MQVG
QMWIN1/W2K/MQCLUST1
-MC76 support pac installed to give priority to QMZ1.
On the z/OS side we have a shared cluster queue called BUY.ITEM.
Now on the distributed side a user comes in, and some code is executed that needs to go to queue BUY.ITEM.
The W2K qmgr QMWIN1 resolves the queue BUY.ITEM, and the MF qmgrs QMZ1 and QMZ2 both report that they host that queue. So the message is put on the SYSTEM.CLUSTER.TRANSMIT.QUEUE, cluster-sender channel TO.QMZ1 (10.10.10.50 1414) starts, and the message makes it to the back end where it's processed; the end-user gets confirmation (via web browser and eventually email) that they have purchased something.
Now on a controlled shutdown - same thing, except QMZ2 now takes over: cluster-sender channel TO.QMZ2 (10.10.10.60 1414) starts and messages go to that lpar, where they are processed, and the user has no idea there was a planned/controlled failover.
Here is where the problem arises (in the very rare instance): let's say QMZ1 gets hosed, the TCP/IP stack fails, LPAR1 gets hosed...whatever; at any rate QMZ1 is no longer responding. The front-end program on QMWIN1 is requesting queue BUY.ITEM, but QMWIN1 still thinks QMZ1 can handle the request because of the unclean failure. So a message is put on the SYSTEM.CLUSTER.TRANSMIT.QUEUE.
Inside that message is the XQH, and within that the destination qmgr is QMZ1.
Well, because QMZ1 is no longer taking requests and wasn't able to notify QMWIN1, it takes that distributed qmgr longer to "realize" that QMZ1 is no longer available before sending messages to QMZ2.
Depending on the load there could be several messages waiting to be sent to QMZ1. Because of that the user doesn't know if they bought the item or not; they get an error (in the web browser) because the message didn't make it to the destination and wasn't processed in a timely manner.
Because of this a few things happen:
The user tries again, buys the item, and everything works just fine, because in the seconds it takes for the distributed qmgr to realize that QMZ1 isn't available, the traffic is redirected to QMZ2.
-But from the 1st attempt, that message is still in the SYSTEM.CLUSTER.TRANSMIT.QUEUE, bound for QMZ1.
Once QMZ1 becomes available the message is sent and processed, and the user has bought the item "again" without even realizing it.
So, that goes back to my original question: can there truly be HA between different platforms, without the end-user knowing? At least in the type of environment we are in, where the user is expecting a response back. Or is there a better way to set up our MQ environment?
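For reference, on top of the full-repository and cluster-receiver definitions I sketched earlier, the rest of it looks more or less like this (the CF structure name, the Windows hostname, and the TO.QMWIN1 channel name are stand-ins; everything else is the real name):
Code:
* z/OS side: the shared cluster queue, held in the CF so either QMZ1 or QMZ2 can serve it:
DEFINE QLOCAL(BUY.ITEM) QSGDISP(SHARED) CFSTRUCT(APPL1) CLUSTER(MQCLUST1)
* QMWIN1 side: its own cluster-receiver, plus a manually defined cluster-sender to one full repository:
DEFINE CHANNEL(TO.QMWIN1) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
       CONNAME('qmwin1.mydomain.com(1414)') CLUSTER(MQCLUST1)
DEFINE CHANNEL(TO.QMZ1) CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
       CONNAME('10.10.10.50(1414)') CLUSTER(MQCLUST1)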
Vitor
Posted: Tue Jun 26, 2007 6:36 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
offshore wrote:
Or is there a better way to set up our MQ environment?
Message expiry (the Expiry field in the MQMD, set by the putting application). If the message isn't delivered to its target by the time the end-user would have timed out, get it to self-destruct.
_________________
Honesty is the best policy.
Insanity is the best defence.
offshore
Posted: Tue Jun 26, 2007 6:46 am    Post subject:
Master
Joined: 20 Jun 2002    Posts: 222
Vitor,
Definitely have thought of that route. I was hoping there was something MQ-administratish (if that's a word) that I could do.
Getting applications to change things...is not always the easiest thing to accomplish. And second, the user still gets an "error" page, if you will. I was hoping there was something nice and clean like when there is a controlled shutdown. But in an imperfect world you can't always have it the way you want...lol
It certainly remedies the problem with "duplicate" purchases. I've just racked my brain and can't find a seamless way to deal with an unplanned failover.
Thanks bro....
Vitor
Posted: Tue Jun 26, 2007 6:54 am    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
offshore wrote:
It certainly remedies the problem with "duplicate" purchases. I've just racked my brain and can't find a seamless way to deal with an unplanned failover.
For what it's worth, as a consumer I'd prefer to swear at the site and press the submit button again to order my left-handed waffle iron than swear at the site, press the submit button again, and receive 2 left-handed waffle irons.
_________________
Honesty is the best policy.
Insanity is the best defence.
jefflowrey
Posted: Tue Jun 26, 2007 8:37 am    Post subject:
Grand Poobah
Joined: 16 Oct 2002    Posts: 19981
Do left-handed waffles taste better than right-handed waffles?
_________________
I am *not* the model of the modern major general.
PeterPotkay
Posted: Tue Jun 26, 2007 11:25 am    Post subject:
Poobah
Joined: 15 May 2001    Posts: 7722
Offshore, if you can set up a Queue Sharing Group, and are using shared queues, just take the next step and use shared channels as well. The various LPARs all listen on a generic port. If any one LPAR dies, incoming channels can be serviced by the remaining LPARs.
I have not played with this yet, but it is my understanding that this would solve your problem.
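Very roughly, and untested on my part (the channel name and port below are made up; MQVG is your QSG name), the z/OS side of that would look something like:
Code:
* Receiver channel defined once with group disposition, so it is shared across the QSG:
DEFINE CHANNEL(QSG.TO.MQVG) CHLTYPE(RCVR) TRPTYPE(TCP) QSGDISP(GROUP)
* Group listener started on each qmgr in the QSG, all behind one generic address/port
* (e.g. a Sysplex Distributor DVIPA), so whichever LPAR is up can accept the inbound channel:
START LISTENER TRPTYPE(TCP) PORT(1415) INDISP(GROUP)
* The SNDR channel on the Windows qmgr then points its CONNAME at that generic address.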
_________________
Peter Potkay
Keep Calm and MQ On
Vitor
Posted: Tue Jun 26, 2007 12:01 pm    Post subject:
Grand High Poobah
Joined: 11 Nov 2005    Posts: 26093    Location: Texas, USA
jefflowrey wrote:
Do left-handed waffles taste better than right-handed waffles?
Oh yes. If you've ever tried them you'll know. You'll just know.
_________________
Honesty is the best policy.
Insanity is the best defence.
offshore
Posted: Thu Jun 28, 2007 5:13 am    Post subject:
Master
Joined: 20 Jun 2002    Posts: 222
Can you only get those left-handed waffle irons in the UK? They aren't offered in the States.
Anyways, I would look at shared channels, but that means breaking the cluster, then creating an additional cluster for the distributed side and having a lot more administrative overhead. I think I'll just look into getting the application people to change their code for the purchase requests. It would be a rare event to have an unplanned outage, so....
Thanks for all the help and recommendations.
As a side note: does anyone find it strange that there is an option to make a cluster channel shared, yet you can't actually do it?
PeterPotkay
Posted: Thu Jun 28, 2007 6:53 am    Post subject:
Poobah
Joined: 15 May 2001    Posts: 7722
offshore wrote:
Anyways, I would look at shared channels, but that means breaking the cluster, then creating an additional cluster for the distributed side and having a lot more administrative overhead.
I think one of us is misunderstanding the other. Or both at the same time!
You have 2 z/OS QMs. They are in a Queue Sharing Group.
You have a Windows server with a regular SNDR channel going to one of these z/OS QMs. The Windows QM may or may not be in a cluster. The z/OS QMs are not in that cluster.
Is this your architecture?
_________________
Peter Potkay
Keep Calm and MQ On