Controlling QMGR failover during network outage |
Author |
Message
|
Abhi |
Posted: Sun Apr 25, 2021 5:42 pm Post subject: Controlling QMGR failover during network outage |
|
|
Novice
Joined: 10 Mar 2011 Posts: 19
|
Hi,
We have an HA queue manager setup in which the MQ data folders are mounted from remote NFS shares. During network outages (even ones shorter than 5 seconds), i.e. when we lose access to the NFS share, our queue managers fail over once the connection comes back. The failover is not expected, since during the outage neither the active nor the standby host has access to the remote NFS share. While analysing the logs I could not make sense of the points below and am looking for help explaining them:
Quote: |
The failover behaviour is random, i.e. not all queue managers fail over, and it is not the same queue manager that fails over every time. |
Quote: |
When the connection recovers after these outages, the queue manager first recovers as normal, i.e. the active instance comes back as active and the standby comes back as standby. Seconds later an FFST file is generated (details below) on the active node, and a minute later the queue manager fails over. |
Quote: |
At the point when NFS access is lost and the queue manager tries to recover, is there a policy that controls which node gets the queue manager lock when the connection comes back, or is it random? |
Quote: |
What happens if the outage lasts longer than the time specified for FileLockHeartBeatLen? |
Quote: |
Can this behaviour be controlled with scripts, for example by stopping the standby instances during an outage (making the setup non-HA) and starting them again once things are back to normal? |
FFST Details:
Code: |
Probe Id :- HL206037
Application Name :- MQM
Component :- mqloWritevFile
Effective UserID :- 1500 (mqm)
Real UserID :- 1500 (mqm)
Program Name :- amqzmuc0
Arguments :- -m QMGR
Addressing mode :- 64-bit
Major Errorcode :- xecF_E_UNEXPECTED_RC
Minor Errorcode :- hrcE_MQLO_DERR
Probe Type :- MSGAMQ6118
Probe Severity :- 1
Probe Description :- AMQ6118S: An internal IBM MQ error has occurred (20806826)
|
IBM MQ Version: 9.1.3.0
Platform: RHEL
Regards,
Abhi |
|
bruce2359 |
Posted: Sun Apr 25, 2021 6:55 pm Post subject: |
|
|
 Poobah
Joined: 05 Jan 2008 Posts: 9469 Location: US: west coast, almost. Otherwise, enroute.
|
Next time, please post more of the errors logged.
Was an FDC created for this event?
Did you research error message AMQ6118S? Did you open a PMR with IBM?
Other than posting here, what did you do?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
Andyh |
Posted: Tue Apr 27, 2021 12:43 pm Post subject: |
|
|
Master
Joined: 29 Jul 2010 Posts: 239
|
mqloWritevFile is the MQ function that writes to the recovery log.
hrcE_MQLO_DERR is the error reported when an EIO error is returned to this function.
The QMgr doesn't attempt to handle this error and ends abruptly on receiving it, because after an EIO error the QMgr cannot know whether any of the requested write data made it to disk. The speed with which the QMgr terminates following such an error is much improved in MQ 9.2.1.
You might like to check that the NFS mount options for the file systems hosting the MQ data are correct, for example that they use a hard mount. |
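As a rough illustration of the mount check suggested above, the following Python sketch scans text in the format of /proc/mounts on Linux and flags NFS mounts that are soft-mounted or that do not list the hard option. It is only a sketch under assumptions: the function name check_nfs_mounts, the server name nfsserver and the /MQHA paths are hypothetical and do not come from this thread, and the check covers only the hard/soft option mentioned above, not every mount requirement for MQ.

```python
def check_nfs_mounts(mounts_text):
    """Return (mount_point, problem) pairs for NFS mounts that are not hard mounts.

    mounts_text is expected to be in /proc/mounts format:
    device mount_point fs_type options dump pass
    """
    findings = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        mount_point, fs_type, options = fields[1], fields[2], fields[3]
        if fs_type not in ("nfs", "nfs4"):
            continue  # only NFS client mounts are of interest here
        opts = set(options.split(","))
        if "soft" in opts:
            findings.append((mount_point, "soft mount"))
        elif "hard" not in opts:
            findings.append((mount_point, "hard option not listed"))
    return findings


# Hypothetical sample data; on a real host, pass the contents of /proc/mounts.
SAMPLE_MOUNTS = """\
nfsserver:/export/mqdata /MQHA/data nfs4 rw,relatime,vers=4.1,hard,proto=tcp,timeo=600,retrans=2 0 0
nfsserver:/export/mqlogs /MQHA/logs nfs4 rw,relatime,vers=4.1,soft,proto=tcp,timeo=600,retrans=2 0 0
/dev/sda1 / xfs rw,relatime 0 0
"""

if __name__ == "__main__":
    for mount_point, problem in check_nfs_mounts(SAMPLE_MOUNTS):
        print(f"{mount_point}: {problem}")
```

On a real host the same function could be fed the text read from /proc/mounts on both the active and the standby node, so that the two sets of mount options can be compared.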
|