MQSeries.net :: View topic - SUSPEND RESUME Cluster queue manager with changed IP address

monkeydluffy · Posted: Fri Feb 12, 2016 5:25 am

Hi everybody,

First, apologies for the long post. We have been facing an MQ cluster issue, and I thought it would be good to share my experience here and get expert opinions on it.

We have a scheduled PROD-to-DR fail-over test next week. We were testing the MQ scripts by simulating the fail-over with our UAT environment and a clone of it.
The setup is as follows:
EAI QMGR: QM1 (Full repository)
Backend application, 2 QMGRs: QM2 (full repository, FR) and QM3 (partial repository, PR)

These three queue managers are in a cluster, MYCLUS1.
In the PROD-to-DR activity, only the app server and MQ server will be cloned to the DR server. EAI will remain in the PROD environment.
So for this simulation, we kept the EAI server in UAT and tried to migrate the app queue managers to a UAT clone server (a VMware copy of the UAT image).
This requires the cloned app queue managers to take the place of the UAT queue managers in the cluster.

Before testing: QM1 (UAT, FR), QM2 (UAT, FR) and QM3 (UAT, PR)
During testing: QM1 (UAT, FR), QM2 (UAT clone, FR) and QM3 (UAT clone, PR)
As mentioned, the app will be cloned from the PROD to the DR environment as a VMware copy of the image.
So QM2 and QM3 will have a different IP address, but they are still the same queue managers with the same QMIDs.
For this reason, we are suspending the app queue managers and resuming them from the new IP address. We are also changing the IP address in the cluster sender and receiver channels across all 3 queue managers.
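
For reference, a minimal sketch of the suspend and resume pair in MQSC, assuming the commands are entered through runmqsc on each app queue manager. MYCLUS1 is the cluster name from the setup above; the lines starting with an asterisk are MQSC comments.

```
* On QM2 (and likewise on QM3) before ending the UAT instance
SUSPEND QMGR CLUSTER(MYCLUS1)

* On the same queue manager once it is running on the clone server
RESUME QMGR CLUSTER(MYCLUS1)
```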

Say the IP addresses of the queue managers are:
EAI (QM1) is on 172.30.9.1
QM2 and QM3 in UAT are on 172.30.9.2 (ports 1415 and 1416 respectively)
The UAT clone is on 172.30.9.3. Since this is a VMware copy, the queue manager names and ports remain the same.
We followed the steps as detailed in:
WebSphere MQ 7.5.0>WebSphere MQ>Configuring>Configuring a queue manager cluster>Managing WebSphere MQ clusters>Maintaining a queue manager

What we did:
1. SUSPEND QMGR on QM2 in UAT. Once the MQ error log reported that the command was processed successfully, we ended the queue manager.
2. SUSPEND QMGR on QM3 in UAT. Once the MQ error log reported that the command was processed successfully, we ended the queue manager.
3. Altered the EAI CLUSSDR IP address to point to 172.30.9.3 (app UAT clone IP address), replacing the existing 172.30.9.2 (app UAT IP address).
4. Started the queue manager QM2 on the UAT clone server.
5. Modified the IP address of the QM2 CLUSRCVR channel to 172.30.9.3. We did not modify the CLUSSDR IP address to EAI since the EAI IP address is unchanged.
6. Started the queue manager QM3 on the UAT clone server.
7. Modified the IP address of QM3 CLUSRCVR channel to 172.30.9.3.
8. Modified the IP address of QM3 CLUSSDR channel for QM2 to 172.30.9.3.
9. We did not modify the CLUSSDR IP address to EAI since EAI IP address is unchanged.
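
The channel changes in the steps above would look roughly like this in MQSC. This is only a sketch: the channel names TO.QM2 and TO.QM3 are hypothetical (our real channel names are not given in this post), while the IP address and ports are the ones listed earlier.

```
* On QM1 (EAI): manually defined CLUSSDR to QM2 (hypothetical name TO.QM2)
ALTER CHANNEL(TO.QM2) CHLTYPE(CLUSSDR) CONNAME('172.30.9.3(1415)')

* On QM2 (clone): its own CLUSRCVR
ALTER CHANNEL(TO.QM2) CHLTYPE(CLUSRCVR) CONNAME('172.30.9.3(1415)')

* On QM3 (clone): its own CLUSRCVR (hypothetical name TO.QM3)
ALTER CHANNEL(TO.QM3) CHLTYPE(CLUSRCVR) CONNAME('172.30.9.3(1416)')

* On QM3 (clone): manually defined CLUSSDR to QM2
ALTER CHANNEL(TO.QM2) CHLTYPE(CLUSSDR) CONNAME('172.30.9.3(1415)')
```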

What we then saw was that the CLUSSDR channel from EAI to QM2 was retrying, the CLUSSDR from QM2 to QM3 was retrying, the auto-cluster sender from QM2 to QM3 was retrying, and the auto-cluster sender from EAI to QM3 was retrying.
Upon investigation, we saw that all the IP address changes had taken effect in MQ Explorer, but the error log was still pointing to the previous IP addresses. Even in the cluster section of MQ Explorer we saw the old IP address, while the queue manager section showed the new IP address.
Then we issued SUSPEND QMGR in force mode on the UAT app queue managers (started them, forcibly suspended them, then stopped them), which did not solve anything. Then we issued the REFRESH CLUSTER command on all three queue managers (EAI and clone UAT), and the problem was resolved.
I understand that REFRESH CLUSTER should be issued only in exceptional circumstances. This was UAT, so we proceeded anyway, but we cannot do so in the PRODUCTION environment unless advised. During our live PROD-to-DR fail-over, do we need to follow the REFRESH CLUSTER approach?
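
The same comparison can be made from runmqsc instead of MQ Explorer. A minimal sketch, again using the hypothetical channel name TO.QM2: the first command shows what the local cluster cache holds for each cluster queue manager, and the second shows the channel definition itself.

```
* Cluster view: connection name and state held for every cluster queue manager
DISPLAY CLUSQMGR(*) CONNAME QMID STATUS SUSPEND

* Definition view: connection name on the channel object
DISPLAY CHANNEL(TO.QM2) CONNAME
```

If the CONNAME values from the two commands differ, the cluster cache on that queue manager has not picked up the altered definition.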
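
For completeness, the two recovery commands mentioned above have roughly this form in MQSC. The REPOS(NO) option shown is the command default and is an assumption here, not a record of the exact options we used.

```
* Forced suspend on the UAT app queue managers
SUSPEND QMGR CLUSTER(MYCLUS1) MODE(FORCE)

* Refresh of the locally held cluster information, issued on each queue manager
REFRESH CLUSTER(MYCLUS1) REPOS(NO)
```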

I am trying to figure out a cleaner way to do this, keeping the client environment constraints in mind. I will post my findings soon.

I am relatively new to clusters and have been going through the clusters section of the Infocenter. I have a query that I could not find an answer to there. It may be mentioned somewhere subtly and I overlooked it.