MQ Pub/Sub problem |
Author |
Message
|
RogerLacroix |
Posted: Tue Jul 19, 2005 6:18 pm Post subject: MQ Pub/Sub problem |
|
|
 Jedi Knight
Joined: 15 May 2001 Posts: 3264 Location: London, ON Canada
|
All,
At a client site they are starting to use MQ Pub/Sub.
Environment:
- Solaris 8
- WMQ v5.3 CSD08 (note: CSD08 includes Pub/Sub libraries for those who don't know)
I set up 2 queue managers for 2 applications on a DEV server (single server). Everything was going along fine (except for the usual app dev problems), so we decided to test it on the LAB MQ server. The LAB MQ server is set up with Veritas with 2 Solaris servers in an Active / Passive design.
So, I created the queue managers on the LAB server with hacrtmqm, set up the pub/sub queues and then configured the failover support in Veritas just like I have done many, many times before. The only new item added was an entry under applications for each broker. I enabled the queue managers and started them and everything was fine.
The application connected and did some testing. Next I failed everything over to the passive box. Everything started, including the new queue managers, except for the brokers. After a minute, Veritas reported that the 2 brokers had faulted.
I can connect to all queue managers including the new ones and get/put messages. From the shell, I typed 'dspmqbrk -m QMgrName' and it said it is not active. So, I figured I would manually start the broker. When I tried to start the broker it said 'not active'. I swear, I'm not kidding. So, I typed it really slowly a second time and it again said 'not active'. (What I said next, I will not repeat here.)
Here are the commands & output from the shell:
Code: |
mqm@xxxxxxxxx:/export/home/mqm> dspmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.
mqm@xxxxxxxxx:/export/home/mqm> strmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active. |
Now for the kicker: I cleared the 2 faults in Veritas and failed it back to the first server and EVERYTHING started, including the 2 brokers!!!! So, I failed it over again to the second server and everything started (queue managers, listeners and command servers for each queue manager), but the 2 brokers would not start!!! I cleared the 2 faults in Veritas and failed it back to the first server and EVERYTHING started, including the 2 brokers!!
What in the world is up? The 2 servers are exactly the same, same OS, same kernel patch level and same WMQ 5.3 with the same CSD08.
Anybody got ANY ideas what is going on?
Regards,
Roger Lacroix
Capitalware Inc. _________________ Capitalware: Transforming tomorrow into today.
Connected to MQ!
|
 |
fjb_saper |
Posted: Tue Jul 19, 2005 9:38 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Could it be that the primary broker has some system (broker) queues opened in exclusive mode?
As long as the broker on the machine you failed over from has not been shut down, there is probably little chance of starting the broker on the failover machine.
Remember, even though we are talking about 2 different qmgrs, they still share a number of things, not least the queue file system.
I suppose that on both boxes the broker user was in the mqm group.
Enjoy  |
|
|
 |
RogerLacroix |
Posted: Wed Jul 20, 2005 8:10 am Post subject: |
|
|
 Jedi Knight
Joined: 15 May 2001 Posts: 3264 Location: London, ON Canada
|
Hi,
You are confusing MQ clustering with hardware clustering.
Just to clarify a few points:
- When I said clustering, I meant hardware clustering and not MQ clustering
- When I said broker, I did not mean MQSI stuff. CSD08 now includes the old SupportPac MA0C - hence I meant that broker.
I even deleted the broker on the first server, failed it over and deleted the broker on the second server and tried to start the broker. No luck, same error message.
There are 6 queue managers on box #1, set up as a hardware cluster. During a failover Veritas unmounts and then mounts the following disk partitions between box #1 and box #2:
/var/mqm/data
/var/mqm/log
/var/mqm/errors
Yes, there are 6 queue managers on box #1 but during a failover, the 6 QMs are stopped (including the brokers) then the EMC disks are mounted on the other server (box #2) then everything is started.
Therefore, it IS the same queue managers / brokers that are being stopped and started.
Regards,
Roger Lacroix
Capitalware Inc. _________________ Capitalware: Transforming tomorrow into today.
Connected to MQ!
|
 |
jefflowrey |
Posted: Wed Jul 20, 2005 8:13 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
RogerLacroix wrote: |
You are confusing MQ clustering with hardware clustering. |
I don't think he was. _________________ I am *not* the model of the modern major general. |
|
|
 |
RogerLacroix |
Posted: Wed Jul 20, 2005 8:27 am Post subject: |
|
|
 Jedi Knight
Joined: 15 May 2001 Posts: 3264 Location: London, ON Canada
|
Hi Jeff,
I believe he was:
Quote: |
Remember even though we are talking about 2 different qmgrs |
There is only ONE queue manager involved for each broker. Actually, the box has 6 queue managers: QM1, QM2, QM3, QM4, QM5 & QM6. QM5 and QM6 have Pub/Sub enabled and each has its own broker running.
During a failover, Veritas stops things in the following order:
- brokers
- monitoring tool
- command servers
- listeners
- queue managers
- unmounts 3 disks
- removes the 3 volumes
- removes the VIP (virtual IP)
Once completed then on the other box it does:
- adds the VIP
- adds the 3 volumes
- mounts the 3 disks
- starts QM1, QM2, QM3, QM4, QM5 & QM6
- starts all command servers
- starts all listeners
- starts monitor tool
- starts brokers
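The MQ part of the start order above can be sketched as a small shell script. This is only an illustration, not the actual Veritas configuration: the `RUN` switch and the `start_sequence` function are hypothetical names, the VIP, volume and mount steps are left to the cluster software, and the listener and monitoring tool steps are omitted because their options are site-specific. By default it only prints the commands.

```shell
#!/bin/sh
# Sketch only: prints (or runs) the MQ start order described in the post.
# QMGRS and BROKER_QMGRS mirror the names in the thread.
# RUN is a hypothetical switch: unset means dry run (echo); RUN="" executes for real.
QMGRS="QM1 QM2 QM3 QM4 QM5 QM6"
BROKER_QMGRS="QM5 QM6"
RUN="${RUN-echo}"

start_sequence() {
    # 1. queue managers first
    for qm in $QMGRS; do $RUN strmqm "$qm"; done
    # 2. then the command servers
    for qm in $QMGRS; do $RUN strmqcsv "$qm"; done
    # 3. listeners and the monitoring tool would go here (site-specific, omitted)
    # 4. brokers last, only for the Pub/Sub queue managers
    for qm in $BROKER_QMGRS; do $RUN strmqbrk -m "$qm"; done
}

start_sequence
```

Starting the brokers last matters because strmqbrk needs a running queue manager to attach to.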
Quote: |
I suppose that on both boxes the broker user was in the mqm group. |
It is the broker that is included with MQ v5.3 CSD08.
i.e.
/opt/mqm/bin/strmqbrk
/opt/mqm/bin/dspmqbrk
/opt/mqm/bin/endmqbrk
And yes, everything is started & stopped using the mqm account and it IS in the mqm group on both boxes.
Regards,
Roger Lacroix
Capitalware Inc. _________________ Capitalware: Transforming tomorrow into today.
Connected to MQ!
|
 |
jefflowrey |
Posted: Wed Jul 20, 2005 9:00 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
I think fjb_saper was suggesting that the broker process may not have completely shut down, and still had locks open on the qmgr files - and that therefore when the "same" or "other" (depending on perspective, really) qmgr tried to start up, it couldn't acquire the right access to "its own" files.
This doesn't seem likely given your procedure, though.
So, umm, did you check for errors when the second broker reported it couldn't start up?  _________________ I am *not* the model of the modern major general. |
|
|
 |
RogerLacroix |
Posted: Wed Jul 20, 2005 9:13 am Post subject: |
|
|
 Jedi Knight
Joined: 15 May 2001 Posts: 3264 Location: London, ON Canada
|
Hi,
Veritas is really good at keeping track of items when it is starting or shutting down programs. It will not go to the next step until the current step is done. Otherwise it throws a fault and stops.
There are no FDC files in either /var/mqm/errors/ or /var/mqm/qmgrs/QMGRNAME/errors/
When I issue the strmqbrk command manually on the second box it always gives the following error:
Code: |
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active. |
Of course it is not active, that is why I am trying to start it!!! Stupid messages!
Here are the commands I just did manually on the second box:
Code: |
mqm@lab02:/export/home/mqm> strmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.
mqm@lab02:/export/home/mqm> dltmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker (QMGRNAME) deleted.
mqm@lab02:/export/home/mqm> strmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.
mqm@lab02:/export/home/mqm> endmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.
mqm@lab02:/export/home/mqm> strmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active. |
So, I am at a loss. As soon as I fail over back to the first box, everything works just fine!
Does the broker store the hostname of the server or some hardware information??
Regards,
Roger Lacroix _________________ Capitalware: Transforming tomorrow into today.
Connected to MQ!
|
 |
jefflowrey |
Posted: Wed Jul 20, 2005 9:15 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
No FDCs is good. No new entries to AMQERR01.LOG, either?
I think you need to enable mqtracing... _________________ I am *not* the model of the modern major general. |
|
|
 |
jefflowrey |
Posted: Wed Jul 20, 2005 9:16 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Also, stupid question.
Does the mqm user have the same UID on both servers? _________________ I am *not* the model of the modern major general. |
|
|
 |
fjb_saper |
Posted: Wed Jul 20, 2005 12:52 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
And finally, have you tried manually doing the following:
On primary machine: bring everything up
a) shutdown the broker manually
b) shutdown the qmgr manually
c) failover
d) start the qmgr manually
e) start the broker manually.
If this fails, ask IBM what identifying process they use to prevent 2 brokers from sharing the qmgr at different times.
Alternatively try on a separate box:
create qmgr
start qmgr
create broker
start broker
close broker
create another (2nd) broker
start 2nd broker for same qmgr.
If this is not possible you have your answer.  |
|
|
 |
RogerLacroix |
Posted: Wed Jul 20, 2005 2:07 pm Post subject: |
|
|
 Jedi Knight
Joined: 15 May 2001 Posts: 3264 Location: London, ON Canada
|
fjb_saper wrote: |
On primary machine: bring everything up
a) shutdown the broker manually
b) shutdown the qmgr manually |
Yup, did that.
fjb_saper wrote: |
c) failover
d) start the qmgr manually
e) start the broker manually. |
Yup, did that and it still failed.
fjb_saper wrote: |
Alternatively try on a separate box:
create qmgr
start qmgr
create broker
start broker
close broker
create another (2nd) broker
start 2nd broker for same qmgr. |
With CSD08, there is no 'create broker' command, you just start the broker and specify the queue manager name.
Now to the cause, yes, I figured it out.
These boxes were built before my time at this client site and I have done numerous failover tests with them without any problems before. But either when the mqm UserID was created or when someone altered it after the fact, mqm's primary group was set to something other than the mqm group (it was actually called unixshar).
I discovered it after going line by line through property files, ini files, /etc/system, /etc/passwd, /etc/group, etc.
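A quicker way to spot this kind of mismatch is to look at the primary group that `id` reports for the mqm user on each box. The following is a minimal sketch, assuming the usual `uid=N(name) gid=N(name)` output format; `primary_group` and `check_mqm_group` are hypothetical helper names, not MQ commands.

```shell
#!/bin/sh
# primary_group LINE: extract the primary group name from a line of `id` output,
# e.g. "uid=104(mqm) gid=105(mqm)" gives "mqm".
primary_group() {
    echo "$1" | sed -n 's/.*gid=[0-9]*(\([^)]*\)).*/\1/p'
}

# check_mqm_group LINE: report whether the primary group in LINE is mqm.
check_mqm_group() {
    grp=$(primary_group "$1")
    if [ "$grp" = "mqm" ]; then
        echo "OK: primary group is mqm"
    else
        echo "WARNING: primary group is '$grp', expected mqm"
    fi
}

# Sample lines built from the UID/GID values reported in this thread.
check_mqm_group "uid=104(mqm) gid=105(mqm)"        # box #1
check_mqm_group "uid=104(mqm) gid=103(unixshar)"   # box #2 before the fix

# On a live system one would run: check_mqm_group "$(id mqm)"
```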
So, I failed everything over to box #1, then on box #2 I updated mqm's primary group to be the mqm group, failed it back to box #2 and voila, the brokers started properly. (Of course, I also changed the ownership of /var/mqm and /opt/mqm to be mqm:mqm.)
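For reference, the fix described above might look roughly like the following on the passive box. This is a hedged sketch, not the exact commands that were run: it assumes the standard `usermod -g` and `chown -R` utilities, must be run as root, and `RUN` and `fix_mqm_group` are hypothetical names. It is a dry run by default and only prints what it would do.

```shell
#!/bin/sh
# Dry run by default: each command is printed, not executed.
# Set RUN="" (as root, with the queue managers failed over to the other box)
# to execute the commands for real.
RUN="${RUN-echo}"

fix_mqm_group() {
    # make mqm the primary group of the mqm user
    $RUN usermod -g mqm mqm
    # reset ownership of the MQ directories to mqm:mqm, as mentioned in the post
    $RUN chown -R mqm:mqm /var/mqm /opt/mqm
}

fix_mqm_group
```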
Now, box #2 has been set up this way for a long time, and I did notice that sometimes files would get the group ownership of 'unixshar', but since everything always worked, I didn't think much about it.
Hence, what is so special about the broker that it would not start (strmqbrk) on box #2 when the queue manager, command server, listener, etc. would start just fine with mqm in the wrong group?
Weird.
Regards,
Roger Lacroix
Capitalware Inc. _________________ Capitalware: Transforming tomorrow into today.
Connected to MQ!
|
 |
fjb_saper |
Posted: Wed Jul 20, 2005 5:28 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Thanks for sharing the result.
 |
|
|
 |
jefflowrey |
Posted: Wed Jul 20, 2005 5:32 pm Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
RogerLacroix wrote: |
Hence, what is so special about the broker that it would not start (strmqbrk) on box #2 when the queue manager, command server, listener, etc. would start just fine with mqm in the wrong group? |
UID vs. GID? _________________ I am *not* the model of the modern major general. |
|
|
 |
bower5932 |
Posted: Wed Jul 20, 2005 6:16 pm Post subject: |
|
|
 Jedi Knight
Joined: 27 Aug 2001 Posts: 3023 Location: Dallas, TX, USA
|
At this point, I'd suggest going with a trace to see if you spot anything in it that might shed some light on what is actually wrong. |
|
|
 |
RogerLacroix |
Posted: Wed Jul 20, 2005 8:49 pm Post subject: |
|
|
 Jedi Knight
Joined: 15 May 2001 Posts: 3264 Location: London, ON Canada
|
Hi,
Quote: |
I'd suggest going with a trace to see if you spot anything in it that might shed some light on what is actually wrong. |
Two days of pulling my hair out is enough.
GID.
Box # 1: the mqm UserID (UID=104) was in the mqm group (GID=105) and the unixshar group had a GID of 107.
Box # 2: the mqm UserID (UID=104) was in the unixshar group (GID=103) and the mqm group had a GID of 105.
On Box # 2, I put the mqm UserID in the correct group and then set the GID for unixshar to be 107.
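A comparison like this can be automated by checking a group's GID in a copy of each box's /etc/group. Below is a minimal sketch under stated assumptions: `gid_of` and `compare_gid` are hypothetical helpers, and the two sample files are built from the values above rather than taken from the real servers. On Solaris, /usr/bin/awk is the old awk without `-v`, so nawk or /usr/xpg4/bin/awk would be needed there.

```shell
#!/bin/sh
# gid_of GROUP FILE: print the numeric GID of GROUP in an /etc/group-style FILE.
gid_of() {
    awk -F: -v g="$1" '$1 == g { print $3 }' "$2"
}

# compare_gid GROUP FILE1 FILE2: report whether GROUP has the same GID in both files.
compare_gid() {
    a=$(gid_of "$1" "$2")
    b=$(gid_of "$1" "$3")
    if [ "$a" = "$b" ]; then
        echo "$1: same GID ($a)"
    else
        echo "$1: GID differs ($a vs $b)"
    fi
}

# Sample data using the GIDs from this thread; box1.group and box2.group
# stand in for copies of each box's /etc/group.
tmp=${TMPDIR:-/tmp}/gidcheck.$$
mkdir -p "$tmp"
printf 'mqm::105:mqm\nunixshar::107:\n' > "$tmp/box1.group"
printf 'mqm::105:\nunixshar::103:mqm\n' > "$tmp/box2.group"
compare_gid mqm      "$tmp/box1.group" "$tmp/box2.group"
compare_gid unixshar "$tmp/box1.group" "$tmp/box2.group"
rm -rf "$tmp"
```

With the sample data, mqm matches on both boxes while unixshar does not, which is the inconsistency that was corrected on box #2.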
Regards,
Roger Lacroix
Capitalware Inc. _________________ Capitalware: Transforming tomorrow into today.
Connected to MQ!
|
 |