  MQSeries.net
MQSeries.net Forum Index » IBM MQ Installation/Configuration Support » MQ Pub/Sub problem

MQ Pub/Sub problem
RogerLacroix
Posted: Tue Jul 19, 2005 6:18 pm    Post subject: MQ Pub/Sub problem

Jedi Knight

Joined: 15 May 2001
Posts: 3264
Location: London, ON Canada

All,

At a client site they are starting to use MQ Pub/Sub.

Environment:
- Solaris 8
- WMQ v5.3 CSD08 (note: CSD08 includes Pub/Sub libraries for those who don't know)

I set up 2 queue managers for 2 applications on a DEV server (single server). Everything was going along fine (except for the usual app dev problems), so we decided to test it on the LAB MQ server. The LAB MQ server is set up with Veritas as 2 Solaris servers in an Active / Passive design.

So, I created the queue managers on the LAB server with hacrtmqm, set up the pub/sub queues and then configured the failover support in Veritas just like I have done many, many times before. The only new item was an entry under applications for each broker. I enabled the queue managers and started them, and everything was fine.

The application connected and did some testing. Next, I failed everything over to the passive box. Everything started, including the new queue managers, except for the brokers. After a minute, Veritas reported that the 2 brokers had faulted.

I can connect to all queue managers, including the new ones, and get/put messages. From the shell, I typed 'dspmqbrk -m QMgrName' and it said the broker was not active. So I figured I would start the broker manually. When I tried to start the broker, it said 'not active'. I swear, I'm not kidding. So I typed it really slowly a second time and it again said 'not active'. (What I said next, I will not repeat here.)

Here are the commands & output from the shell:

Code:
mqm@xxxxxxxxx:/export/home/mqm> dspmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.

mqm@xxxxxxxxx:/export/home/mqm> strmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.


Now for the kicker: I cleared the 2 faults in Veritas, failed it back to the first server, and EVERYTHING started, including the 2 brokers!!!! So I failed it over again to the second server and everything started (queue managers, listeners and command servers for each queue manager), but the 2 brokers would not start!!! I cleared the 2 faults in Veritas, failed it back to the first server, and EVERYTHING started, including the 2 brokers!!

What in the world is up? The 2 servers are exactly the same, same OS, same kernel patch level and same WMQ 5.3 with the same CSD08.

Anybody got ANY ideas what is going on?

Regards,
Roger Lacroix
Capitalware Inc.
_________________
Capitalware: Transforming tomorrow into today.
Connected to MQ!
fjb_saper
Posted: Tue Jul 19, 2005 9:38 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Could it be that the primary broker has some system (broker) queues open in exclusive mode?

As long as the broker on the first machine has not fully shut down, there is little chance of starting the broker on the failover machine.

Remember that even though we are talking about 2 different qmgrs, they still share a number of things, not least the queue file system.

I suppose that on both boxes the broker user was in the mqm group.

Enjoy
RogerLacroix
Posted: Wed Jul 20, 2005 8:10 am

Jedi Knight

Joined: 15 May 2001
Posts: 3264
Location: London, ON Canada

Hi,

You are confusing MQ clustering with hardware clustering.

Just to clarify a few points:

- When I said clustering, I meant hardware clustering and not MQ clustering
- When I said broker, I did not mean MQSI stuff. CSD08 now includes the old SupportPac MA0C, and that is the broker I meant.


I even deleted the broker on the first server, failed it over and deleted the broker on the second server and tried to start the broker. No luck, same error message.

There are 6 queue managers on box #1, set up as a hardware cluster. During a failover, Veritas unmounts the following disk partitions on one box and then mounts them on the other:

/var/mqm/data
/var/mqm/log
/var/mqm/errors

Yes, there are 6 queue managers on box #1 but during a failover, the 6 QMs are stopped (including the brokers) then the EMC disks are mounted on the other server (box #2) then everything is started.

Therefore, it IS the same queue managers / brokers that are being stopped and started.


Regards,
Roger Lacroix
Capitalware Inc.
jefflowrey
Posted: Wed Jul 20, 2005 8:13 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

RogerLacroix wrote:
You are confusing MQ clustering with hardware clustering.


I don't think he was.
_________________
I am *not* the model of the modern major general.
RogerLacroix
Posted: Wed Jul 20, 2005 8:27 am

Jedi Knight

Joined: 15 May 2001
Posts: 3264
Location: London, ON Canada

Hi Jeff,

I believe he was:
Quote:
Remember even though we are talking about 2 different qmgrs

There is only ONE queue manager: the same one runs on whichever box is active. Actually, the box has 6 queue managers: QM1, QM2, QM3, QM4, QM5 & QM6. QM5 and QM6 have Pub/Sub enabled, and each has its own broker running.

During a failover, Veritas stops things in the following order:

- brokers
- monitoring tool
- command servers
- listeners
- queue managers
- unmounts 3 disks
- removes the 3 volumes
- removes the VIP (virtual IP)

Once completed then on the other box it does:
- adds the VIP
- adds the 3 volumes
- mounts the 3 disks
- starts QM1, QM2, QM3, QM4, QM5 & QM6
- starts all command servers
- starts all listeners
- starts monitor tool
- starts brokers
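The stop and start sequences above can be read as a dependency chain: the disks must be mounted before the queue managers start, and each broker needs its queue manager. A minimal Python sketch, assuming hypothetical resource labels and dependencies (they mirror the list above and are not actual Veritas resource types or attributes), checks that a start sequence never starts a resource before its dependencies, and that reversing the stop sequence also yields a valid start sequence.

```python
# Hypothetical resource labels mirroring the Veritas sequence in the post;
# these are illustrative names, not real Veritas resource types.
START_ORDER = ["vip", "volumes", "disks", "queue_managers",
               "command_servers", "listeners", "monitor", "brokers"]
STOP_ORDER = ["brokers", "monitor", "command_servers", "listeners",
              "queue_managers", "disks", "volumes", "vip"]

# Assumed dependencies: each resource needs the listed ones to be up first.
DEPENDS_ON = {
    "disks": ["volumes"],
    "queue_managers": ["disks"],
    "command_servers": ["queue_managers"],
    "listeners": ["queue_managers", "vip"],
    "monitor": ["queue_managers"],
    "brokers": ["queue_managers"],
}


def start_violations(order, depends_on):
    """Return (resource, missing_dependency) pairs for an invalid start order."""
    started = set()
    violations = []
    for resource in order:
        for dep in depends_on.get(resource, []):
            if dep not in started:
                violations.append((resource, dep))
        started.add(resource)
    return violations


def stop_violations(order, depends_on):
    """A stop order is valid if, reversed, it is a valid start order."""
    return start_violations(list(reversed(order)), depends_on)


if __name__ == "__main__":
    print("start:", start_violations(START_ORDER, DEPENDS_ON) or "ok")
    print("stop:", stop_violations(STOP_ORDER, DEPENDS_ON) or "ok")
```

Under these assumed dependencies both sequences from the thread pass the check, and since the same sequence worked on box #1, the ordering itself was not the culprit.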

Quote:
I suppose that on both boxes the broker user was in the mqm group.

It is the broker that is included with MQ v5.3 CSD08.
i.e.
/opt/mqm/bin/strmqbrk
/opt/mqm/bin/dspmqbrk
/opt/mqm/bin/endmqbrk

And yes, everything is started & stopped using the mqm account and it IS in the mqm group on both boxes.

Regards,
Roger Lacroix
Capitalware Inc.
jefflowrey
Posted: Wed Jul 20, 2005 9:00 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

I think fjb_saper was suggesting that the broker process may not have completely shut down and still had locks open on the qmgr files, and that therefore when the "same" or "other" (depending on perspective, really) qmgr tried to start up, it couldn't acquire the right access to its own files.

This doesn't seem likely given your procedure, though.

So, umm, did you check for errors when the second broker reported it couldn't start up?
RogerLacroix
Posted: Wed Jul 20, 2005 9:13 am

Jedi Knight

Joined: 15 May 2001
Posts: 3264
Location: London, ON Canada

Hi,

Veritas is really good at keeping track of items when it is starting or shutting down programs. It will not go to the next step until the current step is done; otherwise it throws a fault and stops.

There are no FDC files in either /var/mqm/errors/ or /var/mqm/qmgrs/QMGRNAME/errors/.
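On a failover pair, where these directories move between boxes with the disks, it helps to check them the same way each time. A minimal Python sketch, assuming the default /var/mqm layout named above, lists FDC files and AMQERR*.LOG error logs from both the system-wide and the queue-manager error directories. The queue manager's directory name under qmgrs is passed in as an assumption to verify: MQ can transform some characters (such as '.') in directory names, so it is not always identical to the queue manager name.

```python
from pathlib import Path


def mq_error_files(mqm_root, qmgr_dir):
    """Collect *.FDC and AMQERR*.LOG files from the MQ error directories.

    mqm_root: the MQ data root, e.g. "/var/mqm" (default Unix layout).
    qmgr_dir: the queue manager's directory name under <mqm_root>/qmgrs.
    """
    root = Path(mqm_root)
    error_dirs = [root / "errors", root / "qmgrs" / qmgr_dir / "errors"]
    found = {"fdc": [], "logs": []}
    for d in error_dirs:
        if not d.is_dir():
            continue  # directory missing, e.g. disk not mounted on this box
        found["fdc"].extend(sorted(d.glob("*.FDC")))
        found["logs"].extend(sorted(d.glob("AMQERR*.LOG")))
    return found


if __name__ == "__main__":
    result = mq_error_files("/var/mqm", "QMGRNAME")
    for kind, paths in result.items():
        print(kind, [str(p) for p in paths])
```

An empty "fdc" list matches what Roger saw; the "logs" entries are where jefflowrey's AMQERR01.LOG question would be answered.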

When I issue the strmqbrk command manually on the second box it always gives the following error:
Code:
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.

Of course it is not active, that is why I am trying to start it!!! Stupid messages!

Here are the commands I just did manually on the second box:
Code:
mqm@lab02:/export/home/mqm> strmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.
mqm@lab02:/export/home/mqm> dltmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker (QMGRNAME) deleted.
mqm@lab02:/export/home/mqm> strmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.
mqm@lab02:/export/home/mqm> endmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.
mqm@lab02:/export/home/mqm> strmqbrk -m QMGRNAME
WebSphere MQ Publish/Subscribe broker for queue manager QMGRNAME not active.

So, I am at a loss. As soon as I fail back to the first box, everything works just fine!

Does the broker store the hostname of the server or some hardware information??

Regards,
Roger Lacroix
jefflowrey
Posted: Wed Jul 20, 2005 9:15 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

No FDCs is good. No new entries to AMQERR01.LOG, either?

I think you need to enable MQ tracing...
jefflowrey
Posted: Wed Jul 20, 2005 9:16 am

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Also, stupid question.

Does the mqm user have the same UID on both servers?
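One way to answer this is to compare the mqm entries in /etc/passwd on the two boxes. A minimal Python sketch, assuming you have copied each box's /etc/passwd as text (the function names are illustrative), compares the UID and also the primary GID, since files written to the shared disks record both as numbers.

```python
def parse_passwd(text):
    """Map user name -> (uid, primary_gid) from /etc/passwd-format text."""
    users = {}
    for line in text.splitlines():
        fields = line.strip().split(":")
        # Skip blank lines, NIS '+'/'-' entries and malformed lines.
        if len(fields) >= 4 and fields[2].isdigit() and fields[3].isdigit():
            users[fields[0]] = (int(fields[2]), int(fields[3]))
    return users


def compare_user(user, passwd_a, passwd_b):
    """Return human-readable differences for `user` between two passwd files."""
    a = parse_passwd(passwd_a).get(user)
    b = parse_passwd(passwd_b).get(user)
    if a is None or b is None:
        return [f"user '{user}' missing on one box"]
    diffs = []
    if a[0] != b[0]:
        diffs.append(f"UID differs: {a[0]} vs {b[0]}")
    if a[1] != b[1]:
        diffs.append(f"primary GID differs: {a[1]} vs {b[1]}")
    return diffs
```

Usage: compare_user("mqm", box1_text, box2_text). With the values Roger reports further down, the UIDs match (104) but the primary GIDs do not.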
fjb_saper
Posted: Wed Jul 20, 2005 12:52 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

And finally, have you tried manually doing the following:

On primary machine: bring everything up

a) shutdown the broker manually
b) shutdown the qmgr manually

c) failover
d) start the qmgr manually
e) start the broker manually.
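Spelled out as commands, steps a) to e) look roughly like this. A minimal Python sketch that only builds the command lines for a given queue manager (a dry run; nothing is executed). The failover step is left as a placeholder because the thread does not say which Veritas command was used; endmqbrk, endmqm, strmqm and strmqbrk are the standard WebSphere MQ control commands, shown here in their basic forms.

```python
import shlex


def manual_failover_plan(qmgr):
    """Return (box, argv) pairs for the manual sequence; None marks the failover."""
    return [
        ("primary", ["endmqbrk", "-m", qmgr]),    # a) stop the broker
        ("primary", ["endmqm", qmgr]),            # b) controlled stop of the qmgr
        ("cluster", None),                        # c) fail over (Veritas, not shown)
        ("secondary", ["strmqm", qmgr]),          # d) start the qmgr
        ("secondary", ["strmqbrk", "-m", qmgr]),  # e) start the broker
    ]


def print_plan(qmgr):
    """Print the plan; does not run any command."""
    for box, argv in manual_failover_plan(qmgr):
        step = "<fail over the service group>" if argv is None else shlex.join(argv)
        print(f"{box:>9}: {step}")


if __name__ == "__main__":
    print_plan("QMGRNAME")
```

Keeping the plan as data makes it easy to print and review before anyone types the commands on a live cluster.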

If this fails, ask IBM what identification mechanism they use to prevent 2 brokers from sharing the qmgr at different times.

Alternatively try on a separate box:
create qmgr
start qmgr
create broker
start broker
close broker
create another (2nd) broker
start 2nd broker for same qmgr.

If this is not possible you have your answer.
RogerLacroix
Posted: Wed Jul 20, 2005 2:07 pm

Jedi Knight

Joined: 15 May 2001
Posts: 3264
Location: London, ON Canada

fjb_saper wrote:
On primary machine: bring everything up

a) shutdown the broker manually
b) shutdown the qmgr manually

Yup, did that.

fjb_saper wrote:
c) failover
d) start the qmgr manually
e) start the broker manually.

Yup, did that and it still failed.

fjb_saper wrote:
Alternatively try on a separate box:
create qmgr
start qmgr
create broker
start broker
close broker
create another (2nd) broker
start 2nd broker for same qmgr.

With CSD08, there is no 'create broker' command, you just start the broker and specify the queue manager name.

Now to the cause, yes, I figured it out.

These boxes were built before my time at this client site, and I have done numerous failover tests with them before without any problems. But either when the mqm user ID was created or when someone altered it after the fact, mqm's primary group was set to something other than the mqm group (it was actually called unixshar).

After going line by line through property files, ini files, /etc/system, /etc/passwd, /etc/group, etc., I discovered it.

So, I failed everything over to box #1, then on box #2 I changed mqm's primary group to the mqm group and failed everything back to box #2, and voila, the brokers started properly. (Of course, I also changed the ownership of /var/mqm and /opt/mqm to mqm:mqm.)

Box #2 had been set up this way for a long time, and I did notice that files would sometimes get the group ownership 'unixshar', but since everything always worked, I didn't think much about it.
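Stray group ownership like the 'unixshar' files can be found directly. A minimal Python sketch, assuming you know the numeric GID the files should have, walks a directory tree such as /var/mqm and reports entries whose group owner differs. It only reports; fixing ownership (as Roger did with mqm:mqm) is left to chown.

```python
import os


def files_with_wrong_group(top, expected_gid):
    """Walk `top` and return sorted paths whose group owner is not expected_gid.

    Uses lstat so symbolic links are checked themselves, not their targets.
    The `top` directory itself is not checked, only its contents.
    """
    wrong = []
    for dirpath, dirnames, filenames in os.walk(top):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            if os.lstat(path).st_gid != expected_gid:
                wrong.append(path)
    return sorted(wrong)
```

On a Unix box the expected GID can be looked up with the standard grp module, e.g. files_with_wrong_group("/var/mqm", grp.getgrnam("mqm").gr_gid).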

Hence, what is so special about the broker that it would not start (strmqbrk) on box #2 when the queue manager, command server, listener, etc. would start just fine with mqm in the wrong group?

Weird.

Regards,
Roger Lacroix
Capitalware Inc.
fjb_saper
Posted: Wed Jul 20, 2005 5:28 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

Thanks for sharing the result.

jefflowrey
Posted: Wed Jul 20, 2005 5:32 pm

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

RogerLacroix wrote:
Hence, what is so special about the broker that it would not start (strmqbrk) on box #2 when the queue manager, command server, listener, etc. would start just fine with mqm in the wrong group?


UID vs. GID?
bower5932
Posted: Wed Jul 20, 2005 6:16 pm

Jedi Knight

Joined: 27 Aug 2001
Posts: 3023
Location: Dallas, TX, USA

At this point, I'd suggest going with a trace to see if you spot anything in it that might shed some light on what is actually wrong.
RogerLacroix
Posted: Wed Jul 20, 2005 8:49 pm

Jedi Knight

Joined: 15 May 2001
Posts: 3264
Location: London, ON Canada

Hi,
Quote:
I'd suggest going with a trace to see if you spot anything in it that might shed some light on what is actually wrong.

Two days of pulling my hair out is enough.

Quote:
UID vs. GID?

GID.

Box #1: the mqm user ID (UID=104) was in the mqm group (GID=105), and the unixshar group had a GID of 107.

Box #2: the mqm user ID (UID=104) was in the unixshar group, which had a GID of 103, and the mqm group had a GID of 105.

On box #2, I put the mqm user ID in the correct group and then set the GID for unixshar to 107.
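The fix boils down to making the group tables agree on both boxes. A minimal Python sketch, assuming /etc/group-format text copied from each box (the function names are illustrative), reports groups that exist on both boxes with different GIDs. Because files on the shared disks record numeric GIDs, a mismatch such as unixshar 107 vs 103 means the same file shows a different group depending on which box has the disks mounted.

```python
def parse_group(text):
    """Map group name -> GID from /etc/group-format text (name:passwd:gid:members)."""
    groups = {}
    for line in text.splitlines():
        fields = line.strip().split(":")
        # Skip blank lines, NIS '+'/'-' entries and malformed lines.
        if len(fields) >= 3 and fields[2].isdigit():
            groups[fields[0]] = int(fields[2])
    return groups


def gid_mismatches(group_a, group_b):
    """Return {group: (gid_on_a, gid_on_b)} for groups whose GIDs differ."""
    a, b = parse_group(group_a), parse_group(group_b)
    return {name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}
```

With the numbers above, box #2 before the fix reports {"unixshar": (107, 103)}; after the fix the result is empty.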

Regards,
Roger Lacroix
Capitalware Inc.
Copyright © MQSeries.net. All rights reserved.