kris.pilaet |
Posted: Tue Jun 21, 2011 11:56 pm Post subject: Solaris cluster WMB61 EG's do nothing |
Newbie
Joined: 21 Jun 2011 Posts: 8
We're running WMB v6.1 & MQ v7 on a clustered Solaris 10 environment.
On node B, everything runs fine. When stopping the broker & starting it, it takes its time, but everything comes back up.
When switching to the other node (A), we're having problems.
I've found a lot of topics here that describe part of my problem, but never completely and never with a cluster.
Here's mine. So, on node B, I'm having no problem.
When starting our broker on node A, ps -fu usernameBroker shows:
* Configuration Manager & broker processes are there
* DataFlowEngines are there
** When all flows of all DataFlowEngines were stopped on node B just before the switch, I'm getting
"WebSphere Broker v6108[379]: [ID 702911 user.info] (BRBA01T.eg-names)[1]BIP2208I: Execution group (64) started"
** When giving the command to start a flow of one EG, I'm getting "BIP2066E: Broker 'BRBA01T' (UUID 'a38dfaed-1a01-0000-0080-8fff11ab5618') was unable to retrieve an internal configuration response message for execution group"
** No matter which EG I try, they all give me the same error.
** In the meantime, the appropriate messages pile up on SYSTEM.BROKER.EXECUTIONGROUP.QUEUE
** Every now and then they disappear. On node B, this queue has an open input count of 28 (as it should); on node A it is 0 (see the sketch below).
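(For reference, a minimal sketch of that queue check, assuming the broker's queue manager is named QM_BRBA01T; the real queue manager name isn't given in the post:)

    # Depth and open input count (IPPROCS) of the EG configuration queue.
    # On the healthy node IPPROCS should equal the number of execution
    # groups (28 here); on the failing node it shows 0.
    echo "DISPLAY QSTATUS('SYSTEM.BROKER.EXECUTIONGROUP.QUEUE') CURDEPTH IPPROCS" | runmqsc QM_BRBA01T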
Extra info, perhaps: when stopping the broker on node B, I'm getting errors "BIP2804E: The broker has detected that Execution Group X has not shut down", which is really strange because it has no flows running. I'm getting this for all EGs.
Normally, when we switch, flows aren't stopped before that switch. I've done this to try to give my system more time and start things in a more controlled way.
But even when doing that, I immediately get BIP2066E for each EG.
I've started/stopped everything with cluster commands, and I've tried it all outside the cluster... all the same.
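(For reference, the manual cycle outside the cluster looks roughly like this. A sketch using the broker name from the post; the process grep is a generic check for leftovers, not a documented procedure:)

    mqsistop BRBA01T                 # stop the broker; -i forces an immediate stop if EGs hang (BIP2804E)
    ps -ef | grep -i dataflowengine  # any DataFlowEngine processes left behind?
    mqsistart BRBA01T                # restart the broker
    tail -f /var/adm/messages        # Solaris syslog: watch for BIP2208I / BIP2066E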
Any help, ideas, suggestions?
Our admins have no clue, we have no traces except the ones described.
We're now thinking of rebooting node A completely.
regards
kris
lancelotlinc |
Posted: Wed Jun 22, 2011 4:44 am Post subject: |
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
This is an example of why active-passive is a bad idea. It's a lot better to run active-active or active-active-active.
mqjeff |
Posted: Wed Jun 22, 2011 4:47 am Post subject: |
Grand Master
Joined: 25 Jun 2008 Posts: 17447
lancelotlinc wrote:
This is an example of why active-passive is a bad idea. It's a lot better to run active-active or active-active-active.

This is an example of a badly managed install, not anything else.
Vitor |
Posted: Wed Jun 22, 2011 4:54 am Post subject: |
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
lancelotlinc wrote:
This is an example of why active-passive is a bad idea. It's a lot better to run active-active or active-active-active.

Active / Active!
Vitor |
Posted: Wed Jun 22, 2011 4:56 am Post subject: |
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
mqjeff wrote:
lancelotlinc wrote:
This is an example of why active-passive is a bad idea. It's a lot better to run active-active or active-active-active.

This is an example of a badly managed install, not anything else.
@kris.pilaet
For clarity - When you say "Clustered Solaris 10" do you mean clustered, or do you mean zoned? Are there zones anywhere in all this?
Does the install on node A work with a stand-alone queue manager & broker?
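(For reference, a throwaway stand-alone test on node A might look like this. All names here are hypothetical, and on WMB 6.1 mqsicreatebroker still needs an existing broker database for -n:)

    crtmqm TESTQM                     # local queue manager on local disk, no cluster storage
    strmqm TESTQM
    mqsicreatebroker TESTBK -i mqbrkr -a passw0rd -q TESTQM -n TESTDB
    mqsistart TESTBK                  # if this comes up cleanly, suspect the cluster setup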
lancelotlinc |
Posted: Wed Jun 22, 2011 5:13 am Post subject: |
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
This passive node probably has not been operational in quite a long time. I think if both nodes were active and one became non-operational, the problem could have been corrected much sooner.
Vitor |
Posted: Wed Jun 22, 2011 5:19 am Post subject: |
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
lancelotlinc wrote:
This passive node probably has not been operational in quite a long time.
What leads you to that assumption?
Why is that more likely than the scenario where they're setting up an active / passive pair and are having trouble with the post-install testing?
There's nothing in the original post that I can see to indicate one way or the other. So what leads you to one scenario rather than the other?
lancelotlinc |
Posted: Wed Jun 22, 2011 5:20 am Post subject: |
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
I suppose we can ask the OP: how long has it been since you failed over to the other node?
Vitor |
Posted: Wed Jun 22, 2011 5:22 am Post subject: |
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
lancelotlinc wrote:
I suppose we can ask the OP: how long has it been since you failed over to the other node?
It is the logical way to find out.
My question about how you inferred one scenario rather than the other stands.
lancelotlinc |
Posted: Wed Jun 22, 2011 5:35 am Post subject: |
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
I know you do not like assumptions, so I guess I am guilty of making one there.
I based the assumption on the fact that he is on a 7-year-old OS running WMB 6.1, so I could not imagine it was a new configuration.
Vitor |
Posted: Wed Jun 22, 2011 5:49 am Post subject: |
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
lancelotlinc wrote:
I based the assumption on the fact that he is on a 7-year-old OS running WMB 6.1, so I could not imagine it was a new configuration.
I suppose Solaris 10 did come out 7 years ago, but isn't it still in support? When did Solaris 11 come out?
Bear in mind there are still people installing WMB 6.1 despite WMB 7 being out. While your site has an aggressive update policy, others do not, especially if they have a large existing estate and set up new machines off a standard install.
I will agree that somewhere in a dark corner of the OP's site, someone should be working on a new install with WMB v7, WMQ v7 & Solaris 11. But they may not be, and that's not going to solve the OP's problem.
Which could also be happening on an active / passive setup that's not been properly maintained and/or tested, as per your assumption.
kris.pilaet |
Posted: Wed Jun 22, 2011 6:22 am Post subject: |
Newbie
Joined: 21 Jun 2011 Posts: 8
Vitor wrote:
mqjeff wrote:
lancelotlinc wrote:
This is an example of why active-passive is a bad idea. It's a lot better to run active-active or active-active-active.

This is an example of a badly managed install, not anything else.

@kris.pilaet
For clarity - When you say "Clustered Solaris 10" do you mean clustered, or do you mean zoned? Are there zones anywhere in all this?
Does the install on node A work with a stand-alone queue manager & broker?
It is zoned and clustered. Our MQ & Broker zone runs on a cluster.
Queue manager data is shared between both nodes.
We have one queue manager per zone, contacted by node A or B.
When we switch, the only things that change are the hardware and the OS instance (the same OS runs on A & B).
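(For reference, with queue manager data shared between nodes, a classic failover trap is the service account resolving to different numeric ids on the two nodes. A hedged sketch of what to compare; usernameBroker is the account from the first post, and the paths are the MQ defaults:)

    # Run on node A and on node B, then diff the output:
    id usernameBroker          # uid/gid must match numerically on both nodes
    ls -ln /var/mqm/qmgrs      # numeric owner of the shared queue manager data
    ls -ln /var/mqm/log
    ipcs -a                    # stale semaphores / shared memory owned by the broker user?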
kris.pilaet |
Posted: Wed Jun 22, 2011 6:25 am Post subject: |
Newbie
Joined: 21 Jun 2011 Posts: 8
We've been having the problem since the end of May.
We are actually migrating to the version 7 broker, but I have a lot of applications running, all with their deadlines and SLAs.
Migration is, how to say, a bit of a struggle at my firm.
About migrating to the V7 broker: Fix Pack 2 hasn't been out that long, so why hurry?
Is v6.1 suddenly something prehistoric?
lancelotlinc |
Posted: Wed Jun 22, 2011 6:30 am Post subject: |
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
Your passive node has been non-operational for three and a half weeks. The point I was making was: if you were simply active-active, with no clustering or zones, you could have resolved the problem within minutes or hours rather than weeks.
WMB 6.1 is fine, especially if you have lots of apps that need TLC to migrate.
kris.pilaet |
Posted: Wed Jun 22, 2011 6:39 am Post subject: |
Newbie
Joined: 21 Jun 2011 Posts: 8
lancelotlinc wrote:
Your passive node has been non-operational for three and a half weeks. The point I was making was: if you were simply active-active, with no clustering or zones, you could have resolved the problem within minutes or hours rather than weeks.
WMB 6.1 is fine, especially if you have lots of apps that need TLC to migrate.

Okay lancelotlinc, I'm not going to argue that. But the solution remains the same either way, no? The thing is, what is the solution?