Challenger |
Posted: Sun Apr 06, 2008 2:03 pm Post subject: Challenge Question - 04 / 2008 - Week Two |
|
|
 Centurion
Joined: 31 Mar 2008 Posts: 115
|
Answers to questions from Week One of the April 08 Challenge are summarized below. You may ask questions of the Challenger and you may ask the Challenger to run commands to give you more information. You may ask two questions per post. After your two questions are answered, you are free to post another two questions.
You may also ask questions about steps we took at the request of IBM tech support. For example: "Did L3 ask you to stand on your head, and what was the result?" "Why, yes, and it made the Challenger very dizzy."
Don't be discouraged! If we don't make progress in Week Two, the Challenger may take pity upon the forum and post a clue.
Original presenting problem:
After upgrade from MQ v5.3 CSD13 to v6.0.2.2, nine queue managers on one server started fine as 'mqm'. Later in the day, a tenth queue manager on the same server would not, and had to be started by a different user ID. OS=Solaris 9
Subsequent problem:
Troubleshooting revealed that two kernel updates had been missed during upgrade. These were corrected and the server rebooted (bringing all kernel parameters to IBM's specifications). After reboot, all ten queue managers were started immediately and all ten failed to start as 'mqm'.
Information and answers from Week One:
When the qmgr fails to start, we are instantly returned to a command prompt.
No stdout.
No FDCs.
No entries to queue manager AMQERRORx.LOG.
Migration steps were as specified in Quick Beginnings. Upgrade checklist had been used successfully on three previous servers of the same configuration. The Solaris utility "pkg" was used.
'setuid' option was properly accepted during installation.
With OAM disabled, qmgr still will not start.
Resource limits for 'mqm' are:
time(seconds) unlimited
file(blocks) unlimited
data(kbytes) unlimited
stack(kbytes) 8192
coredump(blocks) unlimited
nofiles(descriptors) 10000
vmemory(kbytes) unlimited
'mqm' is an NIS ID, and is a member of the mqm group.
'mqm' home directory is an NFS mount.
'mqm' is used successfully on other servers in the NIS domain.
'mqm' user and group have appropriate access to /opt/mqm and /var/mqm/.
Space on /var/mqm is more than sufficient for successful operation.
Solaris is properly running in 64-bit mode.
LIBPATH and LD_LIBRARY_PATH variables are not set for 'mqm'.
L3 analyzed a trace of strmqm and found a SIGBUS error. However, the Challenger feels the analysis was superficial, and the trace analysis by L3 is not relevant to solving the problem.
Queue managers can be created and deleted as 'mqm'.
Environment: excerpts from 'mqm' environment.
HZ=
LC_COLLATE=en_US.ISO8859-1
LC_CTYPE=en_US.ISO8859-1
LC_MESSAGES=C
LC_MONETARY=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1
PATH=.:/opt/sfw/bin:/opt/mqm/bin:/opt/mqm/samp/bin:/usr/bin:
SHELL=/bin/sh
Environment: excerpts from user that works.
_=/usr/bin/env
LC_MONETARY=en_US.ISO8859-1
PATH=/usr/bin:/usr/sbin:/opt/sfw/bin:/var/mqm:/var/wmqi:/opt/mqm/bin:/var/mqm/utilities:/opt/mqm/samp/bin:/usr/bin
LC_MESSAGES=C
LC_CTYPE=en_US.ISO8859-1
SHELL=/bin/ksh
HOME=/home/challenger
LC_COLLATE=en_US.ISO8859-1
LC_NUMERIC=en_US.ISO8859-1 |
|
gbaddeley |
Posted: Sun Apr 06, 2008 4:22 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
It's unusual that mqm directories are in the PATH, even for the mqm user. Normally on Solaris there are symlinks from /usr/bin to /opt/mqm/bin for the main MQ commands. Both users should be able to operate MQ with very simple PATHs, e.g. PATH=/usr/bin:/usr/sbin:/opt/sfw/bin:. and maybe with /var/mqm/utilities:/opt/mqm/samp/bin thrown on the end for good measure.
1. Are the links from /usr/bin set up correctly to /opt/mqm/bin ? (output of ls -l /opt/mqm | grep mq)
2. What difference does it make if mqm's PATH is changed to the user that works? |
|
Gaya3 |
Posted: Sun Apr 06, 2008 7:50 pm Post subject: |
|
|
 Jedi
Joined: 12 Sep 2006 Posts: 2493 Location: Boston, US
|
Please do the following activities
1. alias strmqm="env LIBPATH=/usr/mqm/lib64:$LIBPATH strmqm"
2. crtmqm sample
3. strmqm sample
and let me know the status/results of the same.
Regards
Gayathri
_________________
Do Something Before you Die |
|
gbaddeley |
Posted: Mon Apr 07, 2008 5:58 pm Post subject: Re: Challenge Question - 04 / 2008 - Week Two |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
Challenger wrote: |
Original presenting problem:
After upgrade from MQ v5.3 CSD13 to v6.0.2.2, nine queue managers on one server started fine as 'mqm'. Later in the day, a tenth queue manager on the same server would not, and had to be started by a different user ID. OS=Solaris 9
Subsequent problem:
Troubleshooting revealed that two kernel updates had been missed during upgrade. These were corrected and the server rebooted (bringing all kernel parameters to IBM's specifications). After reboot, all ten queue managers were started immediately and all ten failed to start as 'mqm'. |
There are some clues here: 9 qmgrs were running OK, then later in the day it was not possible to start a 10th. After a reboot, none of them could be started. This suggests that something changed during the first day; all we have to do is determine what it was.
1. Did the system administrator or MQ administrator make any changes to the system configuration before the 10th qmgr failed to start?
2. What was it? _________________ Glenn |
|
Challenger |
Posted: Mon Apr 07, 2008 8:01 pm Post subject: |
|
|
 Centurion
Joined: 31 Mar 2008 Posts: 115
|
I apologize for the delay in responding, and plan to reply tomorrow. Sometimes work gets in the way of the fun stuff.
P.S. The questions are getting more germane as we progress. Good work. |
|
Challenger |
Posted: Tue Apr 08, 2008 1:48 pm Post subject: |
|
|
 Centurion
Joined: 31 Mar 2008 Posts: 115
|
gbaddeley wrote: |
1. Are the links from /usr/bin set up correctly to /opt/mqm/bin ? (output of ls -l /opt/mqm | grep mq)
2. What difference does it make if mqm's PATH is changed to the user that works? |
Is 'ls -l /opt/mqm |grep mq' what you want to verify the link? Or is this what you want to see:
mqm:/home/mqm>ls -l /usr/bin | grep strmqm
lrwxrwxrwx 1 root other 19 Mar 1 09:56 strmqm -> /opt/mqm/bin/strmqm
If the path of 'mqm' is changed to be identical to the other user's path, it works! The 'mqm' user can start all ten queue managers.
So now you have a workaround. But why did it work, and what is the root cause? Does something still need to be fixed? |
|
Challenger |
Posted: Tue Apr 08, 2008 1:59 pm Post subject: |
|
|
 Centurion
Joined: 31 Mar 2008 Posts: 115
|
Gaya3 wrote: |
Please do the following activities
1. alias strmqm="env LIBPATH=/usr/mqm/lib64:$LIBPATH strmqm"
2. crtmqm sample
3. strmqm sample
and let me know the status/results of the same.
Regards
Gayathri |
mqm:/home/mqm>alias strmqm="env LIBPATH=/usr/mqm/lib64:$LIBPATH strmqm"
mqm:/home/mqm>crtmqm sample
WebSphere MQ queue manager created.
Creating or replacing default objects for sample.
Default objects statistics : 40 created. 0 replaced. 0 failed.
Completing setup.
Setup completed.
mqm:/home/mqm>strmqm sample
mqm:/home/mqm>
New queue manager, 'sample', will not start as 'mqm'. |
|
Challenger |
Posted: Tue Apr 08, 2008 2:05 pm Post subject: Re: Challenge Question - 04 / 2008 - Week Two |
|
|
 Centurion
Joined: 31 Mar 2008 Posts: 115
|
gbaddeley wrote: |
There are some clues here: 9 qmgrs were running OK, then later in the day it was not possible to start a 10th. After a reboot, none of them could be started. This suggests that something changed during the first day; all we have to do is determine what it was.
1. Did the system administrator or MQ administrator make any changes to the system configuration before the 10th qmgr failed to start?
2. What was it? |
There were no changes made to the system configuration. There were no changes made to any MQ configurations. In fact, I assure you that a comparison of all OS and MQ files 'before & after' would show them to be identical.
But you are correct, something was different. You are so close to the root cause..... |
|
Gaya3 |
Posted: Tue Apr 08, 2008 8:55 pm Post subject: |
|
|
 Jedi
Joined: 12 Sep 2006 Posts: 2493 Location: Boston, US
|
Please answer these questions:
1. In the NIS domain, did you create the IDs on the NIS master server? (Both the user ID and the group ID must be set to mqm.)
2. Is the NFS mount working properly, without any network failures (setuid allowed, and with root access too)?
These are the two places where I can see the issue could be.
Regards
Gayathri
_________________
Do Something Before you Die |
|
gbaddeley |
Posted: Tue Apr 08, 2008 11:21 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
Challenger wrote: |
gbaddeley wrote: |
1. Are the links from /usr/bin set up correctly to /opt/mqm/bin ? (output of ls -l /opt/mqm | grep mq)
2. What difference does it make if mqm's PATH is changed to the user that works? |
Is 'ls -l /opt/mqm |grep mq' what you want to verify the link? Or is this what you want to see:
mqm:/home/mqm>ls -l /usr/bin | grep strmqm
lrwxrwxrwx 1 root other 19 Mar 1 09:56 strmqm -> /opt/mqm/bin/strmqm
If the path of 'mqm' is changed to be identical to the other user's path, it works! The 'mqm' user can start all ten queue managers.
So now you have a workaround. But why did it work, and what is the root cause? Does something still need to be fixed? |
Sorry, I did actually mean ls -l /usr/bin, not /opt/mqm. Thanks, you confirmed that the sym link is ok for strmqm.
Environment excerpts from 'mqm'
PATH=.:/opt/sfw/bin:/opt/mqm/bin:/opt/mqm/samp/bin:/usr/bin:
Environment excerpts from user that works
PATH=/usr/bin:/usr/sbin:/opt/sfw/bin:/var/mqm:/var/wmqi:/opt/mqm/bin:/var/mqm/utilities:/opt/mqm/samp/bin:/usr/bin
When the strmqm command is entered, the shell searches the directories in the PATH from left to right. For the user that works, /usr/bin is first (and this is generally how it should be), and the symlink in here for strmqm points to /opt/mqm/bin/strmqm, where the real binary executable file is.
mqm's PATH is rather crazy. It contains "." (the current directory) first, so if there is a file in the current directory with the name strmqm the shell will try to run it.
Q1: Is there a strmqm file there?
/opt/mqm/bin should not be in the PATH, as the correct way to get to MQ commands in here is via the symlinks in /usr/bin.
It's OK to have /opt/mqm/samp/bin in the PATH, as this saves some typing whenever the MQ sample programs need to be run (amqsput, amqsget, amqsbcg etc).
/usr/bin is right at the end of the PATH. It should really be at the front.
It can be argued that "." should not be in the PATH because it is a security & integrity risk. If someone slips a script or binary into your directory which has the same name as a command elsewhere, they can hijack what you are doing. _________________ Glenn |
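The search-order point above can be tried out safely with a small sketch. It assumes a POSIX-compatible shell such as ksh (the legacy Bourne /bin/sh on Solaris 9 does not support `$(...)`) and an available `mktemp -d`. The two `strmqm` files are throwaway stand-in scripts created in a temporary directory, not real MQ binaries, and `run_with_path` is a hypothetical helper name.

```sh
#!/bin/sh
# Sketch: a leading "." in PATH lets a file in the current directory
# shadow the real command. Everything lives under a temp directory;
# the "strmqm" files are stand-ins, not MQ executables.

demo_root=$(mktemp -d)
mkdir -p "$demo_root/home" "$demo_root/bin"

# Stand-in for the genuine command found via the system bin directory.
printf '#!/bin/sh\necho "real strmqm"\n' > "$demo_root/bin/strmqm"
# Stand-in for a stray copy left behind in the user's home directory.
printf '#!/bin/sh\necho "stray strmqm"\n' > "$demo_root/home/strmqm"
chmod +x "$demo_root/bin/strmqm" "$demo_root/home/strmqm"

# Run "strmqm" from the fake home directory with the given PATH.
# The subshell keeps the caller's PATH and working directory untouched.
run_with_path() {
    (
        cd "$demo_root/home" || exit 1
        PATH=$1
        export PATH
        strmqm
    )
}

dot_first=$(run_with_path ".:$demo_root/bin")
bin_only=$(run_with_path "$demo_root/bin")

echo "PATH with leading '.': $dot_first"
echo "PATH without '.':      $bin_only"

rm -rf "$demo_root"
```

With "." first, the stray copy in the current directory is the one that runs; with only the bin directory in PATH, the intended one runs.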
|
Challenger |
Posted: Wed Apr 09, 2008 7:26 am Post subject: |
|
|
 Centurion
Joined: 31 Mar 2008 Posts: 115
|
Gaya3 wrote: |
1. In the NIS domain, did you create the IDs on the NIS master server? (Both the user ID and the group ID must be set to mqm.)
2. Is the NFS mount working properly, without any network failures (setuid allowed, and with root access too)? |
Yes, 'mqm' ID was created on the master NIS server and is a member of the mqm group. (In a previous post, I believe I posted the relevant lines from the NIS password and group file.)
No NFS errors or issues. The NIS user is successful on other servers. |
|
Challenger |
Posted: Wed Apr 09, 2008 7:54 am Post subject: |
|
|
 Centurion
Joined: 31 Mar 2008 Posts: 115
|
We have a winner!
gbaddeley wrote: |
Environment excerpts from 'mqm'
PATH=.:/opt/sfw/bin:/opt/mqm/bin:/opt/mqm/samp/bin:/usr/bin:
Environment excerpts from user that works
PATH=/usr/bin:/usr/sbin:/opt/sfw/bin:/var/mqm:/var/wmqi:/opt/mqm/bin:/var/mqm/utilities:/opt/mqm/samp/bin:/usr/bin
When the strmqm command is entered, the shell searches the directories in the PATH from left to right. For the user that works, /usr/bin is first (and this is generally how it should be), and the symlink in here for strmqm points to /opt/mqm/bin/strmqm, where the real binary executable file is.
mqm's PATH is rather crazy. It contains "." (the current directory) first, so if there is a file in the current directory with the name strmqm the shell will try to run it.
Q1: Is there a strmqm file there?
/opt/mqm/bin should not be in the PATH, as the correct way to get to MQ commands in here is via the symlinks in /usr/bin.
It's OK to have /opt/mqm/samp/bin in the PATH, as this saves some typing whenever the MQ sample programs need to be run (amqsput, amqsget, amqsbcg etc).
/usr/bin is right at the end of the PATH. It should really be at the front.
It can be argued that "." should not be in the PATH because it is a security & integrity risk. If someone slips a script or binary into your directory which has the same name as a command elsewhere, they can hijack what you are doing. |
You are correct. There is a v5.3 copy of strmqm in /home/mqm. With "." first in the path, that v5.3 copy was executed whenever strmqm was run from /home/mqm.
I inherited this system some months ago, and I must have looked at that copy of strmqm in /home/mqm a million times -- but never thought about it.
On the day of the upgrade, I apparently was not in /home/mqm when the first nine queue managers were started, so the proper copy of strmqm was run.
The problem was identified when L3 asked me to do a truss, which showed that v5.3 code was executing. Then they asked for the output of "which strmqm" as both users. It was then that it became clear that strmqm was executing from "./strmqm".
Good work, gbaddeley. PM your shipping information to me, and a valuable mqseries.net coffee mug will be on its way to you! |
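The "which strmqm" check described above reports only the first match. A rough sketch of a fuller check, listing every match in search order, might look like the following. `find_in_path` is a hypothetical helper name, not an MQ or Solaris tool, and the code assumes a POSIX-compatible shell such as ksh (the `${var%%pattern}` expansions are not available in the legacy Bourne /bin/sh on Solaris 9). An empty PATH entry is treated as the current directory, which is how a POSIX shell reads it. Note that the 'mqm' PATH posted earlier ends with a trailing ":", so it names the current directory twice.

```sh
#!/bin/sh
# Sketch: print every PATH entry that holds an executable file with the
# given name, in the order the shell would search them. An empty entry
# (leading, trailing, or doubled ":") is reported as "." because the
# shell treats it as the current directory.
# Usage: find_in_path NAME [PATH_STRING]   (defaults to $PATH)

find_in_path() {
    cmd=$1
    # Append ":" so that a trailing empty entry is still visited.
    rest="${2-$PATH}:"
    while [ -n "$rest" ]; do
        dir=${rest%%:*}      # text before the first ":"
        rest=${rest#*:}      # everything after the first ":"
        [ -z "$dir" ] && dir=.
        if [ -f "$dir/$cmd" ] && [ -x "$dir/$cmd" ]; then
            echo "$dir/$cmd"
        fi
    done
}

# The first line printed is the copy the shell would actually run.
find_in_path strmqm
```

Running this from the directory where the command is normally typed matters, since "." resolves relative to wherever you happen to be.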
|
Challenger |
Posted: Wed Apr 09, 2008 8:18 am Post subject: |
|
|
 Centurion
Joined: 31 Mar 2008 Posts: 115
|
Here is one more Challenge for you! Just for fun....
Can you guess my true identity from reading all that I've written? I post regularly to the forum, but I am not of jedi status.
If you can guess who I am, you'll receive a trinket from the Challenger's home town.
Who am I? |
|
jefflowrey |
Posted: Wed Apr 09, 2008 8:18 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Challenger wrote: |
There is a v5.3 copy of strmqm in /home/mqm. |
Oh, man...
Great job everyone, great challenge!  _________________ I am *not* the model of the modern major general. |
|
bbburson |
Posted: Wed Apr 09, 2008 9:37 am Post subject: |
|
|
Partisan
Joined: 06 Jan 2004 Posts: 378 Location: Nowhere near a queue manager
|
Challenger wrote: |
Here is one more Challenge for you! Just for fun....
Can you guess my true identity from reading all that I've written? I post regularly to the forum, but I am not of jedi status.
If you can guess who I am, you'll receive a trinket from the Challenger's home town.
Who am I? |
My guess (and it is PURELY a guess) is Toronto_MQ. |
|