MQSeries.net Forum Index » General Discussion » QMGR directory path being truncated when recreating qmgr
Boomn4x4 (Disciple)
Posted: Tue Jun 03, 2014 6:18 am    Post subject: QMGR directory path being truncated when recreating qmgr

I have an environment variable that stores the qmgr name
Code:
export QMGR=PROD_QMGR1234_QM


I have a script that creates the qmgr

Code:
# double quotes, so that $QMGR expands in the description text
crtmqm -c "Queue Manager $QMGR" -q -u DLQ -ll $QMGR


The script creates the qmgr just fine
Code:
dspmq
QMNAME(PROD_QMGR1234_QM)                                   STATUS(Running)


However...
However, the directory path that is created for the qmgr is wrong; it drops the end of the name
Code:
pwd
/var/mqm/qmgrs/PROD_QMGR



It is also showing this in mqs.ini
Code:
   DefaultPrefix=/var/mqm

LogDefaults:
   LogDefaultPath=/var/mqm/log
QueueManager:
   Name=PROD_QMGR1234_QM
   Prefix=/var/mqm
   Directory=PROD_QMGR
DefaultQueueManager:
   Name=PROD_QMGR1234_QM


This seems to happen only when recreating a qmgr, as in: the qmgr existed at one point, was deleted, then recreated.
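
For anyone hitting the same symptom, one quick sanity check is to compare the Name and Directory attributes in each QueueManager stanza of mqs.ini. A rough sketch of such a check (the stanza layout is assumed from the snippet above; this is not an official IBM parser, and MQ can legitimately transform names containing characters invalid for directories, so a mismatch is a hint rather than proof):

```python
# Sketch: flag QueueManager stanzas in mqs.ini whose Directory does not
# match the queue manager Name. Stanza layout is assumed from the snippet
# above; this is not an official IBM parser, and a mismatch is only a hint.

def find_mismatches(mqs_ini_text):
    mismatches = []
    stanza, attrs = None, {}

    def flush():
        if stanza == "QueueManager" and attrs.get("Name") != attrs.get("Directory"):
            mismatches.append((attrs.get("Name"), attrs.get("Directory")))

    for raw in mqs_ini_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.endswith(":"):            # a new stanza, e.g. "QueueManager:"
            flush()
            stanza, attrs = line[:-1], {}
        elif "=" in line:
            key, _, value = line.partition("=")
            attrs[key.strip()] = value.strip()
    flush()
    return mismatches

sample = """\
QueueManager:
   Name=PROD_QMGR1234_QM
   Prefix=/var/mqm
   Directory=PROD_QMGR
"""
print(find_mismatches(sample))   # [('PROD_QMGR1234_QM', 'PROD_QMGR')]
```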
Boomn4x4 (Disciple)
Posted: Tue Jun 03, 2014 6:22 am

Figured it out.... dltmqm doesn't remove anything from /var/mqm/sockets

Have to delete that directory manually.
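
If you are tempted to go poking around by hand, it is safer to look before deleting anything. The comparison can be sketched like this (paths are simulated in a temp directory; on a real system the equivalents would be /var/mqm/sockets and /var/mqm/qmgrs, and as later replies point out, manual deletion there is risky):

```python
# Sketch: list entries under a sockets-style directory that have no matching
# queue manager directory under qmgrs. Paths are simulated in a temp dir;
# this inspects only -- it does not delete anything.
import os
import tempfile

def leftover_entries(sockets_dir, qmgrs_dir):
    known = set(os.listdir(qmgrs_dir))
    return sorted(e for e in os.listdir(sockets_dir) if e not in known)

root = tempfile.mkdtemp()
sockets = os.path.join(root, "sockets")
qmgrs = os.path.join(root, "qmgrs")
os.makedirs(os.path.join(sockets, "PROD_QMGR1234_QM"))   # stale entry
os.makedirs(os.path.join(qmgrs, "OTHER_QM"))             # a live qmgr
os.makedirs(os.path.join(sockets, "OTHER_QM"))
print(leftover_entries(sockets, qmgrs))   # ['PROD_QMGR1234_QM']
```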
fjb_saper (Grand High Poobah)
Posted: Wed Jun 04, 2014 6:38 pm

Boomn4x4 wrote:
Figured it out.... dltmqm doesn't remove anything from /var/mqm/sockets

Have to delete that directory manually.

Don't do that...
use amqiclen and remove all shared memory for the qmgr in question.
Deleting there will be a problem if you have multiple qmgrs...
_________________
MQ & Broker admin
Boomn4x4 (Disciple)
Posted: Thu Jun 05, 2014 4:37 am

fjb_saper wrote:
Boomn4x4 wrote:
Figured it out.... dltmqm doesn't remove anything from /var/mqm/sockets

Have to delete that directory manually.

Don't do that...
use amqiclen and remove all shared memory for the qmgr in question.
Deleting there will be a problem if you have multiple qmgrs...


Don't do that...

Per
http://www-01.ibm.com/support/docview.wss?uid=swg21414283

Quote:
Technote (FAQ)

Question

I have heard some people describe how they fixed a WebSphere MQ problem using "ipcrm" or "amqiclen". But is it safe to do this? What are the dangers?
Cause

It is not advisable to use these tools. See details below.
Answer

amqiclen

The amqiclen tool was developed to run as part of the WebSphere MQ installer. Its purpose is to check that it is safe to proceed with the install. The checks it makes are whether MQ's shared memory segments are in use by any running processes.

The amqiclen tool is potentially just as dangerous to an MQ system as the ipcrm command.

Therefore, for the same reason WebSphere MQ Support advises not to run ipcrm, we advise also not running amqiclen.

In fact amqiclen was not intended for use by MQ users or administrators directly. When used directly its effects are not supported by WebSphere MQ Support unless they have specifically advised it while working through a PMR.

Instead of running amqiclen to attempt to solve a perceived problem, please instead raise the matter with IBM in a PMR, so that WebSphere MQ Support can gather diagnostics and work through the issue to resolution.


Conclusion

There should never be a need for an administrator to manually remove MQ's shared memory or semaphore sets from the OS.

MQ has been written to tidy up after itself and so manage these things itself.

Therefore ipcrm and amqiclen should not need to be included in normal administration activities.

If you ever think MQ has not managed these resources successfully, you should raise the matter with IBM in a PMR, so that WebSphere MQ Support can gather diagnostics and work through the issue to resolution.

gbaddeley (Jedi)
Posted: Thu Jun 05, 2014 3:23 pm

Interesting. Sometimes a qmgr may refuse to start due to an ipc resource issue. Run amqiclen against the qmgr. The qmgr starts. It is a valuable tool if you know when and how to apply it.
_________________
Glenn
Andyh (Master)
Posted: Thu Jun 05, 2014 9:52 pm

As stated previously, if running amqiclen in this manner (without the -F flag) enables a QMgr to restart where it would previously have failed to restart, then IBM would accept an APAR. Using amqiclen with the -F flag is not safe, is not supported, and should only be used on the direct advice of MQ support (preferably either L3 or development).

strmqm includes the same logic as amqiclen (without the -F option) in regard to cleaning up IPC resources related to the previous instance of the qmgr.

The bottom line here is that there should be no need to either run amqiclen, or to manually delete IPC resources and their associated files.

It's also not true that dltmqm doesn't remove the relevant files from /var/mqm/sockets, and again any such failure would be an APARable issue.
By far the most common reason for leftover files in /var/mqm/sockets is that customers have "manually deleted a queue manager" by deleting the relevant files in /var/mqm/qmgrs and /var/mqm/log (or their equivalents).
tczielke (Guardian)
Posted: Fri Jun 06, 2014 7:30 am

I have always struggled with this restriction, as stated.

Let's take the use case where the queue manager crashes and comes down. Then a few minutes later (and before an administrator can restart the queue manager) the server crashes as well and reboots, which (based on my understanding) would wipe out all the shared memory and semaphores when the server comes back up. Surely MQ is robust enough to handle this condition? If so, what is the difference between this use case and an administrator running ipcrm on the MQ shared memory and semaphores WHEN the queue manager is down? If you run ipcrm while the queue manager is running, you really don't have the required technical skill set to perform your job as an administrator.

From personal experience, I have seen situations where the queue manager will not start due to lingering shared memory or semaphores. Making sure the queue manager is not running, and then doing ipcrm to clean up these lingering shared resources, allows the queue manager to come back up. Also, sometimes it is not practical from an SLA standpoint to open a PMR and wait for IBM resources to work with you before restarting a queue manager. It needs to get back up ASAP.
tczielke (Guardian)
Posted: Sat Jun 07, 2014 6:56 am

One other comment here. If IBM would provide a script to collect the diagnostics when this condition is hit, where the queue manager cannot start due to lingering shared memory segments or semaphores, we would be fine with collecting the diagnostics and opening a PMR (after the fact) before clearing the shared resources and restarting the queue manager. It just does not meet our business requirements to keep a queue manager down and work through the PMR process before we restart the queue manager.
PeterPotkay (Poobah)
Posted: Sat Jun 07, 2014 4:13 pm

tczielke wrote:
One other comment here. If IBM would provide a script to collect the diagnostics when this condition is hit where the queue manager can not start due to lingering shared memory segments or semaphores, we would be fine with collecting the diagnostics and opening a PMR (after the fact) before clearing the shared resources and restarting the queue manager. It just does not meet our business requirements to keep a queue manager down and work through the PMR process, before we restart the queue manager.


I agree and let me rephrase that slightly.

If IBM would provide a script to collect the diagnostics when serious conditions like this occur....

IBM L2 and L3 support know the type of info they need for after-the-fact diagnosis. I'm sure L2 and L3 constantly get asked to explain what went wrong but are provided no evidence, because in the heat of problem resolution, where getting back in business is the top priority, people don't stop and gather that evidence. Even if they had the wherewithal to consider gathering evidence before trying their best to get back up, they may not know all the obvious (to L2/L3) stuff to get. And even if they knew all the stuff to grab, it would take a bit of time if it needed to be done manually.

I think it would be in everyone's best interest if IBM L3 shipped a diagnostic script able to run even if your QM is completely dead, to quickly gather all the evidence they would typically want. It needs to be a simple, short command to run (on par with just running dspmqver and hitting enter), needs to be safe to run anytime, and needs to provide status as it does its thing. Seconds take FOREVER when there is a serious problem in your face - we can wait seconds for the script to do its job, but we need a status bar or % update so we know it's moving along and hasn't hung itself.

Or does something like this already exist and I'm not aware of it, or have forgotten about it?
_________________
Peter Potkay
Keep Calm and MQ On
Andyh (Master)
Posted: Mon Jun 09, 2014 12:46 am

I certainly did not mean to suggest/imply that the QMgr should be left down while any such PMR was being investigated; rather that ipcrm should not be used in the normal course of events, and that if and when a need to use ipcrm is found, documentation is collected and the problem properly investigated.
When the queue manager fails to restart because IPC resources related to the previous instance of the queue manager are still in use, the strmqm command will typically produce messages showing which running processes are attached to the QMgr-related IPC resources that are preventing the restart from completing.
If these are MQ internal processes (e.g. amqzlaa0, amqzmu*, ...), then collecting a few seconds of MQ trace (strmqtrc -t all -t detail -m <QMGR-NAME>) and then sending SIGUSR2 to the relevant processes should give MQ L2/L3 a good starting point to investigate the failure.
If these are non-MQ processes, then MQ would like those processes to acknowledge that the queue manager has ended, either by making an MQI call (which will receive MQRC_CONNECTION_BROKEN) or by ending (for example, if they can be safely killed). If there are processes which cannot meet these criteria, then it may be worth investigating whether those processes can connect to the queue manager using isolated bindings, which avoid the use of shared memory and suffer a small performance penalty as a consequence.
If the process or processes which are inhibiting MQ restart cannot be identified, then the following doc would be a good starting point, although I suspect MQ L2 might have a more definitive list:
1. ipcs -a
2. ps -efl
3. amqiclen -c -v
4. strmqtrc -t all -t detail -e ; strmqm <QMGR-NAME> ; endmqtrc -a

Once the doc has been collected, I appreciate it may be necessary to take 'unsupported' action to get the show back on the road, but hopefully, if the reasons this 'unsupported' action is required can be identified rather than simply bypassed, we can avoid the need for the continued (and dangerous) use of ipcrm to manually destroy mqm-owned IPC resources.
tczielke (Guardian)
Posted: Mon Jun 09, 2014 4:56 am

Hi Andyh,

Thanks for the information. I have noted it for future use if/when we encounter the issue where the queue manager will not start due to lingering IPC resources. In the past, what I have experienced in the Unix space is that the queue manager complains about specific PIDs that are no longer on the system. So the only course of action is to delete the IPC resources, if you want to restore service quickly and get the queue manager restarted.
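
Whether a reported PID still exists can be checked on Unix without sending a real signal, using signal 0. A small sketch (the stale PID below is made up; this only tests existence, it says nothing about what the process was):

```python
# Sketch: given PIDs reported as holding IPC resources, report which ones
# still exist. os.kill(pid, 0) sends no signal; it only checks for
# existence/permission (POSIX behaviour; not Windows).
import errno
import os

def pid_alive(pid):
    try:
        os.kill(pid, 0)
    except OSError as e:
        # EPERM means the process exists but belongs to another user
        return e.errno == errno.EPERM
    return True

reported = [os.getpid(), 999999999]   # our own PID plus a made-up stale one
for pid in reported:
    print(pid, "alive" if pid_alive(pid) else "gone")
```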

In my opinion, there are two courses of action the administrator could take when this situation is encountered. For both, it is assumed the queue manager is not currently running. #1 would be to have the server rebooted, which would remove the IPC resources. #2 would be to use ipcrm to remove the IPC resources. To me, it seems obvious that the server reboot in #1 should be considered a safe operational action by IBM. The ipcrm in #2 is a more tactical and less disruptive option. Since both #1 and #2 expose the queue manager to the same condition (the loss of IPC resources between successive queue manager runs) and #1 is safe, it logically follows to me that #2 is safe as well.

Sorry, I am not trying to be difficult here. I just find it disconcerting that the use of ipcrm is described as dangerous by IBM. I am trying to understand whether there are use cases (as given above) where IBM would not consider the use of ipcrm dangerous. Even though it is important to get the queue manager up as quickly as possible, it is also not appropriate for an administrator to perform an action that could potentially damage the queue manager. It seems to me that there needs to be an IBM-approved approach or command for the administrator to handle this exceptional condition where the queue manager will not start due to lingering IPC resources.
Vitor (Grand High Poobah)
Posted: Mon Jun 09, 2014 5:28 am

tczielke wrote:
Sorry, I am not trying to be difficult here. I just find it disconcerting about the language that the use of ipcrm is considered dangerous by IBM.


My 2 cents, slightly separate from this discussion:

The ipcrm command, and anything that works directly on the bowels of the Unix OS, is inherently dangerous because of the nature of what it's doing and the possibility of misuse, deliberate or accidental. A trained technician can work on the high-pressure fuel system of my car, confident in both his training and the fact he does this most days of the week. It doesn't stop it being dangerous, and won't stop him being badly injured (and my car wrecked) if he gets a bit clumsy. A less trained technician is more likely to come unstuck even if given clear instructions to follow. With the same instructions I can easily kill myself (and there's a little yellow sticker on the relevant bits warning me not to try).

So if you have a skilled, well-rested sys admin who never fat-fingers anything, ipcrm et al are perfectly safe. Because I've never met such a person, even when shaving in the morning, I consider such actions dangerous.

Bottom line - you're logged on as root. What could possibly go wrong??
_________________
Honesty is the best policy.
Insanity is the best defence.
Andyh (Master)
Posted: Mon Jun 09, 2014 5:36 am

1. A reboot would have the effect of removing all IPC resources, but it would also stop all active processes, so a reboot being safe is quite a different proposition from deleting IPC resources from under the feet of active processes currently accessing them. In a perfect world, deleting the IPC resources from under the feet of active processes would 'simply' result in those processes potentially receiving unexpected responses, for example MQRC_UNEXPECTED_ERROR rather than MQRC_CONNECTION_BROKEN; however, I'm unaware of any standard behaviour for processes still accessing deleted IPC resources (across all OS's), and it's certainly not an area which is explicitly tested within the lab. I don't believe there's any possible exposure to MQ message integrity; it's more a case of how any running program might react to the unexpected removal of the resources, for example whether that process would be able to reconnect to the new instance of the queue manager.

2. MQ does have a strategy for cleaning up IPC resources related to a previous instance of the same queue manager: the next strmqm command will check whether any active processes are still accessing those resources; if so, the restart will be aborted and, where possible, the processes still accessing the resources will be identified. If no active processes are still accessing the IPC resources, the strmqm command will clean up the IPC resources related to the old instance of the queue manager.
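
The restart strategy described above can be modelled in a few lines. This is a toy illustration of the decision, not MQ's actual implementation:

```python
# Toy model of the restart check described above: abort if any process is
# still attached to the old instance's IPC resources, otherwise clean up.
def restart_check(attached_pids, alive):
    """attached_pids: PIDs recorded against the old instance's resources.
    alive: predicate saying whether a PID still exists."""
    blockers = [p for p in attached_pids if alive(p)]
    if blockers:
        return ("abort", blockers)     # restart refused, blockers reported
    return ("cleanup", [])             # safe to remove the old resources

alive = {101, 202}.__contains__        # pretend PIDs 101 and 202 are running
print(restart_check([101, 999], alive))   # ('abort', [101])
print(restart_check([999], alive))        # ('cleanup', [])
```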

In the distant past, MQ wasn't very good at reliably identifying whether IPC resources were still referenced, and a number of customers got into the habit of deleting IPC resources when restarting queue managers. We're currently unaware of any situation where MQ will not clean up unreferenced queue-manager-related IPC resources as a result of a strmqm command, and would therefore prefer that customers stop using ipcrm as a matter of course.
Reliably identifying which processes are accessing the IPC resources is more of a challenge. The OS will reliably tell us whether any processes are still accessing a shared memory set, but not which processes are accessing which sets. MQ tracks this information internally, but the combination of a sufficiently abrupt termination (e.g. kill -9) and PID re-use presents a challenge to a user-mode program (MQ) reliably tracking this state. We do our best in these circumstances, and I certainly wouldn't expect to see us reporting non-existent PIDs; however, the list of PIDs we do report is on a best-can-do basis.
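
The PID re-use problem can be mitigated by recording a (pid, start-time) pair instead of a bare PID, so a recycled PID with a different start time is not mistaken for the original attacher. A toy sketch with fabricated data (this is not how MQ tracks it internally):

```python
# Toy sketch of PID re-use detection: track (pid, start_time) rather than a
# bare PID, so a recycled PID is not mistaken for the original attacher.
# All data here is fabricated.

registry = {}           # pid -> start_time recorded when the process attached

def record_attach(pid, start_time):
    registry[pid] = start_time

def still_same_process(pid, current_start_time):
    """True only if the PID was recorded with the same start time."""
    return registry.get(pid) == current_start_time

record_attach(4242, 1000.0)               # the original attacher
print(still_same_process(4242, 1000.0))   # True  -- same process
print(still_same_process(4242, 2500.0))   # False -- PID was recycled
print(still_same_process(9999, 1000.0))   # False -- never attached
```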
mvic (Jedi)
Posted: Tue Jun 10, 2014 8:47 am

tczielke wrote:
It seems to me that there needs to be an IBM approved approach or command for the administrator to use to handle this exceptional condition where the queue manager will not start due to lingering IPC resources.

I think you already have this.

From the technote: "The advice from IBM is, leave all of MQ's shared memory segments in place. Follow the instructions given in the error messages from strmqm. End the MQ applications [2] that are still attached to the shared memory, and then strmqm will run successfully."

And note 2 is:

[2] Ending MQ applications is generally the cleanest way of causing them to disconnect from MQ's shared memory, but perhaps you do not want to end the applications. If this is the case then another method exists, though this method is only possible if the application is coded sympathetically. In this alternative method, the application must be made to go into a new MQI call. When it goes into the new MQI call it will inspect the data in MQ's shared memory and will find that the queue manager is not running. Upon finding this, the application will report a connection-broken style of error, and will disconnect from the shared memory.
tczielke (Guardian)
Posted: Tue Jun 10, 2014 10:45 am

What the tech note does not address is how to handle the situation when strmqm will not start because it is identifying PIDs that no longer exist. In this case, there is no option to force an application to make another MQI call, or to stop the application to release the shared memory segments. Using an option like ipcrm is potentially dangerous, per the tech note. I guess you could do some analysis with ipcs to see whether any processes are still attached to the shared memory segments before running ipcrm, but that requires systems programming skills that are probably beyond your typical MQ administrator's skill set. Your only "safe" option may be to reboot the server, which is a disruptive option.
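
The ipcs analysis mentioned here can be semi-automated. A rough sketch that picks out mqm-owned segments with no attached processes from canned `ipcs -m` output (the column layout is a common Linux format and is an assumption; other platforms differ, and this only identifies candidates, it does not remove anything):

```python
# Sketch: pick out shared memory segments owned by "mqm" with zero attached
# processes (nattch == 0) from `ipcs -m` style output. The column layout is
# a common Linux format and is an assumption; other platforms differ.

def unattached_mqm_segments(ipcs_output):
    stale = []
    for line in ipcs_output.splitlines():
        parts = line.split()
        # expected columns: key shmid owner perms bytes nattch [status]
        if len(parts) >= 6 and parts[2] == "mqm" and parts[5].isdigit():
            if int(parts[5]) == 0:
                stale.append(parts[1])    # the shmid
    return stale

sample = """\
------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 65536      mqm        600        4096       2
0x00000000 98305      mqm        600        65536      0
0x00000000 131074     root       600        16384      0
"""
print(unattached_mqm_segments(sample))   # ['98305']
```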

It feels like there should be a strmqm option to tell MQ to remove the shared memory segments. The applications that may be referencing the shared memory segments would be accessing them through the MQ stub code, so it seems like all the pieces at play here are IBM code, and there could be some coordination between the queue manager and application processes that would make this feasible, at least to me.