MQ for z/OS QMGR stalls during startup after IPL
TXM751
Posted: Wed Mar 04, 2020 11:37 am    Post subject: MQ for z/OS QMGR stalls during startup after IPL

Novice

Joined: 04 Mar 2020
Posts: 14

We recently implemented MQSeries queue sharing (obviously on z/OS).

Two weeks after we implemented the first queue sharing group (which had 2 members, one on each of two LPARs), one of the members failed to issue its initialization complete message following an IPL.

A few weeks later, we created another queue sharing group (on the same 2 LPARs). This queue sharing group is in the same DB2 data sharing group as the QMGRs mentioned above.

A few weeks after that, the original stalling QMGR and one of the newly added QMGRs both stalled. Both QMGRs are on the same LPAR but are members of different queue sharing groups.

We IPL these development LPARs every weekend. The problem occurs every 2-4 weeks, after the IPL, and ONLY affects these 2 QMGRs. We have rolled out queue sharing to other queue managers on other LPARs, even into production. No other QMGRs are experiencing this problem.

I raised a PMR with IBM, but the response was that there is no error message so there is no problem (I disagree).

My initial thought was that perhaps DB2 hadn't initialized yet .. but that's not the case. I also thought that maybe RRS was the issue, but it was up.

I cannot communicate with these QMGRs when they are in this state .. the only thing that I can do is cancel the MSTR address space and restart them. The QMGRs always start fine on the second attempt.

I don't see any WTORs or error messages in the SYSLOG or in the MSTR address space logs ...

I'm baffled.

Has anyone ever seen this before?

The QMGRs are MQ V9.0 VUE LTSR and are running at the RSU1903 maintenance level.

hughson
Posted: Wed Mar 04, 2020 12:29 pm    Post subject: Re: MQ for z/OS QMGR stalls during startup after IPL

Padawan

Joined: 09 May 2013
Posts: 1914
Location: Bay of Plenty, New Zealand

TXM751 wrote:
I raised a PMR with IBM, but the response was that there is no error message so there is no problem (I disagree).

I cannot communicate with these QMGRs when they are in this state .. the only thing that I can do is cancel the MSTR address space and restart them. The QMGRs always start fine on the second attempt.


I am appalled to hear that IBM Service consider there to be no problem with a queue manager you cannot communicate with. Do not leave it there. Please ensure that they understand there is a problem and deal with it. I could imagine needing to capture an SVC dump of the MSTR address space to see what it is stuck waiting on.

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software

GheorgheDragos
Posted: Thu Mar 05, 2020 2:36 am

Acolyte

Joined: 28 Jun 2018
Posts: 51

Hello,

Since there is no other information to go on, may I ask: what are the last lines at the end of the active output? Where does it stop? Have you tried passing commands via the panels, SDSF, CSQUTIL or the MVS console? Are there any messages in the CHIN? What about messages in the adjacent queue manager on the remote LPAR? Do you have any monitoring tools (Omegamon?), and if so, do you see both queue managers in the queue sharing group panel? I haven't seen this before. I would try to prod the QMGR into continuing to load by issuing a command through any available interface.

Dragos Gheorghe

TXM751
Posted: Thu Mar 05, 2020 3:46 am

Novice

Joined: 04 Mar 2020
Posts: 14

Hi Morag ... The text from the IBM employee in the PMR was:

"Normal problem diagnosis means that a customer HAS an error message, and does not know why they are getting it.

So, it is easy in that case to set a SLIP trap, and when that error pops, a Dump is taken for later diagnosis.

But if you are NOT getting this error, then there is nothing to slip on, and no Dump is produced.

How can IBM diagnose nothing?"


I can understand their frustration ... I also wish to point out that I always receive EXCELLENT support from IBM, and suspect that I will get a satisfactory resolution in this case ...

I only raised this 'Question' on this forum to see if anyone else has encountered this problem before

Hopefully between my contacts within IBM, the IBM Service Link and this forum, I will find an answer to this vexing problem.

I will be taking a series of dumps the next time that this happens (probably this weekend) and will send the dumps and logs to IBM for analysis.

It has been suggested that this might be a timing issue and that DB2 might not have been up when MQ was coming up .. I will be reviewing the logs to check out this theory.
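
The timing theory can be checked mechanically once the logs are in hand. What follows is only a minimal sketch in Python, assuming the DB2 and MQ messages have been merged into one time-ordered list. The message IDs `DB2READY` and `MQSTART` and the sample lines are hypothetical placeholders, not real message IDs; substitute whatever your own logs show for "DB2 is up" and "MQ began starting".

```python
def first_index(lines, msg_id):
    """Return the position of the first line holding msg_id as a whole token, or None."""
    for i, line in enumerate(lines):
        if msg_id in line.split():
            return i
    return None


def ready_before(lines, dependency_id, dependent_id):
    """True if dependency_id appears before dependent_id in a time-ordered log.

    Returns None when either message is missing, since nothing can be concluded.
    """
    dep = first_index(lines, dependency_id)
    user = first_index(lines, dependent_id)
    if dep is None or user is None:
        return None
    return dep < user


# Hypothetical merged log: placeholder IDs, not real DB2 or MQ messages.
MERGED = [
    "22.10.01 DB2READY (placeholder: DB2 startup complete)",
    "22.12.30 MQSTART (placeholder: queue manager start issued)",
]

if __name__ == "__main__":
    print(ready_before(MERGED, "DB2READY", "MQSTART"))
```

A result of `None` means one of the two messages never appeared in the log, which is worth knowing in its own right.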

If you have any questions or suggestions, I would love to hear from you.

Thank you .. Tom Malone

TXM751
Posted: Thu Mar 05, 2020 4:36 am

Novice

Joined: 04 Mar 2020
Posts: 14

To Dragos,

You have some excellent questions and suggestions. Thank you.

I will attempt to answer those questions that I can, and will use some of your other questions for investigation the next time that the problem occurs.

The QMGR appears to start perfectly fine. All of the expected messages appear .. there are no error messages. We have commands that get issued to the QMGR during the start up phase (Display USAGE PSID(*) for example) and they work. I see the QMGR connecting to DB2, I see messages about the CF structures, I see messages about the Recovery logs etc.

I am only missing the message that says that the QMGR is initialized: CSQY022I +MQA7 QUEUE MANAGER INITIALIZATION COMPLETE

As for the last messages that we see .. Here they are (note the 50 minute gap - that's when we were stalled and then attempted to shut the MSTR address space down)

22.16.56 S0167420 CSQE005I +MQA7 CSQECONN Structure SYSTEM connected as 260
260 CSQEQ1A@MQA702, version=D6F7845B4811E704 00010023
22.26.52 S0167420 CSQY220I +MQA7 CSQSCTL Queue manager storage usage: 177
177 local storage: used 904MB, free 467MB: above bar: used 375MB, free 1GB
23.15.07 S0167420 CSQM131I +MQA7 CSQMCCLU CHANNEL INITIATOR NOT ACTIVE, CLUSTER AND 620
620 CHANNEL COMMANDS INHIBITED

We ended up having to Cancel (S222) the MQA7MSTR address space
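
Silent periods like the one above can be picked out of a log extract automatically. The following is a minimal sketch in Python, not an IBM tool, assuming each message starts with an `HH.MM.SS` timestamp and that continuation lines (such as the ones beginning `260`, `177` and `620`) do not. The function names and the 15 minute threshold are illustrative choices.

```python
from datetime import datetime, timedelta

# Lines copied from the post above.
SAMPLE = [
    "22.16.56 S0167420 CSQE005I +MQA7 CSQECONN Structure SYSTEM connected as 260",
    "260 CSQEQ1A@MQA702, version=D6F7845B4811E704 00010023",
    "22.26.52 S0167420 CSQY220I +MQA7 CSQSCTL Queue manager storage usage: 177",
    "177 local storage: used 904MB, free 467MB: above bar: used 375MB, free 1GB",
    "23.15.07 S0167420 CSQM131I +MQA7 CSQMCCLU CHANNEL INITIATOR NOT ACTIVE, CLUSTER AND 620",
    "620 CHANNEL COMMANDS INHIBITED",
]


def parse_time(line):
    """Return the leading HH.MM.SS timestamp, or None for continuation lines."""
    token = line.split(maxsplit=1)[0] if line.strip() else ""
    try:
        return datetime.strptime(token, "%H.%M.%S")
    except ValueError:
        return None


def find_gaps(lines, threshold=timedelta(minutes=5)):
    """Yield (previous_line, next_line, gap) where consecutive messages are further apart than threshold."""
    previous = None
    for line in lines:
        stamp = parse_time(line)
        if stamp is None:
            continue  # continuation line, no timestamp of its own
        if previous is not None:
            gap = stamp - previous[0]
            if gap < timedelta(0):
                gap += timedelta(days=1)  # the log crossed midnight
            if gap > threshold:
                yield previous[1], line, gap
        previous = (stamp, line)


if __name__ == "__main__":
    for before, after, gap in find_gaps(SAMPLE, threshold=timedelta(minutes=15)):
        print(f"{gap} of silence after: {before}")
```

With the 15 minute threshold, only the silence between the CSQY220I and CSQM131I messages is reported.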



We use automation to trigger off the initialization message and start the CHIN address space .. because this message does not get issued, the CHIN does not start (and obviously there are no messages in the CHIN that I can tell you about)
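
Because the automation only reacts when the message appears, a stall produces no event at all. A watchdog that treats the absence of CSQY022I after a time limit as a condition in its own right would cover that. This is a minimal sketch in Python under stated assumptions: `startup_state` and the 900 second limit are hypothetical, and how the log lines and elapsed time are obtained is left to whatever automation product is in use.

```python
INIT_COMPLETE = "CSQY022I"  # message ID quoted in the post above


def startup_state(log_lines, seconds_since_start, limit_seconds=900):
    """Classify a queue manager start as 'complete', 'starting' or 'stalled'.

    limit_seconds is an assumed tolerance, not an IBM-documented value.
    """
    if any(INIT_COMPLETE in line.split() for line in log_lines):
        return "complete"
    if seconds_since_start > limit_seconds:
        return "stalled"  # no completion message within the allowed time
    return "starting"


if __name__ == "__main__":
    hung = ["22.26.52 S0167420 CSQY220I +MQA7 CSQSCTL Queue manager storage usage: 177"]
    print(startup_state(hung, seconds_since_start=3000))
```

A "stalled" result could be used to raise an alert or trigger dump capture while the address space is still hung, rather than waiting for someone to notice the missing CHIN.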

When I attempted to issue /+cpf commands via SDSF, they had no effect (I will confirm again the next time it happens). I also recall that I could not use the commands from the panels (but will confirm next time).

I did not think to check the status from the other member of the queue sharing group on the other LPAR .. I will keep that in mind for next time (good suggestion)

We don't have a tool (Omegamon) to monitor MQ (but thank you for the suggestion)



You have given me some further suggestions for Problem determination .. I will use them next time I encounter this problem .. Thank you

Tom Malone

bruce2359
Posted: Thu Mar 05, 2020 6:05 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

TXM751 wrote:
As for the last messages that we see ..
23.15.07 S0167420 CSQM131I +MQA7 CSQMCCLU CHANNEL INITIATOR NOT ACTIVE, CLUSTER AND 620
620 CHANNEL COMMANDS INHIBITED

Did you look at the CHIN syslog to see why it failed to come to life?

There is one more dump, a stand-alone dump (SAD), that IBM may request. If IBM requests a stand-alone dump, please follow the instructions carefully.
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

TXM751
Posted: Thu Mar 05, 2020 6:24 am

Novice

Joined: 04 Mar 2020
Posts: 14

Reply to Poobah ... The start command for the CHIN never gets issued so there is no log to look at

The reason it doesn't start is that it is controlled by automation. The start is triggered by the MSTR address space Initialization complete message

My automation folks told me that I could force the CHIN to start .. Alas when I attempted to do this, it still would not start. I suppose I could have attempted to Start the CHIN manually, but that usually just messes up automation and leads to someone in the automation group coming to my desk and having a conversation with me ..

I have been given the dump instructions .. I have done this hundreds of times, so I'm pretty good at following IBM's instructions .. But Thank you

I already sent IBM a set of dumps which I took before I even opened the Incident with IBM .. they reviewed them, but didn't find a smoking gun.

I'm not sure that they will find anything ... Like I said, there are no abends, no error messages .. Nothing to indicate a problem .. The MSTR address space starts ..., Looks perfectly fine .. Almost makes it to the finish line (Initialization complete) .. and then just stops ... No messages in the MSTR log .. No messages in the SYSLOG .. NOTHING .. Very weird

That's partly the reason I am reaching out to the community here .. I have never heard of anything like this before ... (I have been supporting MQ for over 20 years on multiple platforms for multiple companies and organizations)

I appreciate all of the questions and comments .. You never know when someone might ask a question or make a suggestion that leads to a solution

Thanks .. Tom Malone

bruce2359
Posted: Thu Mar 05, 2020 6:40 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

If your Internal automation doesn’t start the CHIN automatically, then, as a test:

- start the MSTR manually, then
- start the CHIN address space manually. Post here the command you issued to start the CHIN, the response from your command, and the CHIN syslog.

The command should look something like this: /MQ00 START CHINIT

Usually, the START CHINIT command is embedded in the MSTR startup proc. I'm not sure why your organization uses other automation to do so.

Last edited by bruce2359 on Thu Mar 05, 2020 8:00 am; edited 1 time in total

bruce2359
Posted: Thu Mar 05, 2020 7:43 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

Moved to mainframe forum.

TXM751
Posted: Thu Mar 05, 2020 7:53 am

Novice

Joined: 04 Mar 2020
Posts: 14

I thought I mentioned .. I attempted to start the CHIN via automation ... It does not start

Sure I could have attempted to start it outside automation .. but the QMGR was ignoring my attempts to talk to it .. so I doubt very much it would have accepted my Start Chinit command.

Any attempts on my part to issue /+MQA7 ... commands were ignored by the QMGR

Tom Malone

bruce2359
Posted: Thu Mar 05, 2020 9:39 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

TXM751 wrote:
I thought I mentioned .. I attempted to start the CHIN via automation ... It does not start

Does SYSLOG show the command automation used to start the CHIN? Post that command here.

TXM751 wrote:
Sure I could have attempted to start it outside automation .. but the QMGR was ignoring my attempts to talk to it .. so I doubt very much it would have accepted my Start Chinit command.

That's why I suggested starting the MSTR manually first, then starting the CHIN manually next. This is how a new qmgr is implemented - manually first, then apply automation, if necessary.

TXM751 wrote:
Any attempts on my part to issue /+MQA7 ... commands were ignored by the QMGR

Do you mean that your command does not appear on SYSLOG? Or something else? Is RACF (or equivalent) preventing you from issuing operator commands? Can you successfully issue commands to MQA6, too?

TXM751
Posted: Thu Mar 05, 2020 9:43 am

Novice

Joined: 04 Mar 2020
Posts: 14

I have no problems issuing MQ commands under normal circumstances

What I am saying is that when I enter them in this particular case, they are ignored .. NOTHING happens ..

The QMGR ignores me .. it even ignores commands to shut down .. I have to cancel the address space

Tom Malone

TXM751
Posted: Thu Mar 05, 2020 9:47 am

Novice

Joined: 04 Mar 2020
Posts: 14

This is not a new QMGR .. it has existed for 20 years ... It only started having problems since I implemented Queue Sharing.

I was hoping that someone had encountered this problem before and had some advice

No one has stepped up to report the same issue yet ...

TXM751
Posted: Thu Mar 05, 2020 9:51 am

Novice

Joined: 04 Mar 2020
Posts: 14

I also said that Automation does not attempt to start the CHIN .. so there won't be any message in SYSLOG showing the start command

Pretty simple

Automation sees the MSTR address space Initialization complete message and issues the START CHIN command (it's been doing this every weekend for a decade now) ..

In this case, the MSTR address space does not get the Initialization complete message .. It JUST HANGS ... hence automation doesn't issue the START Chin command

And if I attempt to start the CHIN manually, I guarantee it won't start because the QMGR is ignoring every attempt I make to communicate with it.

And why do you care about the CHIN .. The MSTR address space is the problem

bruce2359
Posted: Thu Mar 05, 2020 10:37 am

Poobah

Joined: 05 Jan 2008
Posts: 9394
Location: US: west coast, almost. Otherwise, enroute.

TXM751 wrote:
I have no problems issuing MQ commands under normal circumstances

What I am saying is that when I enter them in this particular case, they are ignored .. NOTHING happens ..

"NOTHING happens" is not helpful in doing problem source identification (PSID).

Unless suppressed, commands issued to MQ (subsystem interface) will show up on SYSLOG. Does SYSLOG (operators console) show any MQ commands at all ever, for any qmgr?

How do you issue MQ commands? Please be precise. Do you issue commands from the op console in computer room? From CSQOREXX panels? From SDSF? From ISPF option 6 Command Shell panel?

QSGs are an advanced z/OS MQ skill set. Is this your organization's first attempt to implement QSGs? Is this your first attempt to implement QSGs? Is the DB2 data sharing software new to you or your organization?

What documentation or cookbook or check-list are you following?

Please be patient. I/we are trying to help. A hung qmgr is waiting for something. MQ is usually pretty chatty about what it wants/needs.

Precisely what kind of dumps have you taken? I asked if you had taken a SAD dump. You didn't answer specifically whether you had taken one or not. Again, please be precise in your answers to questions. A SAD dump will kill off a z/OS instance - all other workload would have been killed in-flight. Did you pass the SAD dump through IPCS?