MQSeries.net :: View topic - Mass queue reallocation z/OS


Mass queue reallocation z/OS


GheorgheDragos

Posted: Wed Jun 03, 2020 2:00 am Post subject: Mass queue reallocation z/OS

Acolyte

Joined: 28 Jun 2018
Posts: 51

Dear MQ community,

As always, I come to you with a technical uncertainty, and I apologize in advance if this has been posted before. As seen from my previous topics, I am in the middle of a z/OS MQ performance improvement project. The 1st step is done ( well, partially, but the green light has been given and I am just waiting for the date ): about 220 queues will disappear. The 2nd step is rearranging the queues according to the nature of the messages ( fast vs slow - I've generally decided on a 500 000 microseconds, or 0.5 seconds, mark ). The queues have been identified and sorted, and where to move them is more or less clear. The activity will be done in batch via CSQUTIL:
DEFINE temp queue LIKE original queue;
MOVE QLOCAL from original to temp queue
ALTER original GET/PUT(DISABLED) - to prevent any new messages coming in
DELETE original
re(DEFINE) original LIKE temp
MOVE QLOCAL from temp to original ( to move messages if any )
DELETE temp queue.
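
As a minimal sketch, the seven steps above could be generated per queue rather than typed by hand. The Python below only builds the MQSC text, in the same order as the list as posted. The function name, the `.TMP` suffix and the optional `stgclass` argument (which assumes the new storage class would be named on the re-DEFINE) are illustrative assumptions, not something stated in the post. Input-record length and continuation characters for CSQUTIL are not handled.

```python
def reallocation_commands(queue: str, stgclass: str = "", temp_suffix: str = ".TMP") -> list[str]:
    """Return the MQSC commands for one queue, in the order listed in the post."""
    temp = queue + temp_suffix  # hypothetical naming scheme for the temporary queue
    if len(temp) > 48:  # MQ queue names are limited to 48 characters
        raise ValueError(f"temporary queue name too long: {temp}")

    redefine = f"DEFINE QLOCAL({queue}) LIKE({temp})"
    if stgclass:
        # Assumption: the target storage class is given when the queue is re-created.
        redefine += f" STGCLASS({stgclass})"

    return [
        f"DEFINE QLOCAL({temp}) LIKE({queue})",
        f"MOVE QLOCAL({queue}) TOQLOCAL({temp})",
        f"ALTER QLOCAL({queue}) PUT(DISABLED) GET(DISABLED)",
        f"DELETE QLOCAL({queue})",
        redefine,
        f"MOVE QLOCAL({temp}) TOQLOCAL({queue})",
        f"DELETE QLOCAL({temp})",
    ]


if __name__ == "__main__":
    # PAY.IN and FAST are made-up names used only for illustration.
    for line in reallocation_commands("PAY.IN", stgclass="FAST"):
        print(line)
```

Generating one list per queue keeps each queue's commands together, which makes it easier to feed them to separate CSQUTIL invocations if that turns out to be needed.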

So far so good. The key problem is: what if these queues ( or some of them ) are in use by some channel? Even if I set GET/PUT(DISABLED), some thread might still be hanging, even if no messages are incoming. Then I do not know whether MOVE QLOCAL will work, or the DELETE of the original. As far as I know, CSQUTIL does not have a STOP ON ERROR clause for the previous command, and I really do not want to code each command paragraph with its own JCL step ( which would be pretty much pointless ). So, my idea was to display and compile in advance a list of channels that have a given list of queues open, stop them ( via CSQUTIL as well ), wait one minute for any thread to close, then submit the job with the steps above.
Dears, would there be any other way to force a local queue to immediately drop all open threads ( without losing messages, of course ), or is stopping the channel(s) in advance the best option?
This action will be done against 4 production queue managers and in total about 410 queues will be reallocated.
Thank you for your time.

Dragos

Vitor

Posted: Wed Jun 03, 2020 4:38 am Post subject: Re: Mass queue reallocation

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

GheorgheDragos wrote:

The key problem is: what if these queues ( or some of them ) are in use by some channel? Even if I set GET/PUT(DISABLED), some thread might still be hanging, even if no messages are incoming. Then I do not know whether MOVE QLOCAL will work, or the DELETE of the original. As far as I know, CSQUTIL does not have a STOP ON ERROR clause for the previous command, and I really do not want to code each command paragraph with its own JCL step ( which would be pretty much pointless ).

Never say a JCL step is pointless in front of me.

GheorgheDragos wrote:

So, my idea was to display and compile in advance a list of channels that have a given list of queues open, stop them ( via CSQUTIL as well ), wait one minute for any thread to close, then submit the job with the steps above.
Dears, would there be any other way to force a local queue to immediately drop all open threads ( without losing messages, of course ), or is stopping the channel(s) in advance the best option?

This all seems like a lot of fuss and bother.

I do agree that 4*410 cut and pasted JCL steps are at best redundant. This is an ideal use case for a JCL PROC.

If it was me (and of course it isn't), I'd create a PROC that:
- checks to see if the queue is in use (CSQUTIL & some REXX)
- if true, find and stop as above
- if false, or when the previous step is successful (JCL conditional logic), execute the move
- barf appropriately if something goes wrong
- tidy up
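
The outline above could be sketched as per-queue control flow. This is only an illustration in Python of the decision logic, not JCL or REXX. The three callables are hypothetical stand-ins for whatever actually queries queue status, stops a channel and performs the move. The tidy-up step is left out.

```python
from typing import Callable, Iterable


def process_queue(
    queue: str,
    channels_using: Callable[[str], Iterable[str]],
    stop_channel: Callable[[str], bool],
    move_queue: Callable[[str], bool],
) -> str:
    """Apply the outline to one queue and return a status word."""
    # Check whether the queue is in use; if so, stop every channel that has it open.
    for channel in channels_using(queue):
        if not stop_channel(channel):
            return "STOP_FAILED"  # report the failure and do not attempt the move
    # Reached when the queue was not in use, or when every stop succeeded.
    if not move_queue(queue):
        return "MOVE_FAILED"
    return "OK"
```

Because the function handles exactly one queue, a failure on one queue does not prevent the caller from carrying on with the next name in the list.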

I'd then execute that PROC using the queue names as parameters. I would also split the list up to get some parallelism. This is nothing to do with MQ and everything to do with JCL.

I think it makes sense to attempt to have everything stopped and the queues not in use for this exercise just because it's cleaner, easier and underlines to the masses that something big is happening & they should check their little corner when you're done.
_________________
Honesty is the best policy.
Insanity is the best defence.

bruce2359

Posted: Wed Jun 03, 2020 4:56 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9489
Location: US: west coast, almost. Otherwise, enroute.

Before you begin this performance improvement project...

What performance issues are you facing? Are you missing SLAs? For all apps? Just for some apps? What tooling are you using to quantify the performance issue?

What performance improvement do you expect to see at the end of the project? How will you measure the improvement?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

gbaddeley

Posted: Wed Jun 03, 2020 3:44 pm Post subject:

Jedi Knight

Joined: 25 Mar 2003
Posts: 2538
Location: Melbourne, Australia

Are you renaming the queues, or just moving them to a new storage class? It's going to be much easier if you do it when the queues are empty. All MQ queues should be empty most of the time.

_________________
Glenn

hughson

Posted: Wed Jun 03, 2020 5:16 pm Post subject: Re: Mass queue reallocation

Padawan

Joined: 09 May 2013
Posts: 1977
Location: Bay of Plenty, New Zealand

GheorgheDragos wrote:

As far as I know, CSQUTIL does not have a STOP ON ERROR clause on previous command

Please see Using the COMMAND function of CSQUTIL on z/OS

IBM Knowledge Center wrote:

FAILURE
Specifies what action to take if an IBM MQ command that is issued fails to execute successfully. Values are:

IGNORE
Ignore the failure; continue reading and issuing commands, and treat the COMMAND function as being successful. This is the default.
CONTINUE
Read and issue any remaining commands in the input data set, but treat the COMMAND function as being unsuccessful.
STOP
Do not read or issue any more commands, and treat the COMMAND function as being unsuccessful.
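
To make the difference between the three values concrete, here is a toy model in Python of the behaviour described in the quoted text. It is not CSQUTIL itself: `issue` is a hypothetical callable that returns True when a command succeeds, and the return value pairs the commands that were issued with whether the function as a whole counts as successful.

```python
def run_commands(commands, issue, failure="IGNORE"):
    """Model the IGNORE / CONTINUE / STOP settings described in the quote."""
    if failure not in ("IGNORE", "CONTINUE", "STOP"):
        raise ValueError(f"unknown FAILURE value: {failure}")

    issued = []
    any_failed = False
    for command in commands:
        issued.append(command)
        if not issue(command):
            any_failed = True
            if failure == "STOP":
                break  # do not read or issue any more commands

    # IGNORE always reports success; the other two report failure if anything failed.
    successful = failure == "IGNORE" or not any_failed
    return issued, successful
```

With three commands where the second one fails, IGNORE and CONTINUE both issue all three but disagree on the overall result, while STOP issues only the first two.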

Is that what you need?

Cheers,
Morag
_________________
Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software

GheorgheDragos

Posted: Wed Jun 03, 2020 10:43 pm Post subject:

Acolyte

Joined: 28 Jun 2018
Posts: 51

Good morning everybody ( Or hello where applicable ),

Thank you for the input. I will try to cover all replies. First, thank you for the JCL suggestion. I was hoping to avoid a more "complicated" solution ( complicated for me ) and anything involving REXX, because I do not know how to use it and would need the help of automation colleagues. I wouldn't worry about splitting the JCL into multiple procs to achieve parallelism: what we have agreed with the client is that the queues ( most of them ) are allowed to be unavailable for 1-10 minutes on a Sunday morning between 02:00 and 03:00. So, with my leaner CSQUTIL idea, even if the queues have messages in PRD, it won't take longer than a few seconds or minutes to move 1/200 000 messages, so parallelism is not needed. Good idea though. So, if I understand correctly, the 1st step of the given JCL should execute a REXX to check whether the queues are in use and stop the given channel, then perform the move. I am bothering the REXX people as I type this, because this is a good idea.

Quote:

Before you begin this performance improvement project...

What performance issues are you facing? Are you missing SLAs? For all apps? Just for some apps? What tooling are you using to quantify the performance issue?

- The buffer pool is paging, as seen from MP1B output. *Slow* queues are mixed with queues which I determined to be fast ( and the client confirmed ), below the 0.5 seconds mark. So, the scope of this is to have one adequately sized BP for fast messages, and a smaller BP for queues that are being used as buffer storage ( instead of DB2 ) - for example, queues that accumulate lots of messages throughout the day for batches to treat later on, or where the trigger is enabled and CICS is given a go. Not optimal; the client knows, but doesn't want to move to DB2. As far as I know, no SLAs are being missed for any apps, otherwise we would have known for sure. Why have a money transfer message spend 1.5 seconds in a queue when it can spend 0.5? If it bottlenecks someplace else, fine, at least MQ is working at peak performance. We do not have zHyperWrite at the moment.

Quote:

What performance improvement do you expect to see at the end of the project? How will you measure the improvement?

The optimal improvement I would like to see is that BP1 never pages, that messages are treated without being written to disk ( disregarding persistent messages ), and that we achieve a normalised MQ installation where each queue goes into its own zone according to its nature. I have also identified problems with the logger, where it goes into a WAIT FOR BUFFER situation multiple times ( even hundreds ) per 30 minute SMF interval per queue manager. At this moment WRTHRSH is set at 15, and I would like to maximise it ( 256 kb, if I remember correctly ) and see whether the problem reappears. Whether it does or not, I will redefine all logs as striped data sets. The problem, I think, is that during some time intervals there are tons of small messages being put and logged.
The tool that I will use to compare values is MP1B.

Quote:

Queues that should always be empty ?

Well, I agree. It doesn't change anything though.

Queues will just be moved and not renamed ( renamed only in the sense that a TEMP queue will be used with the DEFINE LIKE command ).

@Morag - how come I didn't see that? Please forgive me for this. Ideally it should be enough, but for it to work the way I want I would still need the 410/420 JCL procs: if one of the CSQUTIL commands for a queue fails, I want only the remaining commands for that single queue to stop, and the commands for all other queues to continue.
Nonetheless, the REXX idea seems the most sophisticated one so far. I am just not a very sophisticated MQ administrator ( no REXX skills, no good JCL skills, and just reusing and adapting the tons of tools already at my disposal ).
BTW - the tool that I am using is called Omegamon, an IBM product, with TEP ( Tivoli Enterprise Portal ), and I use it just to get snapshots; for long-term statistics, I use SMF + MP1B. There is an instance of MXG flying around the installation, but I simply cannot be bothered to learn how to use it when MP1B gives me the exact result.

Nonetheless, I am not 100% sure how to proceed, effort vs results: JCL PROC + REXX, or good old "brute force".

Have a nice day

Dragos

** edit

Dears, I just realised, it's quite simple. Before I run the CSQUTIL batch, I simply run a DIS QSTATUS on all the queues for a given QM, filter the output for channels, mass stop them, wait 1 min, run the job, mass start the channels, and problem fixed.
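
The "filter for channel, mass stop" part could look roughly like the Python below. It is a sketch under assumptions: it supposes the captured DISPLAY QSTATUS ... TYPE(HANDLE) output contains `CHANNEL(name)` tokens, with a blank value for handles that do not belong to a channel. The exact layout of real output should be checked before relying on it. Function names are illustrative.

```python
import re


def channels_from_qstatus(output: str) -> list[str]:
    """Collect distinct, non-blank CHANNEL(...) values, in order of first appearance."""
    names: list[str] = []
    for match in re.finditer(r"\bCHANNEL\(([^)]*)\)", output):
        name = match.group(1).strip()
        if name and name not in names:
            names.append(name)
    return names


def stop_commands(channels: list[str]) -> list[str]:
    """Build one STOP CHANNEL command per channel name."""
    return [f"STOP CHANNEL({name})" for name in channels]
```

Keeping the de-duplicated list around means the same names can be used for the matching START CHANNEL commands once the job has finished.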

bruce2359

Posted: Thu Jun 04, 2020 4:02 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9489
Location: US: west coast, almost. Otherwise, enroute.

GheorgheDragos wrote:

I am just not a very sophisticated MQ administrator ( no REXX skills, no good JCL skills, and just reusing and adapting the tons of tools already at my disposal ).

My humble opinion: Your lack of prerequisite skills and understanding of the z/OS-MQ environment should be a signal to management to end this project before it begins - I believe it is doomed to fail.

GheorgheDragos

Posted: Thu Jun 04, 2020 6:08 am Post subject:

Acolyte

Joined: 28 Jun 2018
Posts: 51

Hello,

Wow, sweet Jesus, I haven't been praised for what I think I should have been praised for, but mother of God, this is a slap in the face.
Nonetheless, I have support from high management and I am confident it will end in success. Maybe if you tell me where I am wrong in starting this project, I can tune my approach?

Have a nice day

Dragos

bruce2359

Posted: Thu Jun 04, 2020 7:20 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9489
Location: US: west coast, almost. Otherwise, enroute.

Not intended as a slap; rather, a reality check. You self-identify as someone with limited/no z/OS-MQ experience, but hoping to make mass changes to a working environment.

You've not specified whether you intend to do this in a TEST environment first as proof-of-concept. You've not specified that you have a fall-back plan in place should this project fail.

At the very least, I'd expect that your action plan would first define the new page sets, storage classes, and so on; then, migrate one or two queues; then measure results.

I do praise your interest in doing something to improve throughput, but your narrative did not identify current throughput bottlenecks. A message in a queue waiting to be processed for 1.5 seconds (as you describe) likely has little/nothing to do with MQ or z/OS; rather, more to do with whatever the transaction is doing or not doing.

You cite paging rate. Paging is not a problem in z/OS; rather, it merely describes a situation where a virtual address does not immediately translate to a real address. Paging does NOT mean MVS internals must go to some paging disk device to resolve the issue. MVS (the base o/s in z/OS) can page in the tens of thousands of pages per second with little/no effect on throughput.

You mentioned MQ page sets/buffer pools vs. DB2. Not sure what your point was, since MQ does not store messages in any database, DB2 or other.

Sorry to come off as hyper-critical, but if I were your manager, I'd expect far more specific deliverables than a general "improved throughput." I'd ask "which transactions? What specific throughput improvement? What detrimental effects on other apps?" and so on.

GheorgheDragos

Posted: Thu Jun 04, 2020 7:54 am Post subject:

Acolyte

Joined: 28 Jun 2018
Posts: 51

Hello Bruce, and thank you for the mature response. Please allow me to elaborate ( I do understand that from time to time my chain of thought is not as clear and concise as it should be ).

The effort consists of me analyzing statistics with MP1B and interpreting the results based on the document provided by IBM and my 5 years of experience. The problem is the following: we have 3 or 4 buffer pools with several tens of pagesets associated with them, each pageset hosting tens of queues used for different purposes.

One queue ( or set of queues ) is used for payments, so everything should be instant; another set of queues is used to accumulate messages throughout the day that will be treated overnight by whatever process ( batch, CICS, JMS etc ).

The main purpose of this is to identify the nature of the queues by looking at the QTIME parameter ( AVG and LAST ) in statistics gathered over a period of a week, to see what kind of workload those queues carry.

Please remember that I have inherited the system from a group of very old, very wise men, who knew more or less by heart where each queue should go ( which BP/PS ). We defined everything "free for all", a mistake which I accept and am trying to remedy.

Then, I created a rule of thumb, the 500 000 microseconds ( or 0.5 seconds ) mark: if a message on average spends less than 0.5 seconds on a queue, it is considered fast; if not, long lived. In some instances the client pointed out that a message which spends 1.5 seconds is still critical and should be considered fast. Therefore, if I can decrease the waiting time for this message, as in the example, from 1.5 seconds to 1 second, isn't it worth it, even if the "bottleneck" is on the web, or CICS, or whatever? Based on this, I have submitted the proposal to the client, to confirm ( or correct ) whether their queues are used for fast, short lived messages, or are "buffer" queues ( that is what I meant when I said that, instead of using DB2, they are using MQ ), or problem queues, or application log queues etc, where response time is not critical.
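
The rule of thumb described above is simple enough to express directly. The Python below is a minimal sketch: it assumes the average time on queue per queue has already been extracted into a dictionary of microseconds, and it models the client's "still critical" exceptions as an explicit override set. All names and figures in the usage example are invented.

```python
FAST_THRESHOLD_US = 500_000  # 0.5 seconds, the mark given in the post


def classify(avg_qtime_us: dict[str, int], critical: frozenset[str] = frozenset()) -> dict[str, str]:
    """Label each queue 'fast' or 'slow' from its average time on queue in microseconds."""
    return {
        queue: "fast" if qtime < FAST_THRESHOLD_US or queue in critical else "slow"
        for queue, qtime in avg_qtime_us.items()
    }


if __name__ == "__main__":
    # Hypothetical figures: CARD.AUTH averages 1.5 s but the client flags it as critical.
    stats = {"PAY.IN": 120_000, "BATCH.ACCUM": 9_000_000, "CARD.AUTH": 1_500_000}
    print(classify(stats, critical=frozenset({"CARD.AUTH"})))
```

Keeping the overrides separate from the threshold makes it visible which queues were placed by measurement and which by client decision.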

I have about 90% of the answers needed, from all the tens of dev teams on the client side. Some said yes, these queues are to be treated as fast queues; some said yes, they can be moved. And when I say moved, I mean this, as described in my original post: I take all the queues that trigger processes and all the queues used in online banking, where minimizing response time matters and any possible improvement should be made ( otherwise why not? ), and move them to pagesets associated with buffer pool 1.

That means that these queues will be isolated from queues which are filled up by batch jobs etc. When that happens, the buffer pool gets 85% full and starts to write the oldest messages to the pageset ( to disk - and please trust me, I know the DS whatever we use has 512 GB of buffer as well ). BUT these are long lived, non critical messages, mixed with messages which will never be written to a pageset ( persistent messages written to the log are a different story ). Am I right? So, why not separate the workload?

When I mentioned I am not a sophisticated MQ administrator, I meant that I am not ( yet ) at the level of the people who gave us the environment, for whom I had and have great respect. That doesn't mean I don't have a firm grasp of at least MVS fundamentals ( which I should, after 7 years of operations, 3 of them hard ) and 5 years of CICS MQ administration ( most of the time unattended; 90% of what I know, I know from self study and dealing with issues ).

You also ask why I don't define new pagesets. Well, I already thought of this ( because our pagesets are not defined in an SMS DATACLASS which would allow them to expand when needed, and we absolutely must do this ), BUT I do not want to mix two activities in one. One by one. Plus, because our pagesets are not automatically expanded, I also made a detailed list of the space of each pageset in each BP and its utilisation over time, and compared it with the historical MAX depth of the queues, so that I won't overburden any pageset. This is not hard, but it's time and energy consuming.

You also ask whether I have done this in lower environments as a proof of concept ( forgive me, I do not remember exactly now ). Our lower environments are at 95% less capacity than PRD. It simply makes no sense for me to even attempt to gather this gigantic amount of information there ( hundreds of mails sent, tens of hours of Excel-ing, deciding what should go where etc ), when I know for a fact that, at least in theory, if I separate and isolate the queues from one another, the buffer pool should never get 85% full ( which I will fine tune by analyzing statistics over a period of a few weeks once this is done ), and messages should only be written to the log ( which I will replace with a striped VSAM after the reallocation is done - btw, we don't have an SMS construct to allow VSAM striping; I had to fight with the storage team to get this done ).

To the detriment of what other applications will this effort be? None. Just because I will resize buffer pools 2, 3 and 4 and allow them to page, it doesn't mean they will be slow, just not as fast as if the messages stayed only in the buffer. You also mention bottlenecks. Well Bruce, I am able to fine tune MQ. I don't want my queue managers to look like a classroom the day before the great vacation. I want order and cleanness. If the consuming application is bottlenecked, then I will have to work with the application people to ensure parallelism. Sometimes this is not possible.

I have great respect for the members of this community, and I appreciate all feedback, as long as it is followed by a reason. I take great pride in my work and my hard earned skills. Not that it matters, since anyone with a factual memory could do it, but I have recently achieved the v9.1 Administration certification. Not that I believe in certifications; I did it rather as a feat of strength, to evaluate my skills, and to have a learning path.

Have a nice day.

Dragos

***

Edit - I forgot to add. Without pounding my chest: this effort is easy, technically speaking. Even an operator should know how to redefine a queue. Remember, if queue A on PS1/BP1 is known to host on average 50 000 messages, it will go to an equally sized pageset, with statistically analyzed space utilisation, on a different buffer pool.

bruce2359

Posted: Thu Jun 04, 2020 10:01 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9489
Location: US: west coast, almost. Otherwise, enroute.

Again, I intended no offense. It seems I mistakenly presumed you/your project might benefit from a critique. I wish you well on your project.

bruce2359

Posted: Fri Jun 05, 2020 5:04 am Post subject:

Poobah

Joined: 05 Jan 2008
Posts: 9489
Location: US: west coast, almost. Otherwise, enroute.

Moved to mainframe forum.
