IBM MQ8.0 z/OS performance
GheorgheDragos
Posted: Sun Jul 22, 2018 11:29 pm Post subject: IBM MQ8.0 z/OS performance

Apprentice

Joined: 28 Jun 2018
Posts: 31

Hello,

My name is Dragos Gheorghe, I am 30 years old and have been working as a z/OS CICS admin for the last 3.5 years, after spending around 7 as a z/OS operator in 2 countries for 3 different corporations. I pride myself that my overall MF knowledge is pretty extensive.
In my current position things are going well, but I am afraid I have reached a point where I am more or less stagnating. There are many things to learn and improve; however, I cannot decide on a direction.
So, through a colleague, I have found a statistics gathering tool for MQ: MP1B (I can't post links though). It is a pretty extensive one. Our current one, based on SAS (MXG, anyone?), is a little tricky for me, and I haven't really put that much effort into it, because I prefer my own solutions to building and improving on antique (though good) designs.
I have gathered statistics for the last 2 weeks from SMF, both 115 and 116.
I would like to know, if possible, what I should use to create an Excel chart showing the daily usage (PUTs and GETs). This is to see when MQ is peaking and what I can do to improve; for example, if a buffer pool gets full and messages are being written to disk, I can increase the number of buffers or allocate them above the bar.
And this raises another question: what would you recommend? To have below-the-bar (4 KB) buffers for long-lived messages, and above-the-bar ones for small messages? Right now we only have buffers below the bar. But that has another impact, because I understood that if we have a large buffer pool and the utilization is around 70%, the performance actually decreases.

In any case, I am considering indexing queues based on the reports from the tool, moving queues with persistent messages to large (or unused) buffer pools, and moving non-persistent queues to PSIDs with smaller and larger used BPs. Or is indexing an option to be discussed with the application development team as well, so that they may add MsgId or CorrelId to their messages?

Have you ever done this with your installation ? Would you have any recommendations for performance tuning?
Any input would be greatly appreciated.

Dragos Gheorghe

gbaddeley
Posted: Mon Jul 23, 2018 4:18 pm

Jedi

Joined: 25 Mar 2003
Posts: 2077
Location: Melbourne, Australia

You have some reasonable ideas for improving MQ performance on z/OS. You should really only look at changing anything if you can identify performance or efficiency bottlenecks in MQ, as MQ performs very well without much tweaking.

Indexing queues can improve performance if queue depth becomes significant (eg. > 100 - 1000 msgs) and msgs are fetched by correlid or msgid. However, high queue depths can indicate other issues.

It's a good idea not to have high volume app queues on the same pageset as system queues. If the pageset fills, the qmgr may become unresponsive.

Issues quite often only manifest under high load (eg. > 50 - 200 msgs/sec), and are usually caused by other apps (eg. CICS DBS TCP) or bad MQ messaging designs.

Have you looked at the MQ z/OS performance reports?
http://www-01.ibm.com/support/docview.wss?uid=swg27007150
_________________
Glenn

GheorgheDragos
Posted: Mon Jul 23, 2018 10:41 pm

Apprentice

Joined: 28 Jun 2018
Posts: 31

We have plenty of buffer pools with enough buffers (4 KB). Buffer pool 0 gets full and starts writing messages to the page set simply because it has qmgr-to-qmgr channel checks (on mainframe and distributed) and the messages are persistent.
[i]
BP 1 Some ( pages read from disk. Buffer pool may be too small
BP 1 Many (588) pages read from disk. This is typical of long lived messages. Buffer pool may be too small[/i]

Our DEVPlex qmgrs rarely go over 200-300 messages per second.
The problem might be that our app dev teams always request persistent queues, which I think is not because the messages are "critical", especially in TST, GTU etc., but because they don't want to modify their apps.

MSG throughput (GET/PUT)

7/11/2018 1:52:40 259/sec 7/sec
7/11/2018 2:23:00 259/sec 7/sec
7/11/2018 2:52:17 257/sec 7/sec
7/11/2018 3:21:58 257/sec 10/sec
7/11/2018 3:51:50 257/sec 7/sec
7/11/2018 4:21:45 257/sec 7/sec
7/11/2018 4:51:46 257/sec 7/sec
7/11/2018 5:21:34 256/sec 7/sec
7/11/2018 5:51:38 256/sec 7/sec
7/11/2018 6:21:37 255/sec 7/sec
7/11/2018 6:51:19 254/sec 7/sec
7/14/2018 7:34:44 251/sec 7/sec
7/14/2018 8:04:45 251/sec 6/sec
7/14/2018 8:34:46 251/sec 6/sec
7/14/2018 9:04:04 250/sec 6/sec
7/14/2018 9:34:00 250/sec 6/sec
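
For the Excel chart asked about earlier in the thread, one possible approach is to reduce interval samples like the ones above to one row per day before charting. The following is a minimal Python sketch, assuming every line has the layout "date time GET-rate PUT-rate" shown above. The function names and the output column names are hypothetical and are not part of MP1B.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# A few of the samples posted above (date, time, GET rate, PUT rate).
SAMPLE = """\
7/11/2018 1:52:40 259/sec 7/sec
7/11/2018 3:21:58 257/sec 10/sec
7/14/2018 7:34:44 251/sec 7/sec
7/14/2018 8:04:45 251/sec 6/sec
"""


def parse_samples(text):
    """Yield (timestamp, get_rate, put_rate) for each well-formed line."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) != 4:
            continue  # skip blank or malformed lines
        stamp = datetime.strptime(parts[0] + " " + parts[1], "%m/%d/%Y %H:%M:%S")
        get_rate = int(parts[2].split("/")[0])
        put_rate = int(parts[3].split("/")[0])
        yield stamp, get_rate, put_rate


def daily_peaks(samples):
    """Return {date: (peak_get, peak_put)} over all samples."""
    peaks = defaultdict(lambda: (0, 0))
    for stamp, get_rate, put_rate in samples:
        best_get, best_put = peaks[stamp.date()]
        peaks[stamp.date()] = (max(best_get, get_rate), max(best_put, put_rate))
    return dict(peaks)


def to_csv(peaks):
    """Render the per-day peaks as CSV text that a spreadsheet can open."""
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["date", "peak_get_per_sec", "peak_put_per_sec"])
    for day in sorted(peaks):
        writer.writerow([day.isoformat(), *peaks[day]])
    return out.getvalue()


if __name__ == "__main__":
    print(to_csv(daily_peaks(parse_samples(SAMPLE))))
```

Saving the printed text as a .csv file gives one row per day, which can then be charted directly.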

And that is just because this particular QMGR (with another one for DR purposes) acts as a comms node (QR, SVRCONNs etc.)

Another repeating alert in the messages extracted with the MP1B tool is :
[i]
QEST structure xxxxxxxx long average response time 423.[/i]

However, I do not think this is an issue because the average response time for a CF defined ( by MVS team ? ) is not "realistic". <- EDIT : This is a value we can pass to the MQSMF program to determine the "average" CF response time.

To be investigated :

Fixed pool contractions 11 > 0 ???
Variable pool contractions 12 > 0 ???
QIST read ahead message count 68 > 0 ???
BP 1 get old to get new page ratio > 41 . Queues not indexed ? Could be a lot of browse activity ???

Now this is an issue :
[i]
BP 3 Filled many (121) times. This is typical of long lived messages. Buffer pool may be too small[/i]

Thank you for your reply.
If you have any ideas about the ??? fields, it would be interesting to hear them.

Dragos Gheorghe

*******************************************************************************

Here are the PRD details :

CPUB,<qmgr>,2018/07/24,01:25:34
Date,Time,QMgr,BP,PS,Type, count, tot-time, avg-time, rate,QL
2018/07/24,01:25:34,<qmgr>,001,020,Read, 6, 12093, 2015, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,031,Read, 4, 890, 222, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,031,Write, 1, 244, 244, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,032,Read, 50, 13245, 264, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,032,Write, 14, 4538, 324, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,043,Write, 26, 6690, 257, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,052,Write, 23, 6479, 281, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,060,Read, 38, 16993, 447, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,060,Read, 7, 1670, 238, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,060,Read, 13, 4133, 317, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,060,Read, 16, 4691, 293, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,060,Write, 111, 51431, 463, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,002,053,Read, 1, 1210, 1210, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,003,056,Write, 1, 954, 954, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,001,Write, 3, 756, 252, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,001,Write, 5, 1721, 344, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,020,Read, 4299, 1343854, 312, 2,QLNAME
2018/07/24,01:25:34,<qmgr>,001,001,Write, 8, 3304, 413, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,001,Write, 6, 1992, 332, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,001,Write, 2, 634, 317, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,001,Write, 1, 358, 358, 0,QLNAME
2018/07/24,01:25:34,<qmgr>,001,001,Read, 14, 3832, 273, 0,QLNAME

I just saw this... Will peek through it.

I saw a line in one of the reports that says MQBRO.. so I guess that when large, non-indexed queues are being browsed, that can cause a problem.

From 2018/07/24,00:55:34.364296 to 2018/07/24,01:25:27.413922, duration 1793
MQOPENs 1046623, MQCLOSEs 646838, MQGETs 1512813, MQPUTs 189382
MQPUT1s 293776, MQINQs 195024, MQSETs 498, C ALL H 1
MQSUBs 0, MQSUBRQs 0, MQCBs 1435607
MQCTLs 1010535, MQSTATs 0, Publish 0
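
Interval totals like these are easier to compare with the per-second throughput figures earlier in this post once they are divided by the interval length. A minimal Python sketch of that arithmetic, using the timestamps and a subset of the counts from the report above; the function names are hypothetical:

```python
from datetime import datetime

# Interval boundaries and API call counts copied from the report above.
START = "2018/07/24,00:55:34.364296"
END = "2018/07/24,01:25:27.413922"
COUNTS = {
    "MQOPEN": 1046623,
    "MQCLOSE": 646838,
    "MQGET": 1512813,
    "MQPUT": 189382,
    "MQPUT1": 293776,
}


def interval_seconds(start, end):
    """Length of the reporting interval in seconds."""
    fmt = "%Y/%m/%d,%H:%M:%S.%f"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return delta.total_seconds()


def per_second(counts, seconds):
    """Convert interval totals into average calls per second."""
    return {verb: total / seconds for verb, total in counts.items()}


if __name__ == "__main__":
    seconds = interval_seconds(START, END)
    for verb, rate in per_second(COUNTS, seconds).items():
        print(f"{verb:8s} {rate:10.1f}/sec")
```

These are averages over the whole interval, so short peaks inside the half hour will not be visible in them.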

***

I wrote the reply in such a disorganized way because :
1 - I want a reply from a helping soul who has more energy than me right now;
2 - I don't have any more energy after spending my day eyeballing MQ statistics like a proper lab worm;

Greetings

PeterPotkay
Posted: Tue Jul 24, 2018 3:43 pm Post subject: Re: IBM MQ8.0 z/OS performance

Poobah

Joined: 15 May 2001
Posts: 7602

GheorgheDragos wrote:
Would you have any recommendations for performance tuning?


Lyn Elkins has lots of valuable posts on this site and also on her blog:

http://www.lynsmq4zos.com/
_________________
Peter Potkay
Keep Calm and MQ On

gbaddeley
Posted: Tue Jul 24, 2018 5:02 pm

Jedi

Joined: 25 Mar 2003
Posts: 2077
Location: Melbourne, Australia

Quote:
The problem might be that our app dev teams request always persistent queues which I think it's not because the messages are "critical", especially in TST GTU etc, but because they don't want to modify their apps.

The MQ code path and overhead of persistent vs non-persistent messages is quite different. It should be a design choice based on QOS that is consistently used in all environments through to prod.

BTW, queues are NOT persistent. Each message carries an MQMD property to indicate whether it is persistent or not. This is regardless of the queue's DEFPSIST attribute.
_________________
Glenn

GheorgheDragos
Posted: Thu Aug 16, 2018 12:14 am

Apprentice

Joined: 28 Jun 2018
Posts: 31

Colleagues,

First of all, thank you for taking the time to read and reply. Here is the current situation. We are using Omegamon for our installation. I have configured alerting so that we are notified by mail when a buffer pool has less than 6% of its buffers available. This happened this morning and lasted for several tens of minutes, so I had plenty of time to investigate which queues are the guilty ones. Still, with all the information from Omegamon, I have no idea which queues have been causing this. Help? Buffer pool 3 - PS03.
Below is the attachment from our system (with queue names blanked out for obvious reasons).

https://drive.google.com/open?id=1f53xpU15h9ozVWCeWR_BEcAfBgFLnDtC

Thank you in advance for your time and patience.

Dragos

***EDIT***

OR, it might be possible that there was no buffer pool activity any more because it had finished by the time I checked (even though I checked pretty quickly), and there were just messages in the buffers, which were then offloaded by a periodic automated system checkpoint....

I'm pretty sure there is an easy answer to all of this..


Last edited by GheorgheDragos on Thu Aug 16, 2018 12:48 am; edited 1 time in total

fjb_saper
Posted: Thu Aug 16, 2018 12:47 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20220
Location: LI,NY

if you know the buffer pool you should also know which queues are backed by that buffer pool/ page set? Then see which of those queues had messages during the relevant interval...
_________________
MQ & Broker admin

GheorgheDragos
Posted: Thu Aug 16, 2018 1:02 am

Apprentice

Joined: 28 Jun 2018
Posts: 31

fjb_saper wrote:
if you know the buffer pool you should also know which queues are backed by that buffer pool/ page set? Then see which of those queues had messages during the relevant interval...

But how can I know this if the messages have already been processed? Omegamon has a short-term memory of around 30 minutes to 1 hour...
I am in close collaboration with our automation guy. I want his REXX execs to trigger a series of displays whenever the "buffer pool utilisation > 94%" message prefix appears in the syslog: display the queues which are in use, based on buffer pool and PSID, with dis ql(*) ps(...), then take those local queues and issue DIS QS on them to see whether they are doing I/O. ALTER QMGR MONQ(MEDIUM) before, and OFF after, of course.
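
The sequence described above could be sketched as a small command generator that the automation calls when the alert fires. This is only an illustration in Python rather than REXX. It assumes the documented z/OS forms DISPLAY QUEUE with the PSID filter, DISPLAY QSTATUS with TYPE(QUEUE), and ALTER QMGR MONQ. The function names and queue names are hypothetical, and submitting the commands and reading the responses is left out entirely.

```python
def discovery_commands(psid, monq_level="MEDIUM"):
    """Commands to run first: switch on queue monitoring, list queues on a page set."""
    if not 0 <= psid <= 99:
        raise ValueError("page set id is expected to be in the range 0-99")
    return [
        f"ALTER QMGR MONQ({monq_level})",
        f"DISPLAY QUEUE(*) TYPE(QLOCAL) PSID({psid})",
    ]


def status_commands(queue_names):
    """Commands to run once the queue names are known, ending with monitoring off."""
    commands = [f"DISPLAY QSTATUS({name}) TYPE(QUEUE) ALL" for name in queue_names]
    commands.append("ALTER QMGR MONQ(OFF)")
    return commands


if __name__ == "__main__":
    # Hypothetical queue names standing in for the DISPLAY QUEUE response.
    for command in discovery_commands(3) + status_commands(["APP.QUEUE.ONE", "APP.QUEUE.TWO"]):
        print(command)
```

One caveat with this design: the queue manager level MONQ setting only takes effect for queues whose own MONQ attribute is QMGR, and the monitoring values need some traffic after being switched on before they show anything.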

Ideas ?

bruce2359
Posted: Thu Aug 16, 2018 3:48 am

Poobah

Joined: 05 Jan 2008
Posts: 8620
Location: US: west coast, almost. Otherwise, enroute.

GheorgheDragos wrote:

Below the attachment from our system ( with blanked out queue names for obvious reasons ).

https://drive.google.com/open?id=1f53xpU15h9ozVWCeWR_BEcAfBgFLnDtC

Thank you in advance for your time and patience..

In order to avoid the risk from computer viruses, I never click URLs from unknown sources. If you want me/us to look at your evidence, please post it here.
_________________
My life flows on in endless song;
How can I keep from singing?

GheorgheDragos
Posted: Thu Aug 16, 2018 4:21 am

Apprentice

Joined: 28 Jun 2018
Posts: 31

bruce2359 wrote:
GheorgheDragos wrote:

Below the attachment from our system ( with blanked out queue names for obvious reasons ).

https://drive.google.com/open?id=1f53xpU15h9ozVWCeWR_BEcAfBgFLnDtC

Thank you in advance for your time and patience..

In order to avoid the risk from computer viruses, I never click URLs from unknown sources. If you want me/us to look at your evidence, please post it here.


Please open it. It's text copied and pasted from Omegamon while the situation was active; in a Citrix session it's uncomfortable to take and save screenshots. So I copied it into Outlook to keep the formatting, then saved it in HTML format, because if I copy and paste here I can't choose the font. Here is a part of the HTML. See, not readable...
Code:

Command ==>                                                HostName : CPUF
KMQMSBMD                  Buffer Manager                   QmgrName : GM10

               Latest Buffer Manager SMF Sample Summary

# of Pools In Use.........        4 Low % Avail...............     15.0
Low # Avail...............      964 Zero Bufrs Count..........        0
Synch Writes..............        0 GetPg IO %................      0.0
% GetPg Outside Pool......      0.0

                             Buffer Pools

Columns  2 to  7 of 19        Rows      1 to      4 of      4

Pool   % of Bufrs Available Low #    Zero Bufrs Page Sets +Queue
ID     Available  Buffers    Avail    Count      Assigned  Assig
   00          91.8        964       964           0          1     0
   01          53.6       5626      5626           0          2 1554
   02          93.8       9844      9844           0          4    60
   03          15.0       1576      1576           0          3    98

            Options Menu

Select an option and then press ENTER

   1. H Buffer Pool Statistics History
   2. P Page Sets in Buffer Pool
   3. R Recent Buffer Pool Statistics
   4. S Queues in Buffer Pool

KMQQUBPS               Queues in Buffer Pool               QmgrName : GM10

              Latest Sample for Queues in Buffer Pool 03

Columns  2 to  6 of 29        Rows      1 to     18 of     98

Queue                  % Full    Msgs Read Msgs Put  Total    +Las
Name                             per Sec    per Sec    Opens
   SYSTEM.CLUSTER.TRANS        0.0        0.0        0.0         0 n/
   SYSTEM.DEAD.LETTER.Q        0.0        0.0        0.0         1 n/
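
Since screen text like this is hard to read once pasted, a small script can pull out just the pools that are running low. A minimal Python sketch, assuming the buffer pool rows keep the column order shown above (pool ID first, % of buffers available second). The function name and the thresholds used in the example are hypothetical.

```python
# Buffer pool rows copied from the screen above.
ROWS = """\
   00          91.8        964       964           0          1     0
   01          53.6       5626      5626           0          2  1554
   02          93.8       9844      9844           0          4    60
   03          15.0       1576      1576           0          3    98
"""


def low_pools(text, min_pct_available):
    """Return [(pool_id, pct_available)] for pools below the given threshold."""
    flagged = []
    for line in text.splitlines():
        fields = line.split()
        # Data rows start with a numeric pool ID; headers and menus do not.
        if len(fields) < 2 or not fields[0].isdigit():
            continue
        try:
            pct_available = float(fields[1])
        except ValueError:
            continue  # not a buffer pool row after all
        if pct_available < min_pct_available:
            flagged.append((int(fields[0]), pct_available))
    return flagged


if __name__ == "__main__":
    print(low_pools(ROWS, 20.0))
```

With the 6% alert threshold mentioned earlier in the thread, this particular sample would flag nothing, because it was captured when pool 03 still had 15% of its buffers available.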

elkinsc
Posted: Thu Oct 04, 2018 8:45 am Post subject: Sorry I tend just to look in the z/OS area

Centurion

Joined: 29 Dec 2004
Posts: 129
Location: Memphis

GheorgheDragos, did your situation get resolved?

GheorgheDragos
Posted: Fri Mar 13, 2020 6:02 am

Apprentice

Joined: 28 Jun 2018
Posts: 31

Dear MQ community, I come back to you with what I have done so far; maybe it will help some new MQ admin. What I am writing is related to the post I opened above, so I will not open another one. Here is what I have been doing for the last week:

*enabled MONQ(LOW) on all our queue managers for a week;
*gathered system statistics with MP1B (buff, stg, lock, log, eoj);

Found some interesting things which can be improved. Firstly, as mentioned above, our buffers are paging like crazy. Therefore, I decided to make an effort to separate long-lived message queues from short-lived ones. I decided (out of thin air, to be honest) that 500 000 microseconds (0.5 seconds) should be the dividing line between long lived and short lived. So, just now I finished displaying and sorting the queue status of local queues only ( since we run in a msg ). All clear. I am waiting for someone to write me a REXX to split the output into two files (long lived vs short lived) based on QTIME, and then I am going to start separating them. I know, REXX is a must, what to do. I am currently running another set of statistics targeted at page set utilisation, which should be a good start. If your message stays in a queue for longer than 0.5 seconds, it goes to a small BP so it can buffer in peace without affecting the fast ones.
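
The split described above could look something like the following. It is a Python sketch rather than REXX, and it assumes each queue's status has already been captured as one text record containing QSTATUS(name) and QTIME(short,long) in the usual MQSC attribute(value) style. The sample records and queue names are made up, and using the second (longer-term) QTIME value for the comparison is my assumption, not something the post specifies.

```python
import re

THRESHOLD_US = 500_000  # 0.5 seconds, the dividing line chosen in the post

NAME_RE = re.compile(r"QSTATUS\(([^)]+)\)")
QTIME_RE = re.compile(r"QTIME\(\s*(\d*)\s*,\s*(\d*)\s*\)")

# Hypothetical records, one per queue, in MQSC attribute(value) style.
SAMPLE = [
    "QSTATUS(APP.FAST.Q) TYPE(QUEUE) CURDEPTH(0) QTIME(1200,3400)",
    "QSTATUS(APP.SLOW.Q) TYPE(QUEUE) CURDEPTH(42) QTIME(900000,1250000)",
    "QSTATUS(APP.IDLE.Q) TYPE(QUEUE) CURDEPTH(0) QTIME( , )",
]


def split_by_qtime(records, threshold_us=THRESHOLD_US):
    """Return (short_lived, long_lived, unknown) lists of queue names."""
    short_lived, long_lived, unknown = [], [], []
    for record in records:
        name = NAME_RE.search(record)
        if not name:
            continue  # not a queue status record
        qtime = QTIME_RE.search(record)
        if not qtime or not qtime.group(2):
            unknown.append(name.group(1))  # no measurement in the period
        elif int(qtime.group(2)) > threshold_us:
            long_lived.append(name.group(1))
        else:
            short_lived.append(name.group(1))
    return short_lived, long_lived, unknown


if __name__ == "__main__":
    fast, slow, idle = split_by_qtime(SAMPLE)
    print("short lived:", fast)
    print("long lived: ", slow)
    print("no QTIME:   ", idle)
```

Queues with a blank QTIME are kept in a third list instead of being guessed into one of the two groups, since a blank value only means nothing was measured in the period.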

Another activity I would like to do is get rid of old, unused queues (and their storage classes). MONQ helped here as well: after a week of it being active, we have candidate queues for deletion (the ones whose LGETDATE and LPUTDATE are blank, i.e. they haven't been used for a week). Waiting for another REXX to filter these for me as well.
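
The filter for deletion candidates can be sketched along the same lines, again in Python rather than REXX. It assumes one text record per queue in MQSC attribute(value) style that includes LGETDATE and LPUTDATE. The record contents and function names are hypothetical.

```python
import re

ATTR_RE = re.compile(r"(\w+)\(([^)]*)\)")

# Hypothetical queue status records; blank dates mean no GET/PUT was seen.
SAMPLE = [
    "QSTATUS(APP.OLD.Q) TYPE(QUEUE) CURDEPTH(0) LGETDATE( ) LPUTDATE( )",
    "QSTATUS(APP.LIVE.Q) TYPE(QUEUE) CURDEPTH(3) LGETDATE(2020-03-10) LPUTDATE(2020-03-12)",
    "QSTATUS(APP.PUTONLY.Q) TYPE(QUEUE) CURDEPTH(9) LGETDATE( ) LPUTDATE(2020-03-09)",
]


def parse_attrs(record):
    """Turn 'NAME(value) NAME(value) ...' into a dict with stripped values."""
    return {key: value.strip() for key, value in ATTR_RE.findall(record)}


def deletion_candidates(records):
    """Return names of queues whose LGETDATE and LPUTDATE are both blank."""
    names = []
    for record in records:
        attrs = parse_attrs(record)
        if "QSTATUS" not in attrs:
            continue
        if "LGETDATE" not in attrs or "LPUTDATE" not in attrs:
            continue  # dates not reported at all: do not guess
        if not attrs["LGETDATE"] and not attrs["LPUTDATE"]:
            names.append(attrs["QSTATUS"])
    return names


if __name__ == "__main__":
    print(deletion_candidates(SAMPLE))
```

The result is only a list of candidates for review. A queue that is used once a month or once a quarter would also show blank dates after a single week of monitoring.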

The problem appears when I look at the LOG output of MP1B... I can see that ours are waiting for a buffer even when the page I/O is low. Why is that, and how can I improve it? Is log striping necessary when we log so little data?

Quote:
z/OS   QM    Date        Time      MB written  MB/sec  MB used  Pages per I/O  LLCheckpoints  Wait for buffer
lpar.  qmgr  04/03/2020  16:45:43        6576     3.7     2821            1.2             25               11
lpar.  qmgr  04/03/2020  19:15:08        3810     2.1     1457            1.2             11               57


I can go on..
Our log load is currently set at 500 000, WRTHRSH at 15, OUTBUFF at 4000 and INBUFF at 60. Dual logging and BSDS of course. Can this happen because of long units of work? Uncommitted threads? I am still trying to wrap my head around it.

elkinsc
Posted: Fri Mar 13, 2020 8:20 am Post subject: Excellent progress

Centurion

Joined: 29 Dec 2004
Posts: 129
Location: Memphis

1) I cannot recall - what version of MQ are you running?
2) Please make sure you have the latest version of MP1B, the log manager data had some displacements change and some of the reports did not reflect those changes. Are you seeing the same values in the LOG report that you are in the LOGCSV report?
3) You might want to run CSQ4SMFD and look at the values of the QJSTWRW field.
4) How big are your log files? If they are very small, you could be switching a lot and that can slow a number of things down depending on how well your DASD is responding.
5) In the LOG report during the intervals in question, are there extremely long I/O durations (like over a second)?
One reason I use MQSMFCSV is because the LOGCSV report does not include all the interesting fields like the long I/O duration for the interval.
Good Luck!

GheorgheDragos
Posted: Thu Mar 26, 2020 10:56 am Post subject: Progress so far

Apprentice

Joined: 28 Jun 2018
Posts: 31

Dear colleagues,

The last few days/weeks have been very busy and I am happy with what has been achieved so far. Attached, for those of you trying out new projects as well as for experienced veterans, is an Excel workbook I have put together (queue/qmgr names blanked out of course), with each queue manager having two tabs: original and reworked. This is for the queue separation based on the 0.5 second mark mentioned above.
The situation gets a little bit more complicated. Let's say I chose BP1 as the candidate for fast, short-lived messages and BP2/3 (4 where applicable) for the ones which can page in peace. Now, I am no beginner: I understand that I have to run MP1B and check PSIDQIO so I can see which are the "heavy" queues with lots of messages and associate them with a large page set (generally the first ones, as seen in the Excel). But, and here is a big one, the last few days have been so frustrating (especially as I had an interview for an MQ admin position on distributed and, it not being my platform, I did not live up to my own expectations. I am certified though, simply to prove that I am capable of understanding and assimilating knowledge other than z/OS) because I AM UNABLE to list the STGSUM and STGCSV fields with MP1B so that I can see and understand how MQ is using its storage over a period of hours/days, and so know how to reassign the buffer pools for maximum performance. I simply do not understand why those fields are blank. We are running 9.1 in DEV and 8.0 in PRD (to be changed this weekend to 9.1). I have run the "old" MP1B and the new one. These simply will not display. Why? We have tracing active (3 traces: CLASS 1 STAT and ACCTG, and class 3 ACCTG in PRD). Lord Jesus, why won't this information display? It shows me only STG but I cannot use this...
Will someone who has been down this path help me understand why this field is not displayed? I am running the full JCL from the document; I thought maybe there are fields dependent upon other fields. Nope. So, without knowing how MQ uses its storage (other than checking CSQY220I, which is also a good idea: XDC the whole output from SDSF, X ALL;F ALL CSQY220I and monitor the usage; I might just do this workaround), I simply cannot modify the BP as I would like to (BP1 FIXED4KB) so that the queue manager will not go SoS.
I want to thank the people who replied here; I am very grateful for your suggestions (especially regarding the logging, which I am going to get to next, after I finish removing the old queues, splitting them, and tuning the channels).
Here is the link to the Excel on my gdrive; please feel free to comment on whether there are things I can improve, or simply use this template should it fit your needs.
https://drive.google.com/open?id=1P1YKx3pY_t7XKgCXI61_v4KnC6TDZF8g

Dragos

quick edit :

In the excel, yellow highlight means fast queues to be moved to BP1 and brown highlight means slow queues to be moved away from BP1.

Edit nr 2 : of course the two columns with digits represent, from left to right : last msg QTIME and average QTIME in microseconds, data gathered over a period of a week ( from 6th to 13th March cy )

Edit nr 3 : of course I need to enable trace STAT CLASS 2 for subsystem storage... Please forgive my above rant. It's been a long day.

elkinsc
Posted: Thu Mar 26, 2020 11:56 am Post subject: Use MQSMFCSV

Centurion

Joined: 29 Dec 2004
Posts: 129
Location: Memphis

1) I encourage you to open cases on any issues you may find. I know there have been some recent discoveries about missing or miscalculated fields. The program has been described to me as unwieldy; as with many programs that have been repeatedly modified over many years, it can be difficult to maintain (in particular as it is not anyone's full time job).
2) One of the many reasons I use MQSMFCSV to evaluate the SMF data I receive from customers, is because MQSMFCSV has been very quick to be updated for new versions without breaking the formatting for older versions.
3) I generally still use the messages from the JES log (CSQY220I) to tell me about the storage usage in the queue manager because of numerous issues with formatting or getting the data in the past.