z/OS WMB Performance Observations |
Author |
Message
|
zpat |
Posted: Wed Sep 21, 2011 5:17 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
Presumably any use of the Z series Java speciality engine would make a big difference to price/performance. |
|
|
 |
lancelotlinc |
Posted: Wed Sep 21, 2011 5:59 am Post subject: |
|
|
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
|
zpat wrote: |
Presumably any use of the Z series Java speciality engine would make a big difference to price/performance. |
If this were true, there would be some notation in the Installation Guide or in a tech note. Do you see something I missed? _________________ http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |
|
|
 |
Vitor |
Posted: Wed Sep 21, 2011 7:00 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
lancelotlinc wrote: |
I have not seen in the WMB Installation Guide any mention of these criteria. What z/OS tuning is needful to achieve par on the performance with other platforms? |
It's in the z/OS documentation because it's the LPAR that's being tuned. Most z/OS shops treat LPARs as general workhorses, because no product consumes enough of the available horsepower to warrant such tuning.
As I said earlier, how many places buy z/OS or AIX just to run WMB, or have an LPAR tuned specifically for it? How many sites value the stability, control and auditability of z/OS and accept they're burning more CPU than they need, happy in the knowledge they've got plenty to spare?
The flip side, of course, is how many sites swim in the money they save buying Power7 rather than z/OS? How many accept reduced stability & control, happy that it's still sufficient? How many set up specific Power7 LPARs for WMB, tuning as needed, to increase throughput and accept the additional maintenance & administration?
We can go round and round this as often as you like. One day your Power7 jihad may overtake the world, or we may be discussing the Next Big Architecture and how much better that is than z/OS.
Or we may be discussing how much better the new Power7 mainframes are & why WMB doesn't run on the 128-bit z/OS architecture it supports.
The only certain "winner" in this discussion is IBM. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
lancelotlinc |
Posted: Wed Sep 21, 2011 7:11 am Post subject: |
|
|
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
|
Good points Sir Vitor. Excellent thought. Sipples implies that special tuning is required to get WMB to perform at par on z/OS when compared to other platforms. Certainly worth entertaining the idea. If it is possible to get improved performance from WMB on z/OS, I am asking what things to tune to achieve this.
More specifically, moving to a tactical discussion, I am interested to hear from sipples or others what things we can do to help z/OS achieve better performance using WMB.
What knobs can we tweak? Are these WMB-specific performance tuning knobs documented anywhere in WMB documentation?
I accept the idea that there are general LPAR tuning things to do. Perhaps we can get Tim Dunn to comment. I am looking for hints, tips, and tricks to improve (or work around) the CPU usage time bug that is documented in the IBM performance report for WMB on z/OS. _________________ http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |
|
|
 |
sipples |
Posted: Wed Sep 21, 2011 7:50 pm Post subject: |
|
|
Newbie
Joined: 21 Sep 2011 Posts: 7
|
lancelotlinc wrote: |
That being said, please address the problem of CPU time on z/OS for WMB transactions. Not only do the IBM performance reports indicate that there is a two-to-one difference between Power 7 and z/OS, but in my personal experience I also have actually observed this issue. |
Power 7 (actually POWER7) is a processor (family), with a certain reasonably narrow range of excellent CPU performance attributes per core. POWER7 processors can run a variety of operating systems, including AIX, Linux, and IBM i.
z/OS is an operating system which runs multiple concurrent workloads in varying Workload Manager service classes on dynamically managed flexible partitions that might have anywhere from 1 MIPS (or less) to tens of thousands of MIPS allocated. But even leaving all that aside, this month's z/OS release (1.13 as I write this) runs on processors introduced starting in the year 2000 (the z900/z800 family) all the way up to the z196. Looking at uniprocessor performance ratings, the fastest machine has cores that are rated over 46 times the performance of the slowest (2086-110 would be one example of the smallest mainframe CPU during that decade). POWER7 servers started shipping a decade later (2010) than the z900.
IBM says that their z196 mainframe, clocked at 5.2 GHz, contains the world's fastest cores, defined as single thread execution performance. (IBM is correct.) In other words, POWER7 and z196 are both market-leading, exceptionally high performance, state-of-the-art CPUs. More succinctly, they're both absolutely wonderful.
IBM mainframe processors also can run a variety of operating systems: z/OS, Linux, z/VSE, z/TPF, z/VM, and at least a couple others. With the zEnterprise BladeCenter Extension (zBX), the z196 (and z114) can also run AIX and more.
As someone mentioned upthread, apparently IBM published some WebSphere Message Broker for z/OS performance figures on z990 mainframes, introduced in 2003. And (apparently) here we are still talking about those performance data in the year 2011. Well, OK, but can anybody spot the problem?
The uniprocessor rating of a z990 is 413 PCIs. In contrast, the uniprocessor rating of a z196 (introduced in mid-2010, i.e. a contemporary of POWER7) is 1202 PCIs. Changing nothing else except the cores -- and that's not the only thing that has changed in 7+ years! -- I would expect a z990 performance report to indicate roughly 3X CPU core time for any given WMB activity compared to a z196.
Or, if you'd prefer to compare single whole machines maximally configured, a 2084-332 (z990) has a PCI rating of 9250. A 2817-780 (z196) has a PCI rating of 52286. Thus, ceteris paribus, and with a few other key assumptions (like having nicely SMP-spreadable workloads in both runs), I would expect a performance run on a maximally configured z990 to take roughly 5.6 times longer than it would on a maximally configured z196.
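For anyone who wants to check the arithmetic, here is a minimal sketch of the two ratios. It assumes the PCI ratings quoted above are accurate (they are not verified here), and the constant and function names are made up for illustration.

```python
# PCI ratings as quoted in this post; treat them as assumptions, not verified data.
Z990_UNI_PCI = 413      # z990 uniprocessor rating
Z196_UNI_PCI = 1202     # z196 uniprocessor rating
Z990_MAX_PCI = 9250     # 2084-332, maximally configured z990
Z196_MAX_PCI = 52286    # 2817-780, maximally configured z196


def capacity_ratio(newer_pci: float, older_pci: float) -> float:
    """Return how many times larger the newer rating is than the older one."""
    return newer_pci / older_pci


uni_ratio = capacity_ratio(Z196_UNI_PCI, Z990_UNI_PCI)
max_ratio = capacity_ratio(Z196_MAX_PCI, Z990_MAX_PCI)

print(f"uniprocessor ratio: {uni_ratio:.2f}x")
print(f"max-config ratio:   {max_ratio:.2f}x")
```

The ratios only compare rated capacity; they say nothing about how a particular WMB flow would actually scale between the two machines.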
Said another way, if your z/OS observation wasn't on a z196 (a POWER7 contemporary), the only thing you've discovered is that newer hardware might perform better than older hardware. (Shocking, I know.) If, after looking at similar vintage cores, you still see some differences, there are at least a dozen other factors to look at -- and IBM might have some advice. Beyond even that, there are at least a dozen other considerations to take into account beyond performance and throughput when considering workload placement and associated comparative business cases. And, as I mentioned, it's very possible indeed that the best business case is to run the wonderful cross-platform middleware WebSphere Message Broker on a couple different platforms together, exactly as it's designed to do.
I've also seen many, many times (including with WMB z/OS) when an IBM client compares some business cases and then concludes, "Hey, IBM, you might have a pricing problem here in our situation." Sometimes that client is even correct. Then some negotiations begin. So if you've done that due diligence with some reasonable skill, share that with IBM and see what happens.
To repeat, maybe IBM shouldn't be hauling out now classic machine models and publishing performance data on them, regardless of how many disclaimers appear in the reports.
With respect to tuning (which I didn't mention), that's simple: I expect everybody, regardless of platform, would apply reasonable performance tuning skill and vendor support to their implementations. z/OS certainly has absolutely wonderful diagnostic functionalities to tell you exactly what's going on. |
|
|
 |
zpat |
Posted: Wed Sep 21, 2011 11:01 pm Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
lancelotlinc wrote: |
zpat wrote: |
Presumably any use of the Z series Java speciality engine would make a big difference to price/performance. |
If this were true, there would be some notation in the Installation Guide or in a tech note. Do you see something I missed? |
Software installation guides do not generally discuss hardware price/performance issues. Indeed software developers often have no idea of the price of their software, let alone price/performance of hardware!
Do you have a mainframe systems programming or capacity planning background? |
|
|
 |
lancelotlinc |
Posted: Thu Sep 22, 2011 4:44 am Post subject: |
|
|
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
|
@sipples: very good points, and well taken. Specifically, when referring to the 2003 documentation, I was calling out the statement about the z/OS CPU time bug which IBM has committed to fix. This CPU time bug interferes with WMB's ability to process messages and, as was pointed out in the document, would be fixed so that WMB could operate on the z/OS platform in a more efficient fashion. Hopefully someone from Hursley can update us on the status of the z/OS CPU time bug. I most certainly agree with you that we need fresher performance reports, with more apples-to-apples, platform-agnostic controls in the performance testing. I volunteer my services to help with such testing, if IBM is short on resources in this area. I have great respect for Tim Dunn and team.
@zpat: my claim to fame is software development, architecture planning, and business management. These venues provide visibility and participation in capacity planning (i.e., how can you implement a system that needs 10 million per time interval when the PC you put that software on is only capable of 5 million?). I may not have worded my response to your inquiry well: my intent was to say that if there were tuning steps to enable WMB to perform better, I would expect them to be in the Installation Guide. Thanks for asking about my background, which I gave briefly earlier in the thread. I started developing software on HP platforms in 1974. From 1983 to 1991, I was enlisted in the U.S. Air Force and visited RAF Mildenhall UK and Osan Air Base, South Korea. 1991 to 1994 Gateway 2000 MS Windows C++ developer. 1994 to 1995 BMC software, Patrol Recovery Manager, MQ. 1995 to 2000 Texas Dept Transportation TranStar Software Automation and Enron EnCube Residential Metering via 2-way pager network. 2000 to 2002 Vice President dot com. 2003 to 2008 State Farm Insurance WMB. 2008 to 2009 Monsanto and Express Scripts ESB. 2010 to 2011 MasterCard and Bank Of America WMB. _________________ http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |
|
|
 |
Vitor |
Posted: Thu Sep 22, 2011 5:03 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
lancelotlinc wrote: |
visited RAF Mildenhall UK |
This explains everything. 24 hours in the Fens can be enough for the susceptible.
(They buried St Edmunds there you know)
 _________________ Honesty is the best policy.
Insanity is the best defence. |
|
|
 |
lancelotlinc |
Posted: Thu Sep 22, 2011 5:17 am Post subject: |
|
|
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
|
Vitor wrote: |
lancelotlinc wrote: |
visited RAF Mildenhall UK |
This explains everything. 24 hours in the Fens can be enough for the susceptible.
(They buried St Edmunds there you know)
 |
Well, now we know the source of our interactive discussions! Good to know these things. _________________ http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |
|
|
 |
zpat |
Posted: Thu Sep 22, 2011 6:59 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
No amount of Unix experience will give you any real insight into the IBM mainframe world. It really is a very different architecture, especially at the operating system level. Tuning mainframe applications is an entire career and the fine grain controls are unmatched elsewhere. Unix is just starting to get features that mainframes had 20 years ago. |
|
|
 |
sipples |
Posted: Thu Sep 22, 2011 7:37 am Post subject: |
|
|
Newbie
Joined: 21 Sep 2011 Posts: 7
|
I don't know about Broker V6.0 for z/OS and its comment on page 49 of the 2006 report.... Wait, are we really talking about a couple lines in a January, 2006 performance report (run on early 2003 hardware) in late 2011? Seriously?
....Anyway, by the time IBM wrote up the 2008 V6.1 z/OS performance report that comment disappeared -- and so did the observation that triggered that comment, it seems. So if there was a problem -- and I'm not so sure (see below) -- it was resolved either prior to or with the initial V6.1 release for z/OS and thus not worth mentioning.
If you compare the "Scaling Message Throughput" discussions (starting on page 67 in the 2008 V6.1 AIX report, and page 58 of the 2008 V6.1 z/OS report), they say exactly the same thing because the observed pattern is exactly the same. Read carefully but, to summarize, no matter what platform you run on it appears that if you increase the number of message processing instances past about N-1, where N=the number of physical cores available for execution, you shouldn't see much if any benefit for CPU-bound processing. (Minimum 1 instance in a 1 core partition, of course.) Which makes perfect sense: if all your cores have tasks/threads to work on and can be kept busy (i.e. a CPU-bound scenario), adding more tasks/threads isn't going to improve throughput and might actually do a little harm. That's just common SMP sense, and it's nice to see that confirmed.
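As a rough illustration of that N-1 rule of thumb, here is a minimal sketch. The function name is hypothetical and this is not an IBM formula; it assumes a CPU-bound flow and simply gives a starting point to measure against.

```python
def suggested_instances(physical_cores: int) -> int:
    """Rule of thumb from the post: about N-1 instances, with a minimum of 1.

    Only meaningful for CPU-bound processing; I/O-bound flows may behave differently.
    """
    if physical_cores < 1:
        raise ValueError("physical_cores must be at least 1")
    return max(1, physical_cores - 1)


for cores in (1, 2, 4, 8):
    print(f"{cores} core(s) -> start with {suggested_instances(cores)} instance(s)")
```

Treat the result as the point past which extra instances are unlikely to help, then confirm with your own measurements.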
Now, let me point out that IBM mainframes offer an interesting option here if for some reason you want a lot of hardware threads but don't want to over-pay for them. On mainframes you have considerably broader discretion than on any other platform in selecting how many cores you want for the same (or within a whisker of the same) total price as the single core model. That's because you have lots of (optional) sub-capacity core speeds to choose from. For example, on a z114 model you could choose an A05 configuration -- that's 5 z/OS cores -- with a total system PCI (performance rating) of 104. (So each core offers roughly 21 PCIs -- each core is a wee lad. Each of those wee cores is roughly 1/37th the speed of the top Z01 uniprocessor configuration available on that machine, so there's a very big range of choices available.) Or you could choose a K01 configuration which would give you a single "taller" z/OS core with a PCI of 110. (The PCI is highly correlated with pricing.) Or you could choose something in between, like a 2-way F02 (PCI 107). And you can change your mind even after you get the hardware. For example, you could start with an F02 and then upgrade/crossgrade to the A05, paying only for 3 PCIs in that case.
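To make the "many small engines versus one tall engine" trade-off concrete, here is a small sketch that works out the per-core rating for the three z114 configurations mentioned. It assumes the core counts and PCI figures quoted in this post are right; the dictionary and function names are illustrative only.

```python
# (number of z/OS cores, total system PCI) per configuration, as quoted in this post.
Z114_CONFIGS = {
    "A05": (5, 104),
    "F02": (2, 107),
    "K01": (1, 110),
}


def pci_per_core(cores: int, total_pci: float) -> float:
    """Average rated capacity of each core in a configuration."""
    return total_pci / cores


per_core = {name: pci_per_core(cores, pci) for name, (cores, pci) in Z114_CONFIGS.items()}

for name, value in per_core.items():
    cores, total = Z114_CONFIGS[name]
    print(f"{name}: {cores} core(s), total PCI {total}, about {value:.1f} PCI per core")
```

Total capacity is nearly the same across the three, so the choice comes down to whether the workload prefers more engines or faster ones.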
In other words, if you've got workloads that can benefit from having lots of engines, you might lean more toward the slower but more numerous engine configurations. Or you might lean the other way. You choose the best fit for your workloads. On other systems you just get whatever full speed cores are on offer and that's that. And you pay for them all, including full core software licenses. Even if you've got some nice multiprocessing-friendly workloads that barely tickle each core individually but which could benefit more from more cores. (On the z196 you have up to 15 cores that can be ordered in sub-capacity configurations.)
Now, I'm not saying that's what you should do in a particular situation. Your mileage may vary. But it's an option, and many mainframe installations take full advantage of that option. One bank I'm familiar with decided their ideal configuration is to have 10 of the slowest engines (same machine, just the configuration setting), so that's what they run since that maximizes overall throughput for them with their particular applications. But they pay way, way less than what they'd pay for 10 full speed engines.
By the way, I notice the original poster didn't criticize the message rate drop-off observed when adding the 5th message processing instance in an I/O-bound scenario on AIX (2008 p. 67). Quick, sound the alarms! Must be a bug in Broker AIX! (Yes, I'm being facetious.) |
|
|
 |
lancelotlinc |
Posted: Thu Sep 22, 2011 8:11 am Post subject: |
|
|
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
|
sipples wrote: |
I don't know about Broker V6.0 for z/OS and its comment on page 49 of the 2006 report.... Wait, are we really talking about a couple lines in a January, 2006 performance report (run on early 2003 hardware) in late 2011? Seriously? |
It's relevant today because the same performance bottlenecks still occur on z/OS and do not appear on POWER7 (use case: credit card transactions).
I have observed these bottlenecks where the message size is relatively small (a few hundred bytes). With a few messages a second, both platforms perform well (latency ~250 milliseconds). As the rate ramps past somewhere between thirty and one hundred TPS, the latency explodes on z/OS, whereas POWER7 latency creeps up but remains stable at 10,000 TPS. I have seen latency go as high as 15 seconds, with a steady measurement of 7 seconds, on z/OS. The problem is not with Broker; the problem is with the CPU usage per message. z/OS doesn't seem to handle small message sizes well.
Maybe the CPU time bug has been solved; but I don't think the whole z/OS performance problem has gone away.
I intend to respond to your other comments, just wanted to explain why the focus is still here and relevant. _________________ http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |
|
|
 |
lancelotlinc |
Posted: Thu Sep 22, 2011 8:14 am Post subject: |
|
|
 Jedi Knight
Joined: 22 Mar 2010 Posts: 4941 Location: Bloomington, IL USA
|
zpat wrote: |
No amount of Unix experience will give you any real insight into the IBM mainframe world. It really is a very different architecture, especially at the operating system level. Tuning mainframe applications is an entire career and the fine grain controls are unmatched elsewhere. Unix is just starting to get features that mainframes had 20 years ago. |
I would argue that POWER7 has arrived at the same or better performance and tuning capability. On the other hand, I agree with your comment for other Unix platforms. _________________ http://leanpub.com/IIB_Tips_and_Tricks
Save $20: Coupon Code: MQSERIES_READER |
|
|
 |
zpat |
Posted: Thu Sep 22, 2011 8:24 am Post subject: |
|
|
 Jedi Council
Joined: 19 May 2001 Posts: 5866 Location: UK
|
The majority of IBM mainframe tuning is done inside z/OS. Although aspects of this have been added to Unix hypervisors, the vast amount of z/OS code used to control performance (dating back to when every little MIP and byte really counted) has not been replicated in AIX. In z/OS you can control performance right down to the smallest unit of OS dispatching.
WLM in AIX offers a tiny subset of these features, trust me on that. But anyway it's a pointless argument since WMB is not really a native z/OS application (AFAIK it's a POSIX based port). If the WMB message flows are CPU inefficient it's generally cheaper to throw P-series boxes at it. But I would like to understand the effect of the Java engines in z/OS on this aspect. |
|
|
 |
sipples |
Posted: Thu Sep 22, 2011 8:29 am Post subject: |
|
|
Newbie
Joined: 21 Sep 2011 Posts: 7
|
lancelotlinc wrote: |
I have observed these bottlenecks where the message size is relatively small (a few hundred bytes). With a few messages a second, both platforms perform well (latency ~250 milliseconds). As the rate ramps past somewhere between thirty and one hundred TPS, the latency explodes on z/OS, whereas POWER7 latency creeps up but remains stable at 10,000 TPS. I have seen latency go as high as 15 seconds, with a steady measurement of 7 seconds, on z/OS. The problem is not with Broker; the problem is with the CPU usage per message. z/OS doesn't seem to handle small message sizes well. |
So stop guessing, open a PMR with IBM, follow their instructions about collecting relevant diagnostic information, work the problem, and get back to us with the results, OK?
While you're at it, see if you can find out why that particular I/O-bound run behaved that way in 2008. Maybe AIX has an I/O problem.  |
|
|
 |