RESOLVED: Solaris 10 Zone Impact on MQ Performance
SAFraser
Posted: Thu Sep 09, 2010 7:03 am    Post subject: RESOLVED: Solaris 10 Zone Impact on MQ Performance

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

We are working on a performance issue.

Operating system: Solaris 10
WebSphere MQ: 7.0.1.3
System tuning as specified in IBM documents.

During application testing, we noticed a degraded dequeue rate. In addition to application testing, we were also running a utility (based upon amqsget) that clears a queue with destructive gets. When we killed the utility, the dequeue rate improved but still was not what we expected.

So, we ran the following test:

MQ running on a Solaris 10 global zone
Sent and consumed 150,000 small test messages.
Enqueue rate: 42,857 avg
Dequeue rate: 42,857 avg

MQ running on a Solaris 10 sparse zone
Sent and consumed 150,000 small test messages.
Enqueue rate: 33,335 avg
Dequeue rate: 5,642 avg

System resources were virtually unaffected during either test. In other words, no appreciable change in CPU or memory usage was noted. Our initial troubleshooting suggests that the global zone / sparse zone variable is the significant one.
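
To put the two runs side by side, here is a quick sketch of the arithmetic. The numbers are the ones reported above, in whatever unit the test tool reported; the `slowdown` helper is only an illustrative name.

```python
# Rates copied from the two test runs above (unit as reported by the test tool).
global_zone = {"enqueue": 42857, "dequeue": 42857}
sparse_zone = {"enqueue": 33335, "dequeue": 5642}


def slowdown(baseline: float, observed: float) -> float:
    """Return how many times slower `observed` is than `baseline`."""
    return baseline / observed


enqueue_slowdown = slowdown(global_zone["enqueue"], sparse_zone["enqueue"])
dequeue_slowdown = slowdown(global_zone["dequeue"], sparse_zone["dequeue"])

print(f"enqueue: {enqueue_slowdown:.1f}x slower in the sparse zone")
print(f"dequeue: {dequeue_slowdown:.1f}x slower in the sparse zone")
```

The enqueue side loses roughly a quarter of its rate, while the dequeue side is several times slower, which is why the dequeue path is the one under investigation.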

We are working with our Solaris admins, and perhaps a ticket to Sun will follow. We will probably open a PMR shortly; but, I'd be interested in thoughts from this forum. Thanks very much.


Last edited by SAFraser on Thu Sep 16, 2010 4:26 pm; edited 1 time in total
mvic
Posted: Fri Sep 10, 2010 12:12 pm    Post subject: Re: Solaris 10 Zone Impact on MQ Performance

Jedi

Joined: 09 Mar 2004
Posts: 2080

SAFraser wrote:
Our initial troubleshooting suggests that the global zone / sparse zone variable is the significant one.

We are working with our Solaris admins, and perhaps a ticket to Sun will follow. We will probably open a PMR shortly; but, I'd be interested in thoughts from this forum. Thanks very much.

Is the OS/hardware delivering the same CPU and I/O resources to MQ in these 2 situations? Your OS support will be the best place to start. If (unlikely) they say MQ did something different in the 2 cases, then the job would be to work out why that was.
SAFraser
Posted: Fri Sep 10, 2010 12:44 pm

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

Yes, the same system resources are available in the global and sparse zones. We removed all resource caps and performed the test in the global zone and the sparse zone on the same physical server. Get rates were 8000/minute on the global zone and 4500/minute on the sparse zone. Same volume of messages, same message content.

IBM has not been helpful, but I didn't expect it to be an MQ problem anyway. Our OS admins are planning a trouble ticket to Sun, I think.

I am hopeful that this can be corrected somehow. The alternative is to tear down all that we have built in sparse zones and rebuild them in the global zone. <sigh> I didn't want those sparse zones in the first place. MQ and WMB are not sparse zone friendly; there is little value in virtualization on Solaris unless whole root zones are used.
fjb_saper
Posted: Fri Sep 10, 2010 6:51 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

SAFraser wrote:
Yes, the same system resources are available in the global and sparse zones. We removed all resource caps and performed the test in the global zone and the sparse zone on the same physical server. Get rates were 8000/minute on the global zone and 4500/minute on the sparse zone. Same volume of messages, same message content.

IBM has not been helpful, but I didn't expect it to be an MQ problem anyway. Our OS admins are planning a trouble ticket to Sun, I think.

I am hopeful that this can be corrected somehow. The alternative is to tear down all that we have built in sparse zones and rebuild them in the global zone. <sigh> I didn't want those sparse zones in the first place. MQ and WMB are not sparse zone friendly; there is little value in virtualization on Solaris unless whole root zones are used.


Could it be that the only difference between your sparse and global zone is the number of CPUs allocated/available to each? Could it be that in your case you have only about half the CPUs allocated to the sparse zone MQ is running in, and all the CPUs allocated to the global zone?

Another avenue of research would be the workload manager.
Let's assume that the sparse zone has exactly the same resources as the global zone. If the workload manager gives priority to the processes in the global zone over those in the sparse zone, and there is activity in the global zone at the time of your test, I would expect you to see a performance slowdown in the sparse zone. How much of a performance hit would of course depend on the level of activity in the global zone.

A bizarre test here would be to have the MQ clients working in the global zone and the queue manager in the sparse zone. So the more you scale the clients, the more you slow down the qmgr!!

Think about the nightmare of setting priorities in z/OS between the database, CICS and MQ. Well, you just might run into the same type of problem with sparse vs global zones on Solaris...

And at this rate we haven't even talked about memory allocation across the zones...

Have fun
_________________
MQ & Broker admin
SAFraser
Posted: Sun Sep 12, 2010 9:46 am

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

As always, fjb, you've raised some good points.

We performed the first set of tests with resource allocations assigned to both the global and sparse zones. Thinking (as you did) that the resource caps were a variable that might impact performance, we removed the resource caps. There was a slight increase in sparse zone performance, but not nearly enough. (We removed caps for both CPUs and memory.)

We have engaged our data center services provider, and have asked them to open a ticket with Sun. Several of us agree with you -- that the issue is workload management. These are T series machines, which are designed specifically to manage multi-threaded applications in a virtualized environment and are notoriously slow with single-threaded work. (Heck no, I didn't pick these machines.)

SAFraser
Posted: Thu Sep 16, 2010 6:37 am

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

Current status:

A global zone uses UFS filesystem.
A sparse zone uses ZFS filesystem.

"write" operation is the same speed on both types of zones.
"read" operation is five times slower on sparse compared to global zone.

PUT rates are about the same speed on both types of zones.
GET rates are dramatically slower on sparse compared to global zone.

Tell me what you know about GET..... does this anomaly make sense?

(Oh yes, we have high priority tickets open with both IBM and Sun. But I value your thoughts, too.)

Thanks.
mvic
Posted: Thu Sep 16, 2010 6:52 am

Jedi

Joined: 09 Mar 2004
Posts: 2080

SAFraser wrote:
"read" operation is five times slower on sparse compared to global zone.

Why the large difference? Is it somewhere in the configuration? Or is this "to be expected" in your environment? I guess these are questions that your support tickets are there to answer.

MQ can't go quicker if the I/O goes slower.
PeterPotkay
Posted: Thu Sep 16, 2010 6:54 am

Poobah

Joined: 15 May 2001
Posts: 7722

The puts are done under syncpoint, in batches, reducing how often MQ actually has disk I/O to just the frequency of the MQCMITs?

The gets are done one at a time, meaning MQ has disk I/O each time?
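
The batching idea behind these two questions can be sketched outside MQ. This is only an analogy, assuming a "commit" can be modeled as one forced disk write (`os.fsync`); it does not show how the queue manager's logger works, and `write_records` is a hypothetical helper name.

```python
import os
import tempfile


def write_records(path, records, batch_size):
    """Append records, forcing them to disk once per batch.

    Returns the number of forced disk writes (fsync calls) that were needed.
    """
    syncs = 0
    with open(path, "wb") as f:
        for i, rec in enumerate(records, start=1):
            f.write(rec)
            if i % batch_size == 0:
                f.flush()
                os.fsync(f.fileno())  # stands in for one commit
                syncs += 1
        if len(records) % batch_size != 0:
            # Force out the final partial batch.
            f.flush()
            os.fsync(f.fileno())
            syncs += 1
    return syncs


records = [b"x" * 100 for _ in range(200)]
with tempfile.TemporaryDirectory() as d:
    batched = write_records(os.path.join(d, "batched.log"), records, batch_size=50)
    one_at_a_time = write_records(os.path.join(d, "single.log"), records, batch_size=1)

print(f"forced writes, batches of 50: {batched}")
print(f"forced writes, one at a time: {one_at_a_time}")
```

If each forced write carries a fixed latency, the one-at-a-time pattern pays it for every record, while the batched pattern pays it once per batch.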
_________________
Peter Potkay
Keep Calm and MQ On
SAFraser
Posted: Thu Sep 16, 2010 6:57 am

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

mvic, My current question is "how much 'read' I/O is involved in a 'get' operation"? I'm not sure if the 'read' degradation is relevant or a red herring.

I think that's the right question. Which, yes, is being pursued with tech support.
SAFraser
Posted: Thu Sep 16, 2010 7:02 am

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

Peter, Good thought! But I don't think we do any darn thing under syncpoint at our site. You meant syncpoint by the application, right? Not the under-the-covers stuff that MQ does to assure once-and-once-only delivery?
PeterPotkay
Posted: Thu Sep 16, 2010 7:05 am

Poobah

Joined: 15 May 2001
Posts: 7722

SAFraser wrote:
Peter, Good thought! But I don't think we do any darn thing under syncpoint at our site. You meant syncpoint by the application, right? Not the under-the-covers stuff that MQ does to assure once-and-once-only delivery?


Correct.
_________________
Peter Potkay
Keep Calm and MQ On
mvic
Posted: Thu Sep 16, 2010 7:08 am

Jedi

Joined: 09 Mar 2004
Posts: 2080

SAFraser wrote:
mvic, My current question is "how much 'read' I/O is involved in a 'get' operation"? I'm not sure if the 'read' degradation is relevant or a red herring.

Difficult to generalise, but if your queue is building up, and/or your message throughput is high, probably each MQGET results in some I/O. If it's persistent and/or in syncpoint, it'll result in I/O. If you're doing only non-persistent on an empty queue, then MQ tries (maybe even succeeds, if there is a waiting MQGET) to do no I/O at all.
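
As a rough restatement of that rule of thumb (a sketch of the reasoning in this post only, not of MQ internals; the function and parameter names are made up for illustration):

```python
def mqget_likely_does_io(persistent: bool, in_syncpoint: bool, queue_backed_up: bool) -> bool:
    """Rule of thumb from the post above: does an MQGET probably cause disk I/O?

    - Persistent messages or gets under syncpoint: I/O is expected.
    - A queue that is building up (or very high throughput): I/O is probable.
    - Non-persistent messages on an empty queue, outside syncpoint: MQ tries to avoid I/O.
    """
    return persistent or in_syncpoint or queue_backed_up


# Non-persistent, no syncpoint, consumer keeping up: the best case for avoiding I/O.
print(mqget_likely_does_io(persistent=False, in_syncpoint=False, queue_backed_up=False))

# Persistent messages: I/O is expected regardless of queue depth.
print(mqget_likely_does_io(persistent=True, in_syncpoint=False, queue_backed_up=False))
```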

For such a massive I/O performance difference, it has got to hurt. I take it from the way you presented the info that the difference is seen at an OS or driver level..? If so, there has got to be some OS or driver explanation. IMHO.
SAFraser
Posted: Thu Sep 16, 2010 4:26 pm

Shaman

Joined: 22 Oct 2003
Posts: 742
Location: Austin, Texas, USA

Root cause has been determined.

The zfs filesystem for sparse zones does synchronous writes to disk. First data is written to buffer, then to disk. Control is not returned to the application until the write to disk is successful. In zfs, this introduces noticeable latency.

We turned off the synchronous write flag on the sparse zone and the performance issue was completely corrected.
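
The difference between the two modes can be seen on any filesystem with a small timing sketch. It assumes that forcing each write to disk with `os.fsync` is a fair stand-in for synchronous write behavior. It says nothing about the specific Solaris setting involved, and the record size and count are arbitrary.

```python
import os
import tempfile
import time

RECORD = b"m" * 512   # stand-in for one small message
N_RECORDS = 200


def timed_writes(path, sync_each_write):
    """Write N_RECORDS records; optionally force each one to disk before continuing."""
    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(N_RECORDS):
            f.write(RECORD)
            if sync_each_write:
                f.flush()              # push Python's buffer to the OS
                os.fsync(f.fileno())   # block until the OS reports the data is on disk
    elapsed = time.perf_counter() - start
    return elapsed, os.path.getsize(path)


with tempfile.TemporaryDirectory() as d:
    buffered_time, buffered_size = timed_writes(os.path.join(d, "buffered.dat"), False)
    synced_time, synced_size = timed_writes(os.path.join(d, "synced.dat"), True)

print(f"buffered: {buffered_time:.4f}s  synced: {synced_time:.4f}s")
```

Both runs write the same bytes. In the buffered run, control returns before the data is known to be on disk, which is where the speed comes from and also where the crash exposure comes from.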

Unfortunately, running without synchronous write protection is not feasible. If the server crashes with messages in the buffer, we believe that the queue manager would not come back up as its logs would not reflect correct information.

For the short term, we may try to move /var/mqm/* to SAN storage. Longer term, we will rebuild everything on global zones.

I am surprised no one has posted about this before. Surely we can't be the only site that has noticed this problem!
mvic
Posted: Thu Sep 16, 2010 4:41 pm

Jedi

Joined: 09 Mar 2004
Posts: 2080

SAFraser wrote:
I am surprised no one has posted about this before. Surely we can't be the only site that has noticed this problem!

What was the Solaris setting you changed? Is there a web page you can give us a link to? Thanks
PeterPotkay
Posted: Thu Sep 16, 2010 4:56 pm

Poobah

Joined: 15 May 2001
Posts: 7722

SAFraser wrote:
The zfs filesystem for sparse zones does synchronous writes to disk. First data is written to buffer, then to disk. Control is not returned to the application until the write to disk is successful. In zfs, this introduces noticeable latency.


But PUTs were faster than GETs? I would think both would be equally affected.
_________________
Peter Potkay
Keep Calm and MQ On