MQSeries.net :: View topic - Slow MQWF Response


Slow MQWF Response


keithng

Posted: Tue Apr 27, 2004 10:33 am Post subject: Slow MQWF Response

Novice

Joined: 27 Apr 2004
Posts: 12

Dear all,

I am using MQWF V3.4 SP4 on AIX, and I am having a problem in our production environment.

Every day there are several time slots in which MQWF responds very slowly or even times out. In fmcsys.log I find many messages like the ones below:

04/27/04 09:27:27 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:27:46 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:28:05 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:28:12 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:28:29 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:29:12 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:29:44 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:29:52 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:30:21 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.
04/27/04 09:30:27 FMC31065E The retry limit for message SDDS has exceeded. The message is stored in the execution server hold queue for later processing.

Does anyone have an idea what causes this kind of message? I run the reorg and rebind every night, but the error still occurs.
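To see whether the errors cluster in the slow time slots, the log entries can be counted per minute. This is only a rough sketch: it assumes every entry starts with a `MM/DD/YY HH:MM:SS` timestamp followed by the message code, as in the excerpt above, and the function name `count_by_minute` is made up for illustration.

```python
import re
from collections import Counter

# Assumed entry layout, taken from the excerpt above:
# 04/27/04 09:27:27 FMC31065E The retry limit for message SDDS has exceeded. ...
LINE_RE = re.compile(r"^(\d{2}/\d{2}/\d{2}) (\d{2}):(\d{2}):\d{2} (FMC\d+[A-Z])\b")


def count_by_minute(lines, code="FMC31065E"):
    """Count occurrences of one message code per (date, HH:MM) bucket."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and match.group(4) == code:
            date, hour, minute = match.group(1), match.group(2), match.group(3)
            counts[(date, f"{hour}:{minute}")] += 1
    return counts


# Three of the entries quoted above, used as sample input.
sample = [
    "04/27/04 09:27:27 FMC31065E The retry limit for message SDDS has exceeded. "
    "The message is stored in the execution server hold queue for later processing.",
    "04/27/04 09:27:46 FMC31065E The retry limit for message SDDS has exceeded. "
    "The message is stored in the execution server hold queue for later processing.",
    "04/27/04 09:28:05 FMC31065E The retry limit for message SDDS has exceeded. "
    "The message is stored in the execution server hold queue for later processing.",
]

counts = count_by_minute(sample)
for (date, minute), n in sorted(counts.items()):
    print(date, minute, n)
```

Running the same function over the whole of fmcsys.log (for example with `open("fmcsys.log")` as the `lines` argument) would give a per-minute histogram that can be compared with the times users report slow responses.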

thanks

regards,
keith

Ratan

Posted: Tue Apr 27, 2004 10:47 am Post subject:

Grand Master

Joined: 18 Jul 2002
Posts: 1245

Is your cleanup server scheduled at the same time when you are experiencing this?
_________________
-Ratan

CHF

Posted: Tue Apr 27, 2004 12:19 pm Post subject:

Master

Joined: 16 Dec 2003
Posts: 297

This is from the WF Administration Guide:

Quote:

Error events in the MQ Workflow system, such as database deadlocks, can
cause transactions to fail. MQ Workflow retries such transactions until the
customizable message retry limit is reached. When MQ Workflow is installed,
the default retry limit for processing messages is set to 5.
Once the retry limit is exceeded, failed messages sent to the execution server
from other servers or the program execution agent are saved in a specific
execution server hold queue for later processing.
Messages that exceed the retry limit indicate that the system needs
attendance. For messages that exceed the retry limit, an entry is recorded in
the system log. You should use the system log as your first source of
information when trying to resolve message failures. The system log contains
details about the failed message and normally indicates why it failed. After
solving the problem, you can then use functions provided within the
administration utility to restore the state of your processes.

That means WF tries every message up to 5 times; if it cannot process the message, the message is put on the execution server's hold queue. SDDS is the message format that WF uses to deal with API calls.

What might have happened in your case is that WF tried to process some of your API calls, failed 5 times, and then sent those messages to the hold queue.
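The retry-then-hold behaviour described above can be pictured with a small simulation. This is not MQ Workflow code; it is a sketch under the assumption that a retry limit of 5 means 5 attempts in total, and all names (`process_with_retry`, `TransientError`, `make_flaky_handler`) are invented for the illustration.

```python
from collections import deque

RETRY_LIMIT = 5  # default value quoted from the Administration Guide


class TransientError(Exception):
    """Stands in for a recoverable failure such as a database deadlock."""


def process_with_retry(message, handler, hold_queue, retry_limit=RETRY_LIMIT):
    """Call handler(message) up to retry_limit times.

    Returns the number of attempts used on success. If every attempt
    fails, the message is parked on hold_queue and None is returned.
    """
    for attempt in range(1, retry_limit + 1):
        try:
            handler(message)
            return attempt
        except TransientError:
            continue  # try again until the limit is reached
    hold_queue.append(message)
    return None


def make_flaky_handler(failures_before_success):
    """Build a handler that fails a fixed number of times, then succeeds."""
    remaining = {"n": failures_before_success}

    def handler(message):
        if remaining["n"] > 0:
            remaining["n"] -= 1
            raise TransientError(message)

    return handler


hold_queue = deque()
attempts_ok = process_with_retry("msg-1", make_flaky_handler(2), hold_queue)
attempts_bad = process_with_retry("msg-2", make_flaky_handler(10), hold_queue)
print(attempts_ok, attempts_bad, list(hold_queue))
```

In the sketch, a message that hits a short-lived problem goes through after a couple of retries, while one that keeps failing ends up on the hold queue, which is the situation the FMC31065E entries report.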

Try this: Look up the messages in Hold Queue. Try to analyze what might have gone wrong with that message.

Hope this helps.

CHF

praveenchhangani

Posted: Tue Apr 27, 2004 12:36 pm Post subject:

Disciple

Joined: 20 Sep 2002
Posts: 192
Location: Chicago, IL

Keith,

Regarding the performance issues on your production server: you've probably already looked at this, but just in case, take a look at the Best Practices Guide that IBM provides as a SupportPac. It includes practices that typically avoid such situations, and it is an excellent guide in my opinion.

If you are seeing slowdowns in certain time slots during the day, I would recommend the Best Practices Guide:

http://www-306.ibm.com/software/integration/support/supportpacs/individual/wa0b.html

As Ratan said: check your cleanup server settings to see whether completed processes are being cleaned up on a regular basis (outside business hours, NOT during them).

Things to ask yourself/team members:

1) Are you performing the DB2 runstats / rebind on a regular basis?
2) How many execution servers do you have running for this workflow configuration? Is that number in line with the number of CPUs on your server?
3) What type of auditing are you having the system perform?
- If FULL AUDITING is turned on, are you writing to the DB or to MQ?
4) Are you using notifications? If so, are a lot of notifications being generated?

5) In Workflow 3.4, are you making use of the FMCINTERNALOOP NOOP (no-operation activity) in your models?

6) What is your I/O wait time on the server? How many configurations do you have? Are they all having the same performance issues?

7) How large is the client volume for the workflow queue manager? Are you making use of the performance benefits of the client concentrator?

There could be a number of issues here with the performance.

The best thing to do is to start by looking at the log files to figure out what some of the outstanding issues are, and also to talk with your DBAs to see if they can optimize the database.

fmcsys.log
fmcerr.log
db2diag.log
AMQERR1.log

Hope this helps!

Thanks,
Praveen
_________________
Praveen K. Chhangani,

IBM Certified Solutions Designer -
MQ Workflow 3.4.

keithng

Posted: Tue Apr 27, 2004 5:38 pm Post subject:

Novice

Joined: 27 Apr 2004
Posts: 12

Thanks for your information.

>> 1) Are you performing the DB2 runstats / rebind on a regular basis.
We do it on a nightly basis.

>> 2) What are the number of execution servers you have running for this workflow configuration? Is that # in compliance with the #cpu's on your server?
2, and I have an AIX server with 2 CPUs.

>> 3) What type of auditing are you having the system perform
DB log, with a filter of about 10 events.

>> 4) Are you using notifications? If so, are there a lot of notifications being generated etc.
We haven't used any notifications in this case.

>> 5) In Workflow 3.4, are you making use of the FMCINTERNALOOP NOOP(no operation activity) in your models?
We are already using this in our flow.

The interesting thing is that although there are a lot of log entries saying a message will be put on the hold queue, when I look into the queue there are only 1 or 2 messages inside.

Regards,
Keith

keithng

Posted: Wed Apr 28, 2004 4:25 am Post subject:

Novice

Joined: 27 Apr 2004
Posts: 12

I have checked the hold queue and found the following output when I use the fmcautil tool to browse the hold queue messages:

- System: FMCSYS
- Message: ChndTxnActImplComplete
- Component: ExeSvr
- Number of failed replays: 0
- Message content: -
ActImplCorrelID=OID(0000000101a140050000000000000000),59,OID(0000000101d2c00a0000000000000000),59,OID(0000000102f840dc0000000000000000)

Does anyone have an idea how to analyze the above message?
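As a first step, the object IDs can at least be pulled out of the message content so they can be compared with other log entries. The sketch below only extracts the `OID(...)` values; what each OID refers to is not stated anywhere in this output, and the helper name `extract_oids` is hypothetical. It strips whitespace first because the utility output can wrap an OID across two lines.

```python
import re

# One OID is written as OID(<hex digits>) in the message content.
OID_RE = re.compile(r"OID\(([0-9a-fA-F]+)\)")


def extract_oids(message_content):
    """Return all OID values found in a hold queue message content string."""
    # Drop all whitespace so an OID split by line wrapping is rejoined.
    compact = "".join(message_content.split())
    return OID_RE.findall(compact)


# Message content from the hold queue entry, including the line wrap.
content = """ActImplCorrelID=OID(0000000101a140050000000000000000),59,OID(0000000101d2c00a000
0000000000000),59,OID(0000000102f840dc0000000000000000)"""

oids = extract_oids(content)
for oid in oids:
    print(oid)
```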

Thanks

Regards,
Keith

keithng

Posted: Wed Apr 28, 2004 7:14 pm Post subject:

Novice

Joined: 27 Apr 2004
Posts: 12

Dear all,

I just found a strange scenario: if a deadlock happens on queryworkitem, we run into the "retry limit for message SDDS" error, and it seems the WF server fails to respond during that period. We are using WF 3.4 with SP4. Does anyone have an idea about that?

thanks

Regards,
Keith

newbiedude

Posted: Thu Apr 29, 2004 8:11 am Post subject:

Voyager

Joined: 22 Dec 2002
Posts: 87

keithng,

Did you recently add a new fix pack or make any upgrade changes? Are these performance issues happening all the time or during certain peak times in the day? The message retry limit is set to 5 by default in Workflow. Do you have the database on a separate disk or on the same server as MQWF? When did you first start seeing these production problems and where are you with them currently? (What is the status)
_________________
Newbiedude

praveenchhangani

Posted: Thu Apr 29, 2004 8:37 am Post subject:

Disciple

Joined: 20 Sep 2002
Posts: 192
Location: Chicago, IL

Quote:

Newbiedude(nathan)

When did you first start seeing these production problems and where are you with them currently? (What is the status)

It looks like you are pretty much 'Best Practices Guide' compliant. Nathan brings up a good point about when you first started seeing problems and what might have changed around that time. Did the number of clients or users in the system increase, and were any changes made recently to either the DB layer or the MQWF layer? I guess I hadn't realized until Nathan's post that this is a production environment.

Of course, if you haven't already done so, please open a PMR with IBM as well.

PS: Bear in mind that a few messages indicating database deadlocks are fine and quite normal in any workflow system. However, if you start seeing a lot of them, it's time to be concerned.

As far as the following though:

Code:

System: FMCSYS
- Message: ChndTxnActImplComplete
- Component: ExeSvr
- Number of failed replays: 0
- Message content: -
ActImplCorrelID=OID(0000000101a140050000000000000000),59,OID(0000000101d2c00a0000000000000000),59,OID(0000000102f840dc0000000000000000)

I am not sure. This may be only a snippet of the actual problem, and on its own it is not very meaningful. Unless Jmac or someone else on this forum knows what the above snippet means, I would definitely go the PMR route as well, especially since this is a production system.

Sorry I couldn't be of any more help.
_________________
Praveen K. Chhangani,

IBM Certified Solutions Designer -
MQ Workflow 3.4.

keithng

Posted: Thu Apr 29, 2004 9:13 am Post subject:

Novice

Joined: 27 Apr 2004
Posts: 12

Thanks for your reply.

However, the number of deadlocks in the environment is now about 5-8 a day, mostly related to queryworkitems. It is quite strange that queryworkitems would have this deadlock issue.

Moreover, the number of users is only about 30, with about 10 concurrent users, which is quite a small figure.

praveenchhangani

Posted: Thu Apr 29, 2004 10:59 am Post subject:

Disciple

Joined: 20 Sep 2002
Posts: 192
Location: Chicago, IL

keithng wrote:

Keith

A database deadlock happens when 2 or more users try to access the same resource (say, a table record) and each waits for the other to release a resource it has locked. This usually has to do with the application, which is why I mentioned that a few database deadlocks are OK and not something to worry about, even if you see them for a few days. So long as the message is retried and processed, you can ignore those "few" database deadlock issues.
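To make the "waiting for each other" part concrete, here is a toy sketch of how a deadlock shows up as a cycle among waiting transactions. It is not how DB2 or MQ Workflow detect deadlocks internally; it assumes, for simplicity, that each transaction waits on at most one other transaction, and the names `find_deadlock` and `waits_for` are made up.

```python
def find_deadlock(waits_for):
    """Return the transactions that form a waiting cycle, or None.

    waits_for maps a transaction name to the single transaction whose
    lock it is currently waiting for.
    """
    for start in waits_for:
        seen = []
        current = start
        # Follow the chain of waits until it ends or revisits a transaction.
        while current in waits_for and current not in seen:
            seen.append(current)
            current = waits_for[current]
        if current in seen:
            # The chain came back to an earlier transaction: a cycle.
            return seen[seen.index(current):]
    return None


# A waits for a row locked by B, and B waits for a row locked by A.
deadlocked = find_deadlock({"A": "B", "B": "A"})

# A waits for B, but B is not waiting on anyone, so B can finish.
no_deadlock = find_deadlock({"A": "B"})

print(deadlocked, no_deadlock)
```

When a cycle exists, none of the transactions in it can proceed, so the database picks one as the victim and rolls it back; the rolled-back work is what then gets retried.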

Another thing I would recommend for better performance (I'm not sure if you have already done this) is to get in touch with your DBAs and have them analyze the database queries for potential new indexes, as this could improve performance.

Please keep us in the loop. Thanks.
_________________
Praveen K. Chhangani,

IBM Certified Solutions Designer -
MQ Workflow 3.4.

keithng

Posted: Thu Apr 29, 2004 5:08 pm Post subject:

Novice

Joined: 27 Apr 2004
Posts: 12

Thanks a lot.

However, we found that when the DB gets a deadlock, Workflow seems to stop responding, and other users (not just one) have to wait. The log then shows a lot of retry messages (not deadlock messages) saying that processing failed and the message was put on the hold queue. Every user waits another 3 minutes and then gets a timeout message.

The deadlock happens on the WF DB, not on our own DB. Therefore, we don't have any idea about adding indexes, especially when query workitems is supposed to be just a read-only SQL statement (which I got from the DB snapshot) and yet it deadlocks.

CHF

Posted: Thu May 13, 2004 9:24 am Post subject:

Master

Joined: 16 Dec 2003
Posts: 297

I am now facing the same problem as keithng.
I have around 1000 instances running, and I am unable to connect to WF. The system log and error log indicate that there are database deadlocks.

1) Are you performing the DB2 runstats / rebind on a regular basis.
The DB2 guys say they do not need to do any runstats / rebind, because we have only a few tables in our DB.

2) What are the number of execution servers you have running for this workflow configuration? Is that # in compliance with the #cpu's on your server?
I am running WF on OS/390. I have set up WLM, which brings up additional execution server address spaces automatically depending on the workload WF receives. At present WLM has brought up 9 address spaces, and each address space has 5 execution server instances (5 x 9 = 45). Currently 30 execution server instances are active. The remaining 15 have errored out due to deadlocks.

3) What type of auditing are you having the system perform
- If FULL AUDITING is turned on are you writing to the DB or to MQ.
None

4) Are you using notifications? If so, are there a lot of notifications being generated etc.
We are using notifications for all the activities. For each process instance there is only one notification at a time. I have the notification time set to 15 min, and I assume there are as many notifications as process instances (1000).

5) In Workflow 3.4, are you making use of the FMCINTERNALOOP NOOP(no operation activity) in your models?
I am using one FMCINTERNALOOP NOOP for this model

6) What is your IO Wait time on the server? How many configurations do you have? Are they all having the same performance issues?
I am not sure. I have only one configuration. We are encountering this problem in one of our environments, and we have not attempted to reproduce it in the other environments.

7) How large is the client volume for the workflow queue manager? Are you making use of the performance pros of the client concentrator?
We are not using any Client Concentrator.

Does anybody have any ideas as to what could be causing our problem?

keithng:
What specific steps did you take to resolve your issue?

Thanks
CHF

kevinf2349

Posted: Thu May 13, 2004 10:22 am Post subject:

Grand Master

Joined: 28 Feb 2003
Posts: 1311
Location: USA

CHF

When we experienced a 'slowdown' on z/OS, it was recommended that we stop the cleanup server from running during prime hours. This seemed to stop the slowdowns, and we now schedule the cleanup server to run overnight. I believe that on z/OS the cleanup server's own schedule doesn't have any effect, so this had to be done by our automation package.

Hope this helps

CHF

Posted: Thu May 13, 2004 10:25 am Post subject:

Master

Joined: 16 Dec 2003
Posts: 297

Kevin,
Thanks for the reply.

Can you elaborate on this?

Quote:

I believe that on z/OS the cleanup server's own schedule doesn't have any effect so this had to be done by our automation package.

CHF
