Troubleshooting |
jason_e |
Posted: Fri Sep 12, 2003 11:43 pm Post subject: Troubleshooting |
|
|
Apprentice
Joined: 19 Aug 2003 Posts: 33
|
Hi,
My WebSphere MQ 5.3 box had been running fine for a few days until, for
some reason, I was unable to receive any messages.
I need to determine why this happened and find out how to prevent it
from happening again. The event logs and FFST reports are not of much
use since I don't really know what they are trying to tell me.
Below is a copy of my event logs and extracts from my FFST reports.
I have quite a few FFST reports, but they seem to show the same type of
problem over and over again (xllLongLockRequest & cciTcpReceive).
What can I do to troubleshoot these errors?
The "Transactions rolled back to release log space." message is very
concerning since AFAIK my applications shouldn't be causing
that to happen. How can I determine which transactions are
causing the problem?
Regards,
Jason
===========
NT EVENT LOGS
===========
Program cannot update queue manager object
The attempt to update object '%CHLBATCH.4'
on queue manager 'ZEPELTRA' failed with reason code 2003.
-----------------
Channel program ended abnormally.
Channel program 'SEPEL.ZEPELTRA' ended abnormally.
Look at previous error messages for channel
program 'SEPEL.ZEPELTRA' in the error files
to determine the cause of the failure.
------------------
Transactions rolled back to release log space.
The log space for the queue manager is becoming full. One or
more long-running transactions have been rolled back to release
log space so that the queue manager can continue to process requests.
Try to ensure that the duration of your transactions is not
excessive. Consider increasing the size of the log to
allow transactions to last longer before the log starts
to become full.
------------------
Error on receive from host xxx.xxx.xxx.xxx
An error occurred receiving data from xxx.xxx.xxx.xxx over TCP/IP. This
may be due to a communications failure.
The return code from the TCP/IP (recv)
call was 10054 (X'2746'). Record these values
and tell the systems administrator.
===========
FFST REPORTS
===========
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- |
| Host Name :- Windows 2000 Build 2195: Service Pack 4 |
| PIDS :- - |
| LVLS :- - |
| Product Long Name :- WebSphere MQ for Windows |
| Vendor :- IBM |
| Probe Id :- - |
| Application Name :- MQM |
| Component :- xllLongLockRequest |
| Build Date :- Oct 12 2002 |
| CMVC level :- p000-L021011 |
| Build Type :- IKAP - (Production) |
| UserID :- MUSR_MQADMIN |
| Process Name :- C:\Program Files\IBM\WebSphere MQ\bin\amqzlaa0.exe |
| Process :- 00001584 |
| Thread :- 00000002 |
| QueueManager :- ZEPELTRA |
| Major Errorcode :- STOP |
| Minor Errorcode :- OK |
| Probe Type :- HALT6109 |
| Probe Severity :- 1 |
| Probe Description :- AMQ6109: An internal WebSphere MQ error has occurred. |
| FDCSequenceNumber :- 0 |
| |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- |
| Host Name :- (Windows 2000 Build 2195: Service Pack 4) |
| PIDS :- |
| LVLS :- |
| Product Long Name :- WebSphere MQ for Windows |
| Vendor :- IBM |
| Probe Id :- |
| Application Name :- MQM |
| Component :- cciTcpReceive |
| Build Date :- Oct 12 2002 |
| CMVC level :- p000-L021011 |
| Build Type :- IKAP - (Production) |
| UserID :- MUSR_MQADMIN |
| Process Name :- C:\Program Files\IBM\WebSphere MQ\bin\AMQRMPPA.EXE |
| Process :- 00001628 |
| Thread :- 00000015 |
| Major Errorcode :- rrcE_BAD_DATA_RECEIVED |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ9207 |
| Probe Severity :- 2 |
| Probe Description :- AMQ9207: The data received from host 'XXX.XXX.XXX.XXX' |
| is not valid. |
| FDCSequenceNumber :- 0 |
| Comment1 :- XXX.XXX.XXX.XXX |
| |
| Comment2 :- TCP/IP |
| |
| Comment3 :- |
| |
| |
+-----------------------------------------------------------------------------+ |
PeterPotkay |
Posted: Sat Sep 13, 2003 3:03 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Somewhere, you have an app that puts or gets using the syncpoint option. This app then never issues a commit (an explicit commit) or ends gracefully (an implicit commit).
As a result, the queue manager must keep track of that uncommitted message until one of the above commits takes place. If the commit never happens, and further work occurs on the QM, either by this app or any other app, the QM keeps adding it to its logs. Eventually it runs out of log space, the long-running transaction is rolled back, and the offending app is returned a 2003 RC (MQRC_BACKED_OUT) on its next MQ call.
Then the process starts all over. You gotta find that offending app. Not easy. I know, I just dealt with this exact problem. IBM can parse the log files for you to help ID the queue the offending app is using. Once you know the queue, it is easier to know which app may be the problem.
_________________
Peter Potkay
Keep Calm and MQ On
jason_e |
Posted: Sun Sep 14, 2003 10:14 pm Post subject: |
|
|
Apprentice
Joined: 19 Aug 2003 Posts: 33
|
Mmmh, I only have 4 programs that use this queue at the moment, and
the only open options that they use are below:
Program 1 & 2
-----------------
int openOptions = MQC.MQOO_INPUT_EXCLUSIVE | MQC.MQOO_BROWSE;
Program 3 & 4
-----------------
int openOptions = MQC.MQOO_OUTPUT;
Would these options cause problems? If so, how should I change them?
PeterPotkay |
Posted: Mon Sep 15, 2003 5:18 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
These options would not cause this problem.
However, the syncpoint option is set on the PUT (or GET) call, not on the OPEN. Please check the options your programs pass on those calls.
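To make the PUT/GET point concrete, here is a minimal sketch of how to tell whether a put or get runs under syncpoint. In the MQ classes for Java the flags live in `MQPutMessageOptions.options` and `MQGetMessageOptions.options`, and a unit of work is completed with `MQQueueManager.commit()` or `backout()`. The class name `SyncpointCheck` and the locally defined constants are assumptions for illustration only; they mirror what I believe the `MQC` values to be, so the example compiles without the MQ jar. Real code should use `MQC.MQPMO_SYNCPOINT` and friends directly.

```java
// Sketch only: constants are local copies (assumed to match the MQC values)
// so that this compiles without com.ibm.mq on the classpath.
final class SyncpointCheck {
    static final int MQPMO_SYNCPOINT    = 0x00000002; // MQGMO_SYNCPOINT has the same value
    static final int MQPMO_NO_SYNCPOINT = 0x00000004; // MQGMO_NO_SYNCPOINT has the same value

    /** True when the put/get options explicitly request a unit of work. */
    static boolean usesSyncpoint(int options) {
        return (options & MQPMO_SYNCPOINT) != 0;
    }

    public static void main(String[] args) {
        int transactionalPut = MQPMO_SYNCPOINT;    // must be followed by commit() or backout()
        int plainPut         = MQPMO_NO_SYNCPOINT; // takes effect immediately, nothing to commit

        System.out.println("transactional put needs a commit: " + usesSyncpoint(transactionalPut));
        System.out.println("plain put needs a commit: " + usesSyncpoint(plainPut));
    }
}
```

If any program in the list sets the syncpoint flag on its put or get options, that is the one to inspect for a missing commit. The open options shown earlier in the thread do not carry this flag at all.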
pgorak |
Posted: Thu Sep 18, 2003 3:27 am Post subject: |
|
|
 Disciple
Joined: 15 Jul 2002 Posts: 158 Location: Cracow, Poland
|
There may be a reason other than long-running transactions. If you are using circular logging, you may also run into the following problem:
Suppose your applications make several PUTs, GETs and COMMITs. These operations take some log space; in the case of a PUT operation, the amount of space depends on the size of the message. Now, as we know, during normal processing, the logger takes a checkpoint every 10000 operations logged. If 10000 of your operations take more than the total size of your log, your log files can never be released.
Try to calculate how many primary log files you need (remember that it's better to use a smaller number of large files than a lot of small files). To do this, consider the size of the messages your applications put and the number of PUT, GET and COMMIT operations that they perform.
Piotr |
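The sizing estimate described above can be sketched roughly as follows. Log files are sized in 4 KB pages, so total capacity is (primary + secondary files) × pages per file × 4096. Everything else here is an assumption for illustration: the class name `LogSizing`, the file counts and page count passed in `main`, the 2 KB average message size, and especially the 750-byte per-operation overhead, which is a placeholder guess and not a documented figure.

```java
// Back-of-envelope log sizing; all inputs in main() are illustrative assumptions.
final class LogSizing {
    static final long LOG_PAGE_BYTES = 4096; // log file size is specified in 4 KB pages

    /** Total bytes available across primary and secondary log files. */
    static long totalLogBytes(int primaryFiles, int secondaryFiles, int logFilePages) {
        return (long) (primaryFiles + secondaryFiles) * logFilePages * LOG_PAGE_BYTES;
    }

    /** Rough bytes logged by a number of persistent puts; the overhead is a guess. */
    static long estimatedLoggedBytes(long operations, long avgMessageBytes, long assumedOverheadBytes) {
        return operations * (avgMessageBytes + assumedOverheadBytes);
    }

    public static void main(String[] args) {
        long capacity = totalLogBytes(3, 2, 256);                  // hypothetical configuration
        long demand   = estimatedLoggedBytes(10_000, 2_048, 750);  // hypothetical workload

        System.out.println("log capacity (bytes): " + capacity);
        System.out.println("estimated bytes per 10000 operations: " + demand);
        System.out.println("workload exceeds log: " + (demand > capacity));
    }
}
```

With these made-up numbers the estimated demand is several times the capacity, which is the situation the post warns about. Plugging in your own file counts, page size setting and message sizes gives a first idea of how much larger the log should be.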
PeterPotkay |
Posted: Thu Sep 18, 2003 4:46 am Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Piotr, the following is from the System Admin Guide:
Quote: |
WebSphere MQ generates checkpoints automatically. They are taken when the queue manager starts, at shutdown, when logging space is running low, and after every 10 000 operations logged.
|
To me this indicates that MQ can recognize that log space is running low, and if it needs a checkpoint it will take one if it can, rather than forcing itself to wait for 10000 operations.
Regardless, your point on adequate logs is important. The default values are much too small. Today's computers have huge hard drives on even the cheapest machines, and if you are going to be doing any testing, there is no reason not to make the log files much, much bigger.