MQSeries.net Forum Index » General IBM MQ Support » Troubleshooting

jason_e
Posted: Fri Sep 12, 2003 11:43 pm    Post subject: Troubleshooting

Apprentice

Joined: 19 Aug 2003
Posts: 33

Hi,
My WebSphere MQ 5.3 box had been running fine for a few days until, for
some reason, I was unable to receive any messages.

I need to determine why this happened and find out how to prevent it
from happening again. The event logs and FFST reports are not of much
use since I don't really know what they are trying to tell me.

Below is a copy of my event logs and extracts from my FFST reports.
I have quite a few FFST reports, but it seems to be the same type of
problem over and over again (xllLongLockRequest & cciTcpReceive).

What can I do to troubleshoot these errors?

The "Transactions rolled back to release log space." is very
concerning since AFAIK my applications shouldn't be causing
that to happen. How can I determine what transactions are
causing the problems?

Regards,
Jason

===========
NT EVENT LOGS
===========
Program cannot update queue manager object

The attempt to update object '%CHLBATCH.4'
on queue manager 'ZEPELTRA' failed with reason code 2003.

-----------------

Channel program ended abnormally.

Channel program 'SEPEL.ZEPELTRA' ended abnormally.

Look at previous error messages for channel
program 'SEPEL.ZEPELTRA' in the error files
to determine the cause of the failure.

------------------

Transactions rolled back to release log space.

The log space for the queue manager is becoming full. One or
more long-running transactions have been rolled back to release
log space so that the queue manager can continue to process requests.

Try to ensure that the duration of your transactions is not
excessive. Consider increasing the size of the log to
allow transactions to last longer before the log starts
to become full.

------------------
Error on receive from host xxx.xxx.xxx.xxx

An error occurred receiving data from xxx.xxx.xxx.xxx over TCP/IP. This
may be due to a communications failure.

The return code from the TCP/IP (recv)
call was 10054 (X'2746'). Record these values
and tell the systems administrator.

===========
FFST REPORTS
===========

+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- |
| Host Name :- Windows 2000 Build 2195: Service Pack 4 |
| PIDS :- - |
| LVLS :- - |
| Product Long Name :- WebSphere MQ for Windows |
| Vendor :- IBM |
| Probe Id :- - |
| Application Name :- MQM |
| Component :- xllLongLockRequest |
| Build Date :- Oct 12 2002 |
| CMVC level :- p000-L021011 |
| Build Type :- IKAP - (Production) |
| UserID :- MUSR_MQADMIN |
| Process Name :- C:\Program Files\IBM\WebSphere MQ\bin\amqzlaa0.exe |
| Process :- 00001584 |
| Thread :- 00000002 |
| QueueManager :- ZEPELTRA |
| Major Errorcode :- STOP |
| Minor Errorcode :- OK |
| Probe Type :- HALT6109 |
| Probe Severity :- 1 |
| Probe Description :- AMQ6109: An internal WebSphere MQ error has occurred. |
| FDCSequenceNumber :- 0 |
| |
+-----------------------------------------------------------------------------+

+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- |
| Host Name :- (Windows 2000 Build 2195: Service Pack 4) |
| PIDS :- |
| LVLS :- |
| Product Long Name :- WebSphere MQ for Windows |
| Vendor :- IBM |
| Probe Id :- |
| Application Name :- MQM |
| Component :- cciTcpReceive |
| Build Date :- Oct 12 2002 |
| CMVC level :- p000-L021011 |
| Build Type :- IKAP - (Production) |
| UserID :- MUSR_MQADMIN |
| Process Name :- C:\Program Files\IBM\WebSphere MQ\bin\AMQRMPPA.EXE |
| Process :- 00001628 |
| Thread :- 00000015 |
| Major Errorcode :- rrcE_BAD_DATA_RECEIVED |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ9207 |
| Probe Severity :- 2 |
| Probe Description :- AMQ9207: The data received from host 'XXX.XXX.XXX.XXX' |
| is not valid. |
| FDCSequenceNumber :- 0 |
| Comment1 :- XXX.XXX.XXX.XXX |
| |
| Comment2 :- TCP/IP |
| |
| Comment3 :- |
| |
| |
+-----------------------------------------------------------------------------+
PeterPotkay
Posted: Sat Sep 13, 2003 3:03 pm    Post subject:

Poobah

Joined: 15 May 2001
Posts: 7722

Somewhere, you have an app that puts or gets using the syncpoint option. This app then never issues a commit (an explicit commit) and never ends gracefully (an implicit commit).

As a result, the queue manager must keep track of that uncommitted message until one of those commits takes place. If the commit never happens and further work occurs on the QM, whether from this app or any other, the QM keeps adding to its logs. Eventually it runs out of log space, rolls the long-running transaction back, and the offending app is returned RC 2003 (MQRC_BACKED_OUT) on its next MQ call.

Then the process starts all over. You gotta find that offending app. Not easy. I know, I just dealt with this exact problem. IBM can parse the log files for you to help ID the queue the offending app is using. Once you know the queue, it is easier to work out which app may be the problem.
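
To make the mechanism concrete, here is a toy Java model of a circular log. It is only a sketch and not how WebSphere MQ's logger is actually implemented: the class `CircularLogModel`, its record-count `capacity`, and every method name are made up for illustration. The idea it tries to show is that one unit of work that never commits pins the oldest part of the log, so ordinary traffic from other apps eventually fills the log and forces a rollback.

```java
import java.util.ArrayDeque;
import java.util.Deque;

/** Toy model of a circular log. Hypothetical; NOT WebSphere MQ's real logger. */
final class CircularLogModel {
    private final int capacity;                                 // records the log can hold (assumed unit)
    private long head = 0;                                      // position of the next record to write
    private final Deque<Long> openStarts = new ArrayDeque<>();  // start positions of open units of work, oldest first
    private int forcedRollbacks = 0;

    CircularLogModel(int capacity) {
        this.capacity = capacity;
    }

    /** Starts a unit of work (a put or get under syncpoint) and returns the log position it pins. */
    long begin() {
        long start = head;
        openStarts.addLast(start);
        append();
        return start;
    }

    /** Writes one log record, first rolling back the oldest open unit of work if it pins a full log. */
    void append() {
        while (!openStarts.isEmpty() && head - openStarts.peekFirst() >= capacity) {
            openStarts.removeFirst();   // the model's stand-in for a forced rollback
            forcedRollbacks++;
        }
        head++;
    }

    /** Commits; returns false if the unit of work was already rolled back (think RC 2003). */
    boolean commit(long start) {
        boolean stillOpen = openStarts.removeFirstOccurrence(Long.valueOf(start));
        if (stillOpen) {
            append();
        }
        return stillOpen;
    }

    int forcedRollbacks() {
        return forcedRollbacks;
    }

    public static void main(String[] args) {
        CircularLogModel log = new CircularLogModel(100);
        long stuck = log.begin();           // an app works under syncpoint and never commits
        for (int i = 0; i < 150; i++) {
            log.append();                   // other apps keep generating log records
        }
        System.out.println("forced rollbacks: " + log.forcedRollbacks());
        System.out.println("late commit accepted: " + log.commit(stuck));
    }
}
```

In this run the stuck unit of work is rolled back once the other traffic has filled the log, and its late commit is refused. A unit of work that commits promptly never pins enough of the log to trigger that.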
_________________
Peter Potkay
Keep Calm and MQ On
jason_e
Posted: Sun Sep 14, 2003 10:14 pm    Post subject:

Apprentice

Joined: 19 Aug 2003
Posts: 33

Hmm, I only have 4 programs that use this queue at the moment, and
the only options that they use are below:

Program 1 & 2
-----------------
int openOptions = MQC.MQOO_INPUT_EXCLUSIVE | MQC.MQOO_BROWSE;

Program 3 & 4
-----------------
int openOptions = MQC.MQOO_OUTPUT;

Would these options cause problems? How could I change them?
PeterPotkay
Posted: Mon Sep 15, 2003 5:18 am    Post subject:

Poobah

Joined: 15 May 2001
Posts: 7722

These options would not cause this problem.

However, the syncpoint option is specified on the PUT (or GET), not the OPEN. Check those options please.
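
As a rough illustration of where to look: in the MQ base Java classes the flags in question live in the `options` field of `MQPutMessageOptions` and `MQGetMessageOptions`, not in the open options. The sketch below deliberately avoids the MQ jar so it compiles anywhere; the constants are local stand-ins that are assumed to match the `MQC` values, and `SyncpointCheck` with its two methods is a made-up helper, not part of any IBM API.

```java
/** Sketch: testing put/get option bits for syncpoint. Hypothetical helper, not an IBM class. */
final class SyncpointCheck {
    // Local stand-ins for com.ibm.mq.MQC constants (values assumed to match MQC).
    static final int MQPMO_SYNCPOINT = 0x00000002;
    static final int MQPMO_NO_SYNCPOINT = 0x00000004;
    static final int MQGMO_WAIT = 0x00000001;
    static final int MQGMO_SYNCPOINT = 0x00000002;
    static final int MQGMO_NO_SYNCPOINT = 0x00000004;

    /** True if the put options explicitly request syncpoint. */
    static boolean putUsesSyncpoint(int pmoOptions) {
        return (pmoOptions & MQPMO_SYNCPOINT) != 0;
    }

    /** True if the get options explicitly request syncpoint. */
    static boolean getUsesSyncpoint(int gmoOptions) {
        return (gmoOptions & MQGMO_SYNCPOINT) != 0;
    }

    public static void main(String[] args) {
        int getOptions = MQGMO_WAIT | MQGMO_SYNCPOINT;  // this getter must commit or back out later
        int putOptions = MQPMO_NO_SYNCPOINT;            // this putter takes no part in a unit of work
        System.out.println("get under syncpoint: " + getUsesSyncpoint(getOptions));
        System.out.println("put under syncpoint: " + putUsesSyncpoint(putOptions));
    }
}
```

Any program whose put or get options include the syncpoint flag should have a matching commit (or backout) on the queue manager object on every code path, including the error paths.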
_________________
Peter Potkay
Keep Calm and MQ On
pgorak
Posted: Thu Sep 18, 2003 3:27 am    Post subject:

Disciple

Joined: 15 Jul 2002
Posts: 158
Location: Cracow, Poland

There may be a reason other than long-running transactions. If you are using circular logging, you may also run into the following problem:

Suppose your applications make several PUTs, GETs and COMMITs. These operations take up some log space; in the case of a PUT, the amount of space depends on the size of the message. Now, as we know, during normal processing the logger takes a checkpoint every 10000 operations logged. If 10000 of your operations take more than the total size of your log, your log files can never be released.

Try to calculate how many primary log files you need (remember that it is better to use a smaller number of large files than a lot of small files). To do this, consider the size of the messages your applications put and the number of PUT, GET and COMMIT operations they perform.
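
The calculation could be roughed out like this in Java. It is a back-of-the-envelope sketch only: log file sizes are configured in 4 KB pages, but the per-operation overhead, the message size, and the file counts in `main` are illustrative assumptions rather than measured values or product defaults, and `LogSizing` is a made-up helper. It also charges every operation the full message size, which overstates GETs and COMMITs.

```java
/** Rough log sizing estimate. Hypothetical helper; all inputs in main are illustrative. */
final class LogSizing {
    static final long PAGE_BYTES = 4096;  // log file size is configured in 4 KB pages

    /** Total bytes available across primary and secondary log files. */
    static long logCapacityBytes(int primaryFiles, int secondaryFiles, int logFilePages) {
        return (long) (primaryFiles + secondaryFiles) * logFilePages * PAGE_BYTES;
    }

    /** Estimated bytes logged between two checkpoints taken every `operations` operations. */
    static long bytesBetweenCheckpoints(long operations, long avgMessageBytes, long overheadBytes) {
        return operations * (avgMessageBytes + overheadBytes);
    }

    public static void main(String[] args) {
        long capacity = logCapacityBytes(3, 2, 1024);            // assumed configuration
        long needed = bytesBetweenCheckpoints(10000, 2048, 512); // assumed 2 KB messages, 512 B overhead
        System.out.println("log capacity (bytes): " + capacity);
        System.out.println("logged per 10000 ops (bytes): " + needed);
        System.out.println("log large enough: " + (capacity > needed));
    }
}
```

If the estimate for 10000 operations comes out close to or above the capacity, that is a sign to increase the file size or the number of files.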

Piotr
PeterPotkay
Posted: Thu Sep 18, 2003 4:46 am    Post subject:

Poobah

Joined: 15 May 2001
Posts: 7722

Piotr, the following is from the System Admin Guide:

Quote:

WebSphere MQ generates checkpoints automatically. They are taken when the queue manager starts, at shutdown, when logging space is running low, and after every 10 000 operations logged.


To me this indicates that MQ can recognize that log space is running low, and if it needs a checkpoint it will take one if it can, rather than forcing itself to wait until 10000.

Regardless, your point on adequate logs is important. The default values are much too small. Today's computers have huge hard drives on even the cheapest machines, and if you are going to be doing any testing, there is no reason not to make the log files much, much bigger.
_________________
Peter Potkay
Keep Calm and MQ On