Internal error on Tandem: "transaction reference left" |
Author |
Message
|
Tibor |
Posted: Tue Apr 20, 2004 2:57 am Post subject: Internal error on Tandem: "transaction reference left" |
|
|
 Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary
|
We got an unusual internal error on our production Tandem. Below is a snippet from the FD file (the entry was repeated 25 times per minute). The connecting application crashed with a ZZSA file but started without error after a restart.
But I don't know where this error was coming from:
- from transaction system (TM/MP) or
- from messaging system (MQ)?
Has anyone ever seen a similar error message on Tandem? ("Transaction (internal UOW) reference left when moving expired messages")
TIA,
Tibor
Code: |
+-----------------------------------------------------------------------------+
| |
| MQSeries First Failure Symptom Report |
| ===================================== |
| |
| Date/Time :- April 13 16:5:56 2004 |
| Host Name :- \PING |
| PIDS :- 5724A39 |
| LVLS :- 510 |
| Product Long Name :- MQSeries for Compaq NonStop Kernel |
| Vendor :- IBM |
| Probe Id :- QS034000 |
| Application Name :- MQM |
| Component :- qslDetectExpired |
| Build Date :- May 6 2003 |
| Exe File Name :- \PING.$SQLCAT.ZMQSEXE.MQQSSVR |
| UserID :- MQM.MANAGER |
| Process File Name :- \PING.$MQSS:989374917 |
| Node number :- 12 |
| CPU :- 0 |
| PIN :- 531 |
| QueueManager :- OSSP |
| Major Errorcode :- xecF_E_UNEXPECTED_RC |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6118 |
| Probe severity :- Severity 2: error |
| Probe Description :- AMQ6118: An internal MQSeries error has occurred (0) |
| Text :- Transaction (internal UOW) reference left when moving |
| expired m |
| Comment1 :- REKOD_OSS_ONL |
| |
| |
+-----------------------------------------------------------------------------+
|
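When an FD file holds many near-identical FFST entries, as in the "repeated 25 times per minute" case above, it can help to reduce them to a count per minute and Probe Id. The following is a minimal Python sketch, not an IBM tool: the names `parse_ffst` and `per_minute` are made up for illustration, and the parser assumes every report uses the `| Field :- value |` layout shown above, with month names in English. Wrapped continuation lines (such as the truncated Text field) are ignored.

```python
import re
from collections import Counter
from datetime import datetime

# One "| Field Name :- value |" row of an FFST report.
FIELD_RE = re.compile(r"^\|\s*(?P<key>[^|]*?)\s*:-\s*(?P<value>.*?)\s*\|?\s*$")
# The "+-----+" border that opens and closes each report.
BORDER_RE = re.compile(r"^\+-+\+\s*$")


def parse_ffst(text):
    """Return a list of {field: value} dicts, one per FFST report in text."""
    reports, current = [], None
    for line in text.splitlines():
        line = line.strip()
        if BORDER_RE.match(line):
            # A border closes the report collected so far and opens a new one.
            if current:
                reports.append(current)
            current = {}
            continue
        if current is None:
            continue  # text before the first border
        m = FIELD_RE.match(line)
        if m:
            current[m.group("key")] = m.group("value")
    if current:
        reports.append(current)
    return reports


def per_minute(reports):
    """Count reports per (minute, Probe Id)."""
    counts = Counter()
    for r in reports:
        ts = datetime.strptime(r["Date/Time"], "%B %d %H:%M:%S %Y")
        counts[(ts.replace(second=0), r.get("Probe Id"))] += 1
    return counts


# Shortened sample taken from the report above.
SAMPLE = r"""
+-----------------------------------------------------------------------------+
| Date/Time :- April 13 16:5:56 2004 |
| Probe Id :- QS034000 |
| Component :- qslDetectExpired |
| Process File Name :- \PING.$MQSS:989374917 |
+-----------------------------------------------------------------------------+
"""

reports = parse_ffst(SAMPLE)
for (minute, probe), n in per_minute(reports).items():
    print(minute, probe, n)
```

Feeding the whole FD file to `parse_ffst` instead of `SAMPLE` would give one dict per report, which also makes it easy to compare Probe Id and Component values across incidents.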
|
|
|
 |
mqonnet |
Posted: Tue Apr 20, 2004 4:49 am Post subject: |
|
|
 Grand Master
Joined: 18 Feb 2002 Posts: 1114 Location: Boston, Ma, Usa.
|
With almost all of the errors/FFSTs relating to UOW, you MUST always check the EMS events/logs. Take a look at the EMS events around the time you started getting these errors and see if there is any correlation.
Anything logged to the mqerrlogs?
Are you doing something like putting messages with expiry in a UOW and not committing until you are done with all the puts (maybe hundreds or thousands)?
Cheers
Kumar |
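To make the batching question above concrete, here is a toy Python model (it does not call the MQ API, and `expired_before_commit` is a made-up name). It assumes the expiry interval starts counting when each message is put, not when the unit of work is committed, and that puts are evenly spaced; the only MQ detail it relies on is that the MQMD Expiry field is expressed in tenths of a second.

```python
def expired_before_commit(n_puts, ms_per_put, expiry_tenths):
    """Count messages whose expiry interval has already elapsed at commit time.

    Toy model: message i is put at i * ms_per_put, and the single commit
    happens after the last put, at n_puts * ms_per_put.
    """
    expiry_ms = expiry_tenths * 100  # Expiry is given in tenths of a second
    commit_ms = n_puts * ms_per_put
    expired = 0
    for i in range(n_puts):
        age_at_commit = commit_ms - i * ms_per_put
        if age_at_commit >= expiry_ms:
            expired += 1
    return expired


# 1000 puts, 10 ms apart, 5 s expiry, one commit at the end.
print(expired_before_commit(1000, 10, 50))
# Commit after every single message instead.
print(expired_before_commit(1, 10, 50))
```

Under these assumptions a long batch hands the queue server messages that are already expired the moment they become visible, while a commit per message does not.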
|
|
 |
Tibor |
Posted: Tue Apr 20, 2004 7:10 am Post subject: |
|
|
 Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary
|
Hi Kumar,
Our MQ collector ($MQS) has rolled over, which is why I can't retrieve the events.
The MQERRLG2 entry seems normal:
Code: |
-------------------------------------------------------------------------------
2004/04/13 16.05.21
AMQ8639: A Queue Server processed expired messages.
EXPLANATION:
The Queue Server process $MQSS detected and processed 14 messages that have
expired.
ACTION:
None normally necessary. This message is logged to provide information about
the number of messages that expire for each Queue Server. If performance
degradation is experienced for a particular Queue Server, verify that there are
not an excessively large number of expired messages having to be processed by
that Queue Server process.
|
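To see whether the number of expired messages climbs around the time of the FFSTs, the AMQ8639 entries can be pulled out of the error log. This is a rough Python sketch under the assumption that every entry looks like the one above (a line of dashes, a `YYYY/MM/DD HH.MM.SS` timestamp, then the message text); `expired_counts` is a made-up helper name.

```python
import re
from datetime import datetime

TS_RE = re.compile(r"^(\d{4}/\d{2}/\d{2} \d{2}\.\d{2}\.\d{2})$")
EXPIRED_RE = re.compile(r"process (\S+) detected and processed (\d+) messages")


def expired_counts(log_text):
    """Return a list of (timestamp, queue_server, count) for AMQ8639 entries."""
    results = []
    # Entries are separated by a line made up of dashes.
    for entry in re.split(r"^-{10,}\s*$", log_text, flags=re.MULTILINE):
        if "AMQ8639" not in entry:
            continue
        ts = None
        for line in entry.splitlines():
            m = TS_RE.match(line.strip())
            if m:
                ts = datetime.strptime(m.group(1), "%Y/%m/%d %H.%M.%S")
                break
        # Collapse line wraps so the sentence can be matched as one string.
        m = EXPIRED_RE.search(" ".join(entry.split()))
        if ts and m:
            results.append((ts, m.group(1), int(m.group(2))))
    return results


# Sample copied from the MQERRLG2 entry above.
SAMPLE_LOG = """\
-------------------------------------------------------------------------------
2004/04/13 16.05.21
AMQ8639: A Queue Server processed expired messages.
EXPLANATION:
The Queue Server process $MQSS detected and processed 14 messages that have
expired.
ACTION:
None normally necessary.
"""

for ts, server, count in expired_counts(SAMPLE_LOG):
    print(ts, server, count)
```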
As for your last question: this is an 'online' interface with a short timeout and message expiry, so MQSS sometimes has to work hard. But it commits message by message...
Tibor |
|
|
 |
mqonnet |
Posted: Tue Apr 20, 2004 8:11 am Post subject: |
|
|
 Grand Master
Joined: 18 Feb 2002 Posts: 1114 Location: Boston, Ma, Usa.
|
The problem is that, without any knowledge of what happened to the transactions, it is hard to understand why we get these errors from the QS process. And since your EMS collector rolled over, it is even more difficult. Just out of curiosity, though: what was going on that forced a rollover of the EMS logs? Usually the log is pretty big, and I have seen systems stressed to the max for days without a rollover of the EMS logs, even while generating FFSTs every now and then.
But here are a few pointers.
-Is the FFST coming out of the primary or the backup QS process? If it is the backup, you should probably apply the latest maintenance, which I believe is C2Efix4, on top of CSD02, which is what you seem to have. It has quite a few fixes in that area.
-Is this a one-time occurrence or does it happen often? If it happens again, maybe you could try to get the EMS events around that time, because that could be the key.
-If you go into your QM Pathway and issue status server *, do you see any errors against any MQ servers? If yes, what's the error code?
-Even though you explained what your app is doing, explaining the program logic in detail may shed more light: for example PM/NPM, sync/no-sync, and whether begin/end/aborttransaction is inside or outside the loop.
Cheers
Kumar |
|
|
 |
Tibor |
Posted: Tue Apr 20, 2004 2:58 pm Post subject: |
|
|
 Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary
|
Quote: |
Just out of curiosity though, what was going on that forced a rollover of ems logs... |
On all of our Tandem environments, MQ has its own collector named $MQS. But some MQ-driven interface programs also write to the same collector, with a lot of junk messages (i.e. not strictly system-specific).
Quote: |
Is this a one-time occurrence or does it happen often? |
Only once since 1999 (the starting date for me, MQ and Tandem).
Quote: |
Is the ffst coming out of the primary or the backup qs process |
I can't tell you now, but I'll write again tomorrow.
How can I download e-fixes? MQ is running on CSD02 on all our Tandems, but there isn't any new content on the 'official' site.
Tibor |
|
|
 |
mqonnet |
Posted: Wed Apr 21, 2004 4:49 am Post subject: |
|
|
 Grand Master
Joined: 18 Feb 2002 Posts: 1114 Location: Boston, Ma, Usa.
|
If you have your own collector, then you should have a mechanism to handle lots of error conditions and the many events they generate. In this case you would probably have been better off using the OS-supplied collectors, as the logs would still go back as far as you need. But maybe your collector is not handling such cases properly, which I doubt. So you might want to check it for the future.
E-fixes are not put up on the support site, but they are official releases. They just don't make it to the website because they come out fairly often, as opposed to CSDs, which come out only about once a year. Ask your IBM rep about downloading the latest e-fixes and they will guide you through the process.
Also post your app design/algorithm, which may make it easier to understand the probable cause. And try to answer all of my previous questions for a clearer picture.
By the way, did the system and QS recover after this failure occurred? I ask because you say you got lots of FFSTs generated within a very short span.
Cheers
Kumar |
|
|
 |
Tibor |
Posted: Fri Apr 23, 2004 12:29 am Post subject: |
|
|
 Grand Master
Joined: 20 May 2001 Posts: 1033 Location: Hungary
|
Quote: |
-If you go into your QM Pathway and issue status server *, do you see any errors against any MQ servers? If yes, what's the error code? |
There is no error code now, but there was a blackout last Saturday (electric power expansion & maintenance) and our prod Tandem was restarted. Sometimes we got "server restarted" messages from EC or REPSVR.
Code: |
=status server *
SERVER #RUNNING ERROR INFO
MQS-CHANINIT00 1
MQS-CMDSERV00 1
MQS-EC00 1
MQS-EC01 1
MQS-EC02 1
MQS-EC03 1
MQS-EC04 1
MQS-EC05 1
MQS-EC06 1
MQS-EC07 1
MQS-EC08 1
MQS-EC09 1
MQS-EC10 1
MQS-EC11 1
MQS-EC12 1
MQS-EC13 1
MQS-EC14 1
MQS-EC15 1
MQS-ECBOSS 1
MQS-MQMSVR00 0
MQS-QMGRSVR00 1
MQS-QUEUE00 1
MQS-REPSVR00 1
MQS-REPSVR01 1
MQS-REPSVR02 1
MQS-REPSVR03 1
MQS-REPSVR04 1
MQS-REPSVR05 1
MQS-REPSVR06 1
MQS-REPSVR07 1
MQS-REPSVR08 1
MQS-REPSVR09 1
MQS-REPSVR10 1
MQS-REPSVR11 1
MQS-REPSVR12 1
MQS-REPSVR13 1
MQS-REPSVR14 1
MQS-REPSVR15 1
MQS-STATUS00 1
MQS-TCPLIS00 1
MQS-TCPLIS01 1
MQS-TCPLIS02 1
MQS-TRIGMON00 1
=
|
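With this many server classes, a listing like the one above is easier to check by script than by eye. The sketch below is plain Python text processing, not a PATHCOM feature; `parse_status` is a made-up name, and treating everything after the #RUNNING column as the error info is an assumption, since the listing above shows no errors to confirm the layout.

```python
def parse_status(text):
    """Map server name -> (running_count, error_info) from 'status server *'."""
    servers = {}
    for line in text.splitlines():
        parts = line.split()
        # Data rows look like "<server-name> <running-count> [error info...]";
        # the prompt and header lines have no numeric second column.
        if len(parts) >= 2 and parts[1].isdigit():
            servers[parts[0]] = (int(parts[1]), " ".join(parts[2:]))
    return servers


# Shortened sample taken from the listing above.
SAMPLE_STATUS = """\
=status server *
SERVER #RUNNING ERROR INFO
MQS-CHANINIT00 1
MQS-ECBOSS 1
MQS-MQMSVR00 0
MQS-QUEUE00 1
=
"""

servers = parse_status(SAMPLE_STATUS)
not_running = [name for name, (count, _) in servers.items() if count == 0]
with_errors = [name for name, (_, err) in servers.items() if err]
print("not running:", not_running)
print("with error info:", with_errors)
```

Whether a count of 0 is a problem depends on how that server class is configured, which the thread does not say; in the listing above only MQS-MQMSVR00 shows 0.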
But on April 1 there was a strange error with the queue server:
Code: |
+-----------------------------------------------------------------------------+
| |
| MQSeries First Failure Symptom Report |
| ===================================== |
| |
| Date/Time :- April 1 12:35:14 2004 |
| Host Name :- \PING |
| PIDS :- 5724A39 |
| LVLS :- 510 |
| Product Long Name :- MQSeries for Compaq NonStop Kernel |
| Vendor :- IBM |
| Probe Id :- QS058015 |
| Application Name :- MQM |
| Component :- qslHandleGetCkp |
| Build Date :- May 6 2003 |
| Exe File Name :- \PING.$SQLCAT.ZMQSEXE.MQQSSVR |
| UserID :- MQM.MANAGER |
| Process File Name :- \PING.$MQSS:989374917 |
| Node number :- 12 |
| CPU :- 0 |
| PIN :- 531 |
| QueueManager :- OSSP |
| Major Errorcode :- xecF_E_UNEXPECTED_RC |
| Minor Errorcode :- Unknown(1772) |
| Probe Type :- MSGAMQ6118 |
| Probe severity :- Severity 2: error |
| Probe Description :- AMQ6118: An internal MQSeries error has occurred (1772 |
| )? |
| Text :- Invalid Message Header context in Backup for Get |
| |
| Arith1 :- 6002 (0x1772) |
| Comment1 :- OSS_CTI1_IN |
| |
| |
+-----------------------------------------------------------------------------+
+-----------------------------------------------------------------------------+
| |
| MQSeries First Failure Symptom Report |
| ===================================== |
| |
| Date/Time :- April 1 12:35:18 2004 |
| Host Name :- \PING |
| PIDS :- 5724A39 |
| LVLS :- 510 |
| Product Long Name :- MQSeries for Compaq NonStop Kernel |
| Vendor :- IBM |
| Probe Id :- NP008023 |
| Application Name :- MQM |
| Component :- nspBackup |
| Build Date :- May 6 2003 |
| Exe File Name :- \PING.$SQLCAT.ZMQSEXE.MQQSSVR |
| UserID :- MQM.MANAGER |
| Process File Name :- \PING.$MQSS:989374917 |
| Node number :- 12 |
| CPU :- 0 |
| PIN :- 531 |
| QueueManager :- OSSP |
| Major Errorcode :- xecF_E_UNEXPECTED_RC |
| Minor Errorcode :- Unknown(1772) |
| Probe Type :- MSGAMQ6118 |
| Probe severity :- Severity 2: error |
| Probe Description :- AMQ6118: An internal MQSeries error has occurred (1772 |
| )? |
| Text :- Error handling checkpoint in backup |
| |
| Arith1 :- 6002 (0x1772) |
| |
+-----------------------------------------------------------------------------+
|
At that very moment the MQQSSVR process suddenly started spinning and its priority went to 1. I stopped the primary process, and after that it worked without problems until the error reported in this topic.
Tibor |
|
|
 |
mqonnet |
Posted: Fri Apr 23, 2004 5:01 am Post subject: |
|
|
 Grand Master
Joined: 18 Feb 2002 Posts: 1114 Location: Boston, Ma, Usa.
|
qslHandleGetCkp has been fixed in CSD02, but I do recollect that there was another fix to this same problem, which is the FFST from your latest post. Hence I would recommend that you apply the latest e-fix, C2Efix4, and see if you still get the error.
But again, I don't think anybody can proceed any further unless we know the circumstances under which this happens and get enough logs (EMS).
Cheers
Kumar |
|