  MQSeries.net
MQSeries.net Forum Index » General IBM MQ Support » Internal error on Tandem: "transaction reference left"

Tibor
Posted: Tue Apr 20, 2004 2:57 am    Post subject: Internal error on Tandem: "transaction reference left"

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

We've got an unusual internal error on our Tandem production system. Below is a snippet from the FD file (the entry was repeated 25 times per minute). The connected application crashed, leaving a ZZSA file, but it started without error after a restart.

But I don't know where this error was coming from:
- the transaction system (TM/MP), or
- the messaging system (MQ)?

Has anyone ever seen a similar error message on Tandem? ("Transaction (internal UOW) reference left when moving expired messages")

TIA,

Tibor

Code:
+-----------------------------------------------------------------------------+
|                                                                             |
| MQSeries First Failure Symptom Report                                       |
| =====================================                                       |
|                                                                             |
| Date/Time         :- April     13  16:5:56    2004                          |
| Host Name         :- \PING                                                  |
| PIDS              :- 5724A39                                                |
| LVLS              :- 510                                                    |
| Product Long Name :- MQSeries for Compaq NonStop Kernel                     |
| Vendor            :- IBM                                                    |
| Probe Id          :- QS034000                                               |
| Application Name  :- MQM                                                    |
| Component         :- qslDetectExpired                                       |
| Build Date        :- May  6 2003                                            |
| Exe File Name     :- \PING.$SQLCAT.ZMQSEXE.MQQSSVR                          |
| UserID            :- MQM.MANAGER                                            |
| Process File Name :- \PING.$MQSS:989374917                                  |
| Node number       :- 12                                                     |
| CPU               :- 0                                                      |
| PIN               :- 531                                                    |
| QueueManager      :- OSSP                                                   |
| Major Errorcode   :- xecF_E_UNEXPECTED_RC                                   |
| Minor Errorcode   :- OK                                                     |
| Probe Type        :- MSGAMQ6118                                             |
| Probe severity    :- Severity 2: error                                      |
| Probe Description :- AMQ6118: An internal MQSeries error has occurred (0)   |
| Text              :- Transaction (internal UOW) reference left when moving  |
|                      expired m                                              |
| Comment1          :- REKOD_OSS_ONL                                          |
|                                                                             |
|                                                                             |
+-----------------------------------------------------------------------------+
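To compare FFSTs like this one side by side (probe ID, component, error codes, text), the boxed fields can be pulled out with a small parser. This is a minimal sketch, assuming the `| Name :- value |` layout shown above, where a line without `:-` continues the previous field's value; `parse_ffst` and `SAMPLE` are hypothetical names, not part of MQ. It cannot recover text the report itself truncated, such as the trailing `expired m`.

```python
import re

# A field line looks like "| Name   :- value   |"; names may contain "/".
FIELD_RE = re.compile(r"^\s*\|\s*([A-Za-z0-9 /]+?)\s*:-\s*(.*?)\s*\|\s*$")
# A continuation line has no "Name :-" prefix and extends the previous value.
CONT_RE = re.compile(r"^\s*\|\s+(\S.*?)\s*\|\s*$")


def parse_ffst(report: str) -> dict:
    """Hypothetical helper: map FFST field names to their values."""
    fields = {}
    last = None  # name of the most recently seen field
    for line in report.splitlines():
        m = FIELD_RE.match(line)
        if m:
            last = m.group(1)
            fields[last] = m.group(2)
            continue
        c = CONT_RE.match(line)
        if c and last is not None:
            # Wrapped value, e.g. the second line of "Text".
            fields[last] += " " + c.group(1)
    return fields


# Abridged copy of the report above (hypothetical sample for illustration).
SAMPLE = """\
+---------------------------------------------------------------+
| MQSeries First Failure Symptom Report                         |
| =====================================                         |
| Probe Id          :- QS034000                                 |
| Component         :- qslDetectExpired                         |
| Major Errorcode   :- xecF_E_UNEXPECTED_RC                     |
| Text              :- Transaction (internal UOW) reference left when moving  |
|                      expired m                                |
| Comment1          :- REKOD_OSS_ONL                            |
+---------------------------------------------------------------+
"""

if __name__ == "__main__":
    for key, value in parse_ffst(SAMPLE).items():
        print(f"{key}: {value}")
```

Header and blank box lines are skipped because they carry no `:-` and come before any field.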
mqonnet
Posted: Tue Apr 20, 2004 4:49 am

Grand Master

Joined: 18 Feb 2002
Posts: 1114
Location: Boston, Ma, Usa.

With almost all errors/FFSTs relating to a UOW, you MUST check the EMS events/logs. Look at the EMS events around the time you started getting these errors and see whether there is any correlation.

Was anything logged to the MQ error logs?

Are you doing something like putting messages with an expiry inside a UOW and not committing until you are done with all the puts (maybe hundreds or thousands)?

Cheers
Kumar
Tibor
Posted: Tue Apr 20, 2004 7:10 am

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

Hi Kumar,

Our MQ EMS collector ($MQS) has rolled over, which is why I can't retrieve the events.

The MQERRLG2 entry seems normal:
Code:

 -------------------------------------------------------------------------------
 2004/04/13   16.05.21                                                         
 AMQ8639: A Queue Server processed expired messages.                           
                                                                               
 EXPLANATION:                                                                   
 The Queue Server process $MQSS detected and processed 14 messages that have   
 expired.                                                                       
 ACTION:                                                                       
 None normally necessary.  This message is logged to provide information about 
 the number of messages that expire for each Queue Server.  If performance     
 degradation is experienced for a particular Queue Server, verify that there are
 not an excessively large number of expired messages having to be processed by 
 that Queue Server process.                                                     


As for your last question: this is an 'online' interface with a short timeout and message expiry, so MQSS sometimes has to work hard. But each message is committed individually...

Tibor
mqonnet
Posted: Tue Apr 20, 2004 8:11 am

Grand Master

Joined: 18 Feb 2002
Posts: 1114
Location: Boston, Ma, Usa.

The problem is that, without any knowledge of what happened to the transactions, it is hard to understand why we get these errors from the QS process. And since your EMS collector rolled over, it is even more difficult. Just out of curiosity, though: what was going on that forced a rollover of the EMS logs? They are usually pretty big, and I have seen systems stressed to the max for days, even generating FFSTs every now and then, without the EMS logs rolling over.

But here are a few pointers.

- Is the FFST coming out of the primary or the backup QS process? If it is the backup, you should probably apply the latest maintenance, which I believe is C2Efix4 on top of CSD02 (the level you seem to be running). It has quite a few fixes in that area.
- Is this a one-time occurrence, or does it happen often? If it happens again, try to get the EMS events around that time, because they could be the key.
- If you go into your QM Pathway and issue STATUS SERVER *, do you see any errors against any MQ servers? If so, what is the error code?
- Even though you explained what your app is doing, explaining the program logic in detail may throw some more light on it: persistent/non-persistent messages (PM/NPM), sync/no-sync, BEGIN/END/ABORTTRANSACTION inside or outside the loop, etc.

Cheers
Kumar
Tibor
Posted: Tue Apr 20, 2004 2:58 pm

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

Quote:
Just out of curiosity though, what was going on that forced a rollover of ems logs...


On all of our Tandem environments, MQ has its own collector named $MQS. But some MQ-driven interface programs also write to the same collector, producing a lot of junk messages (i.e. not strictly system-specific ones).

Quote:
Is this a one time occurrence or does it happen often.


Only once since 1999 (when I started working with MQ and Tandem).

Quote:
Is the ffst coming out of the primary or the backup qs process


I can't tell you right now, but I'll write again tomorrow.
How can I download e-fixes? MQ is at CSD02 on all of our Tandems, but there isn't any new content on the 'official' site.

Tibor
mqonnet
Posted: Wed Apr 21, 2004 4:49 am

Grand Master

Joined: 18 Feb 2002
Posts: 1114
Location: Boston, Ma, Usa.

If you have your own collector, you should have a mechanism to handle lots of error conditions and the large number of events they generate. In this case, if you had been using the OS-supplied collectors, you would probably have been better off, as the logs would still go back as far as you need. Maybe your collector is not handling such cases properly, which I doubt, but you might want to check it for the future.

E-fixes are not put up on the support site, but they are official releases. They just don't make it onto the website because they come out fairly often, unlike CSDs, which come out only about once a year. Ask your IBM rep about downloading the latest e-fixes and they will guide you through the process.

Also post your app design/algorithm, which may make it easier to understand the probable cause. And try to answer all of my previous questions, for a clearer picture.


By the way, did the system and the QS recover after this failure occurred? You say that lots of FFSTs were generated within a very short span.

Cheers
Kumar
Tibor
Posted: Fri Apr 23, 2004 12:29 am

Grand Master

Joined: 20 May 2001
Posts: 1033
Location: Hungary

Quote:
-If you go into your qm pathway and issue status server *, do you see any errors against any mq servers. If yes, what's the error code.


There is no error code now, but there was a blackout last Saturday (electric power expansion and maintenance) and our production Tandem was restarted. Sometimes we get "server restarted" messages from EC or REPSVR.

Code:
=status server *                                                               
                                                                               
SERVER           #RUNNING   ERROR  INFO                                         
MQS-CHANINIT00       1                                                         
MQS-CMDSERV00        1                                                         
MQS-EC00             1                                                         
MQS-EC01             1                                                         
MQS-EC02             1                                                         
MQS-EC03             1                                                         
MQS-EC04             1                                                         
MQS-EC05             1                                                         
MQS-EC06             1                                                         
MQS-EC07             1                                                         
MQS-EC08             1                                                         
MQS-EC09             1                                                         
MQS-EC10             1                                                         
MQS-EC11             1                                                         
MQS-EC12             1                                                         
MQS-EC13             1                                                         
MQS-EC14             1                                                         
MQS-EC15             1                                                         
MQS-ECBOSS           1                                                         
MQS-MQMSVR00         0                                                         
MQS-QMGRSVR00        1                                                         
MQS-QUEUE00          1                                                         
MQS-REPSVR00         1                                                         
MQS-REPSVR01         1                                                         
MQS-REPSVR02         1                                                         
MQS-REPSVR03         1                                                         
MQS-REPSVR04         1                                                         
MQS-REPSVR05         1                                                         
MQS-REPSVR06         1                                                         
MQS-REPSVR07         1                                                         
MQS-REPSVR08         1                                                         
MQS-REPSVR09         1                                                         
MQS-REPSVR10         1                                                         
MQS-REPSVR11         1                                                         
MQS-REPSVR12         1                                                         
MQS-REPSVR13         1                                                         
MQS-REPSVR14         1                                                         
MQS-REPSVR15         1                                                         
MQS-STATUS00         1                                                         
MQS-TCPLIS00         1                                                         
MQS-TCPLIS01         1                                                         
MQS-TCPLIS02         1                                                         
MQS-TRIGMON00        1                                                         
=                                                                               
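With this many server classes, the check can also be scripted. A minimal Python sketch, assuming the STATUS SERVER output above has been captured as text with whitespace-separated SERVER / #RUNNING / ERROR / INFO columns; `flag_servers` and `SAMPLE` are hypothetical names. It flags any server class with zero running processes or anything in the ERROR/INFO columns. A zero count, as for MQS-MQMSVR00 here, is not necessarily a fault, since Pathway may start some server classes only on demand.

```python
def flag_servers(status_output: str) -> list:
    """Hypothetical helper: return (server, running, error_info) tuples for
    server classes with no running process or a non-empty ERROR/INFO field."""
    flagged = []
    for line in status_output.splitlines():
        parts = line.split()
        # Skip the "=" prompt, the command echo, the header row and blank lines:
        # none of them has a number in the #RUNNING position.
        if len(parts) < 2 or not parts[1].isdigit():
            continue
        name, running = parts[0], int(parts[1])
        extra = " ".join(parts[2:])  # whatever appears under ERROR / INFO
        if running == 0 or extra:
            flagged.append((name, running, extra))
    return flagged


# Abridged copy of the output above (hypothetical sample for illustration).
SAMPLE = """\
=status server *
SERVER           #RUNNING   ERROR  INFO
MQS-CHANINIT00       1
MQS-EC00             1
MQS-MQMSVR00         0
MQS-QUEUE00          1
MQS-TRIGMON00        1
=
"""

if __name__ == "__main__":
    for name, running, extra in flag_servers(SAMPLE):
        print(f"{name}: running={running} error/info={extra or '-'}")
```

Splitting on whitespace ignores column alignment, which is enough for flagging but does not tell an ERROR value apart from an INFO value.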


But on April 1 there was a strange error with the queue server:
Code:

 +-----------------------------------------------------------------------------+
 |                                                                             |
 | MQSeries First Failure Symptom Report                                       |
 | =====================================                                       |
 |                                                                             |
 | Date/Time         :- April     1  12:35:14    2004                          |
 | Host Name         :- \PING                                                  |
 | PIDS              :- 5724A39                                                |
 | LVLS              :- 510                                                    |
 | Product Long Name :- MQSeries for Compaq NonStop Kernel                     |
 | Vendor            :- IBM                                                    |
 | Probe Id          :- QS058015                                               |
 | Application Name  :- MQM                                                    |
 | Component         :- qslHandleGetCkp                                        |
 | Build Date        :- May  6 2003                                            |
 | Exe File Name     :- \PING.$SQLCAT.ZMQSEXE.MQQSSVR                          |
 | UserID            :- MQM.MANAGER                                            |
 | Process File Name :- \PING.$MQSS:989374917                                  |
 | Node number       :- 12                                                     |
 | CPU               :- 0                                                      |
 | PIN               :- 531                                                    |
 | QueueManager      :- OSSP                                                   |
 | Major Errorcode   :- xecF_E_UNEXPECTED_RC                                   |
 | Minor Errorcode   :- Unknown(1772)                                          |
 | Probe Type        :- MSGAMQ6118                                             |
 | Probe severity    :- Severity 2: error                                      |
 | Probe Description :- AMQ6118: An internal MQSeries error has occurred (1772 |
 |                      )?                                                     |
 | Text              :- Invalid Message Header context in Backup for Get       |
 |                                                                             |
 | Arith1            :- 6002  (0x1772)                                         |
 | Comment1          :- OSS_CTI1_IN                                            |
 |                                                                             |
 |                                                                             |
 +-----------------------------------------------------------------------------+
 +-----------------------------------------------------------------------------+
 |                                                                             |
 | MQSeries First Failure Symptom Report                                       |
 | =====================================                                       |
 |                                                                             |
 | Date/Time         :- April     1  12:35:18    2004                          |
 | Host Name         :- \PING                                                  |
 | PIDS              :- 5724A39                                                |
 | LVLS              :- 510                                                    |
 | Product Long Name :- MQSeries for Compaq NonStop Kernel                     |
 | Vendor            :- IBM                                                    |
 | Probe Id          :- NP008023                                               |
 | Application Name  :- MQM                                                    |
 | Component         :- nspBackup                                              |
 | Build Date        :- May  6 2003                                            |
 | Exe File Name     :- \PING.$SQLCAT.ZMQSEXE.MQQSSVR                          |
 | UserID            :- MQM.MANAGER                                            |
 | Process File Name :- \PING.$MQSS:989374917                                  |
 | Node number       :- 12                                                     |
 | CPU               :- 0                                                      |
 | PIN               :- 531                                                    |
 | QueueManager      :- OSSP                                                   |
 | Major Errorcode   :- xecF_E_UNEXPECTED_RC                                   |
 | Minor Errorcode   :- Unknown(1772)                                          |
 | Probe Type        :- MSGAMQ6118                                             |
 | Probe severity    :- Severity 2: error                                      |
 | Probe Description :- AMQ6118: An internal MQSeries error has occurred (1772 |
 |                      )?                                                     |
 | Text              :- Error handling checkpoint in backup                    |
 |                                                                             |
 | Arith1            :- 6002  (0x1772)                                         |
 |                                                                             |
 +-----------------------------------------------------------------------------+


At that very moment the MQQSSVR process suddenly started spinning and its priority went down to 1. I stopped the primary process, and from then on it worked without problems until the error this topic is about.
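In both April 1 reports above, the `1772` in `Minor Errorcode :- Unknown(1772)` and in the probe description is the same value as `Arith1 :- 6002 (0x1772)`, just written in hexadecimal. A quick check in Python, using the values copied from those reports (the variable names are illustrative only):

```python
# Values copied from the April 1 FFSTs.
arith1 = 6002
minor_errorcode = "Unknown(1772)"

# Extract the digits between the parentheses and read them as hexadecimal:
# 1*16**3 + 7*16**2 + 7*16 + 2 = 4096 + 1792 + 112 + 2 = 6002
code = minor_errorcode[minor_errorcode.index("(") + 1 : minor_errorcode.index(")")]
assert int(code, 16) == arith1

print(f"{arith1} decimal == 0x{arith1:X}")  # prints "6002 decimal == 0x1772"
```

This only shows that the two fields carry one value; what return code 6002 means is not stated in the reports.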

Tibor
mqonnet
Posted: Fri Apr 23, 2004 5:01 am

Grand Master

Joined: 18 Feb 2002
Posts: 1114
Location: Boston, Ma, Usa.

qslHandleGetCkp was fixed in CSD02, but I do recall that there was another fix for this same problem, which is the FFST from your latest post. Hence I would recommend that you apply the latest e-fix, C2Efix4, and see whether you still get the error.

But again, I don't think anybody can proceed any further unless we know the circumstances under which this happens and get enough (EMS) logs.

Cheers
Kumar
Copyright © MQSeries.net. All rights reserved.