How to diagnose XAER_RMFAIL in xa_recover step?
scott9
Posted: Mon Mar 07, 2011 10:29 pm    Post subject: How to diagnose XAER_RMFAIL in xa_recover step?

Acolyte

Joined: 11 Jul 2002
Posts: 62
Location: Sacramento,CA

I'm looking for suggestions on additional tracing or investigation that I may have missed for the problem below.

MQ ENVIRONMENT:
AIX 6.1
MQ 7.0.1.3

DB ENVIRONMENT:
AIX 5.3
Oracle 11.1

DETAILS:
We have successfully connected to multiple databases with this qmgr using XA coordination; however, we are having trouble with one Oracle database. It fails in the xa_recover step (xa_open completes successfully). NO FDCs are created. There is only the AMQ7604 message:
"AMQ7604: The XA resource manager '<resourcenameremoved>' was not available when called for xa_recover."

We opened a PMR for this, but they said it's either a poorly formatted XAOpen string or a problem with the database. Our DBAs assure us the DB is fine and our XAOpen string is correct.

I suspect the issue is indeed in the database, but I want to produce evidence to back that up. I ran truss on strmqm, but didn't glean anything from the output. I also ran an MQ trace (excerpt of the strmqtrc o/p below). Does anybody have advice on additional tracing to run?

This was my truss cmd:
truss -aef -rall -wall -o /tmp/truss.strmqm strmqm TESTQM

Note: I replaced specific DB names with "DBNAME" to protect our privacy. Also, the format of XAOpenString gets hacked up by the html converter. Trust me that the syntax is correct (or I would be getting a completely different error).

XAResourceManager:
Name=<resourcenameremoved>
SwitchFile=UKor8dtc23.so
XAOpenString=ORACLE_XA+SQLNET=DBNAMEI1+HostName=host.this.com+PortNumber=1521+Sid=DBNAMEI1+ACC=P/user/pwd+sestm=100+threads=TRUE+DataSource=DBNAME+DB=DBNAME+K=2+
XACloseString=
ThreadOfControl=THREAD
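
When sharing a stanza like the one above, it can help to split the XAOpenString into its fields and mask the password first. The following is only a rough sketch: it assumes the string is a plain "+"-delimited list of key=value tokens as shown in this post, and that ACC has the form P/user/pwd. The function names are hypothetical, and a password containing "+" or "/" would not be handled.

```python
from typing import Dict


def parse_xa_open_string(xa_open: str) -> Dict[str, str]:
    """Split a '+'-delimited XAOpenString into a key/value dict.

    Tokens without '=' (such as the leading ORACLE_XA marker) are kept
    as keys with an empty value.
    """
    fields: Dict[str, str] = {}
    for token in xa_open.split("+"):
        token = token.strip()
        if not token:
            continue  # a trailing '+' leaves an empty token
        key, sep, value = token.partition("=")
        fields[key] = value if sep else ""
    return fields


def mask_credentials(fields: Dict[str, str]) -> Dict[str, str]:
    """Return a copy with the password part of ACC=P/user/pwd masked."""
    masked = dict(fields)
    parts = masked.get("ACC", "").split("/")
    if len(parts) == 3:  # assumed layout: P / user / password
        masked["ACC"] = f"{parts[0]}/{parts[1]}/****"
    return masked


if __name__ == "__main__":
    example = (
        "ORACLE_XA+SQLNET=DBNAMEI1+HostName=host.this.com+PortNumber=1521"
        "+Sid=DBNAMEI1+ACC=P/user/pwd+sestm=100+threads=TRUE"
        "+DataSource=DBNAME+DB=DBNAME+K=2+"
    )
    for key, value in mask_credentials(parse_xa_open_string(example)).items():
        print(f"{key:12} {value}")
```

Printing one field per line also makes it easier to compare the stanza against a working resource manager entry on the same qmgr.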

MQ TRACE:
21:45:07.067924 2949188.1 : Calling RM (DataDirect Oracle Server) using 64-bit XA
21:45:07.069087 2949188.1 : __________
21:45:07.069095 2949188.1 : xa_open <<
21:45:07.069102 2949188.1 : Xa_info : Input Parm
21:45:07.069109 2949188.1 : Rmid : Input Parm
21:45:07.069115 2949188.1 : Flags : Input Parm
21:45:07.069121 2949188.1 : Return value:
21:45:07.069127 2949188.1 : 0x0000: 00000000 |.... |
21:45:07.069135 2949188.1 : rc: XA_OK
21:45:07.069141 2949188.1 : --------} tmiCallXAOpen rc=OK
21:45:07.069147 2949188.1 : -------} tmiXAOpen rc=OK
21:45:07.069153 2949188.1 : ------} tmiOpenResourceMgr rc=OK
21:45:07.069159 2949188.1 : ------{ tmiRecoverResourceMgr
21:45:07.069165 2949188.1 : -------{ tmiAssumeNotIndoubt
21:45:07.069171 2949188.1 : -------} tmiAssumeNotIndoubt rc=OK
21:45:07.069178 2949188.1 : -------{ tmiXARecover
21:45:07.069184 2949188.1 : --------{ tmiCallXARecover
21:45:07.069191 2949188.1 : __________
21:45:07.069197 2949188.1 : xa_recover >>
21:45:07.069203 2949188.1 : Xids : Output Parm
21:45:07.069209 2949188.1 : Count:
21:45:07.069215 2949188.1 : 0x0000: 00000005 |.... |
21:45:07.069221 2949188.1 : Rmid:
21:45:07.069227 2949188.1 : 0x0000: 00000001 |.... |
21:45:07.069233 2949188.1 : Flags:
21:45:07.069239 2949188.1 : 0x0000: 01000000 |.... |
21:45:07.069247 2949188.1 : Calling RM (DataDirect Oracle Server) using 64-bit XA
21:45:07.112846 2949188.1 : __________
21:45:07.112860 2949188.1 : xa_recover <<
21:45:07.112866 2949188.1 : Xids:
21:45:07.112872 2949188.1 : None
21:45:07.112878 2949188.1 : Count : Input Parm
21:45:07.112884 2949188.1 : Rmid : Input Parm
21:45:07.112890 2949188.1 : Flags : Input Parm
21:45:07.112896 2949188.1 : Return value:
21:45:07.112903 2949188.1 : 0x0000: fffffff9 |.... |
21:45:07.112912 2949188.1 : rc: XAER_RMFAIL
21:45:07.112919 2949188.1 : --------}! tmiCallXARecover rc=Unknown(FFFFFFF9)
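
For anyone reading the hex dumps in the trace above: the return value fffffff9 is simply -7 as a signed 32-bit integer, which is XAER_RMFAIL in the X/Open XA header (xa.h), and the Flags value 01000000 passed to xa_recover is TMSTARTRSCAN. A small sketch of that decoding follows. It assumes each dump is one big-endian 32-bit word (as on AIX/POWER), the helper names are made up for illustration, and the table leaves out the XA_RB* rollback codes.

```python
from typing import List, Tuple

# Return codes as defined in the X/Open XA specification header (xa.h).
XA_RETURN_CODES = {
    0: "XA_OK",
    3: "XA_RDONLY",
    4: "XA_RETRY",
    5: "XA_HEURMIX",
    6: "XA_HEURRB",
    7: "XA_HEURCOM",
    8: "XA_HEURHAZ",
    9: "XA_NOMIGRATE",
    -2: "XAER_ASYNC",
    -3: "XAER_RMERR",
    -4: "XAER_NOTA",
    -5: "XAER_INVAL",
    -6: "XAER_PROTO",
    -7: "XAER_RMFAIL",
    -8: "XAER_DUPID",
    -9: "XAER_OUTSIDE",
}

# Scan flags that are meaningful on xa_recover (also from xa.h).
XA_RECOVER_FLAGS = {
    0x01000000: "TMSTARTRSCAN",
    0x00800000: "TMENDRSCAN",
}


def to_signed32(hex_word: str) -> int:
    """Interpret an 8-digit hex word from the trace as a signed 32-bit int."""
    raw = int(hex_word, 16) & 0xFFFFFFFF
    return raw - (1 << 32) if raw & 0x80000000 else raw


def decode_xa_rc(hex_word: str) -> Tuple[int, str]:
    """Map a traced return value to its numeric value and symbolic name."""
    value = to_signed32(hex_word)
    return value, XA_RETURN_CODES.get(value, "not in this table")


def decode_recover_flags(hex_word: str) -> List[str]:
    """List the xa_recover scan flags set in a traced Flags word."""
    raw = int(hex_word, 16)
    names = [name for bit, name in XA_RECOVER_FLAGS.items() if raw & bit]
    return names or ["TMNOFLAGS"]


if __name__ == "__main__":
    print(decode_xa_rc("fffffff9"))          # (-7, 'XAER_RMFAIL')
    print(decode_recover_flags("01000000"))  # ['TMSTARTRSCAN']
```

So the queue manager started a fresh recovery scan and the resource manager reported that it was unavailable, which matches the AMQ7604 text.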

All advice is welcome! Thanks in advance...
mqjeff
Posted: Tue Mar 08, 2011 2:18 am    Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

If you're using DataDirect drivers, are you also using Broker?

If so, try adding an ODBC trace and a simultaneous network-level trace (Wireshark or similar).

And see if the DBAs can enable some tracing of the connection on their side - even if "The DB is fine".
mvic
Posted: Tue Mar 08, 2011 1:18 pm    Post subject: Re: How to diagnose XAER_RMFAIL in xa_recover step?

Jedi

Joined: 09 Mar 2004
Posts: 2080

scott9 wrote:
We opened a PMR for this, but they said it's either a poorly formatted XAOpen string or a problem with the database. Our DBAs assure us the DB is fine and our XAOpen string is correct.

It won't be a bad xa_open string, because your xa_open call succeeded.

The XAER_RMFAIL from xa_recover is the error to focus on.

First find out if it's coming from the RDBMS client, or the ODBC driver.

If you can't find out from reasonable efforts, reopen that PMR (if you closed it) and get someone to tell you how to find out.

If it's the RDBMS, speak to the db people. If it's the ODBC driver, speak to IBM.

IMHO
scott9
Posted: Tue Mar 08, 2011 6:01 pm    Post subject: Resolved! You won't believe how...

Acolyte

Joined: 11 Jul 2002
Posts: 62
Location: Sacramento,CA

I resolved this issue by resetting the pwd in the database to the exact same pwd it was previously. I did NOT change the qm.ini. Just to be thorough, I verified that the pwd was correct by logging into the db through sqlplus on a different system with an Oracle client while this was failing. The exact same pwd worked interactively, but didn't work through the DataDirect drivers during the xa_recover operation. Simply changing the pwd in the database to the exact same thing it was before fixed it. Two distinct databases shared this same error and the fix was the same for both. I hope this thread helps someone else who finds themselves in this situation.

If anybody can suggest a root cause for this failure, I would be interested to hear your thoughts...perhaps some corruption in the db pwd key that didn't get picked up by an interactive session?? Caching??
exerk
Posted: Wed Mar 09, 2011 12:57 am    Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

scott9, please update your original post and prepend RESOLVED to it, as it makes it easier for people searching to see whether a problem close to, or the same as, their own has a fix. Thank you.
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
mvic
Posted: Wed Mar 09, 2011 2:01 am    Post subject: Re: Resolved! You won't believe how...

Jedi

Joined: 09 Mar 2004
Posts: 2080

This does not make sense to me.

Did you change it to the correct password?

From the MQ trace, it was correct, because your xa_open worked.

Take another MQ trace: is the correct password flowing into the xa_open call?

Is this some clever sort of RDBMS arrangement where your client will silently try to connect to different servers depending on X or Y or Z ?
mvic
Posted: Wed Mar 09, 2011 2:02 am    Post subject:

Jedi

Joined: 09 Mar 2004
Posts: 2080

exerk wrote:
scott9, please update your original post and prepend RESOLVED

From my reading, this problem has gone away without the OP really understanding why; I don't think it is fair to say it is resolved.
exerk
Posted: Wed Mar 09, 2011 2:05 am    Post subject:

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

mvic wrote:
exerk wrote:
scott9, please update your original post and prepend RESOLVED

From my reading, this problem has gone away without the OP really understanding why; I don't think it is fair to say it is resolved.


Resolvedish?
scott9
Posted: Wed Mar 09, 2011 11:04 am    Post subject: Clarification on some points

Acolyte

Joined: 11 Jul 2002
Posts: 62
Location: Sacramento,CA

Some clarification points:
* I confirmed the password by connecting to the database interactively using sqlplus. Two of us did this independently to confirm no typos. We did this multiple times and we use this same pwd everywhere, so not likely we typo'd.

* I did NOT change qm.ini, so no changes to the XAOpenString. I fixed this only by changing the pwd at the database (to the same thing).

* xa_open does NOT require authentication at the database. xa_open WILL work fine if you have an incorrect pwd. I verified with trace (strmqtrc) that the xa_open routine ended successfully, even when I purposefully typo'd the pwd. Only xa_recover actually authenticates.

* I made the change at the database. The DBA indicated that although I changed the pwd to exactly the same thing, the encryption string changed. He expected it to look exactly the same.

* The Oracle server is 11.1. The pwd was originally set with sqlplus using a 10.2 Oracle client. I reset it with sqlplus using an 11.2 Oracle client. This leads me to believe there is either a bug in the Oracle server (11.1), or between the client 10.2 and 11.1 server.

* I won't set this to "Resolved" just yet. I need some time to work with Oracle and our DBAs to get to root cause.
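
The check described in the third point above (xa_open returning XA_OK while xa_recover fails) can be repeated on any formatted trace by pairing each "xa_... <<" return marker with the "rc:" line that follows it. This is only a sketch based on the trace layout in the excerpt earlier in this thread. That layout is not a documented format, so the regular expressions and function names here are assumptions and may need adjusting for other MQ versions.

```python
import re
from typing import Iterable, List, Optional, Tuple

# "xa_open <<" marks the return from an XA call in the excerpt's layout.
CALL_RETURN = re.compile(r"\b(xa_\w+)\s*<<")
# "rc: XA_OK" carries the symbolic result of the most recent call.
RC_LINE = re.compile(r"\brc:\s*(\w+)")


def xa_call_results(trace_lines: Iterable[str]) -> List[Tuple[str, str]]:
    """Return (xa_call, rc) pairs in the order they appear in the trace."""
    results: List[Tuple[str, str]] = []
    current: Optional[str] = None
    for line in trace_lines:
        call = CALL_RETURN.search(line)
        if call:
            current = call.group(1)
            continue
        rc = RC_LINE.search(line)
        if rc and current is not None:
            results.append((current, rc.group(1)))
            current = None
    return results


def first_failure(results: List[Tuple[str, str]]) -> Optional[Tuple[str, str]]:
    """Return the first call whose rc is not XA_OK, or None if all passed."""
    for call, rc in results:
        if rc != "XA_OK":
            return call, rc
    return None


if __name__ == "__main__":
    sample = [
        "21:45:07.069095 2949188.1 : xa_open <<",
        "21:45:07.069135 2949188.1 : rc: XA_OK",
        "21:45:07.069141 2949188.1 : --------} tmiCallXAOpen rc=OK",
        "21:45:07.069197 2949188.1 : xa_recover >>",
        "21:45:07.112860 2949188.1 : xa_recover <<",
        "21:45:07.112912 2949188.1 : rc: XAER_RMFAIL",
    ]
    results = xa_call_results(sample)
    print(results)
    print(first_failure(results))
```

Running it against traces taken with a correct and a deliberately wrong pwd would show whether xa_open really reports XA_OK in both cases.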
mvic
Posted: Wed Mar 09, 2011 11:27 am    Post subject: Re: Clarification on some points

Jedi

Joined: 09 Mar 2004
Posts: 2080

OK there is too much odd stuff here.

Changing your db to match your xa_open string or vice versa.. the important thing is that they match.

But what you say about the pwd not being checked on xa_open, I have never heard of this before.

If this is true, it means my earlier comments might not apply.

Dunno what's different about your setup.. xa_open should be the call that connects and establishes credentials. Maybe your db or switch load file and/or drivers are hiding / caching / messing around with the normal behaviour?

As mqjeff said, are you using Broker? It might be a better bet to ask over there, in case they can say it's something to do with your drivers.

Sorry.. hope you get it fixed.
scott9
Posted: Wed Mar 23, 2011 2:41 pm    Post subject: Root Cause: Database Cloning?

Acolyte

Joined: 11 Jul 2002
Posts: 62
Location: Sacramento,CA

Thanks to all who contributed to this thread. I recently learned that the DBAs cloned the database involved in this environment. We are also having other strange issues with XA coordination now that we think might be side effects of the cloning. Also, I suspect the ID was never reset as first reported to me; therefore, changing the ID to the exact same thing may have corrected some corruption that might have resulted from cloning.

I realize that I'm not providing an exact root cause, but I wanted to share the part about cloning in case others have experienced odd behavior with cloned databases. For now, I place the blame on cloning, since it seems to be the most likely suspect. Cheers!