MQ Backout
mqjava
Posted: Mon Nov 08, 2010 11:31 am Post subject: MQ Backout
Hi All,

We have a queue manager QM1 in which a client application makes around 200 connections through a SVRCONN channel and puts messages on an alias queue. The target queue of the alias is a cluster queue on another queue manager, QM3; messages from QM1 to QM3 pass through a gateway queue manager, QM2. This works fine.

But at heavy-traffic times, when the client puts a lot of messages on the alias queue, a huge backlog builds up on SYSTEM.CLUSTER.TRANSMIT.QUEUE and messages flow very slowly from QM1 to QM2 (the gateway) and then to QM3. MQ rolls back the transaction with an error saying the log space is full. We have currently set the linear log space to 5 GB, but it is still not enough.

My main concern is that when MQ rolls back the transaction, the MQ client application crashes and cannot connect again; the client gets error 2059. I am not sure how the client application is affected when the queue manager rolls back the transaction. How can we keep the MQ client running when MQ does a rollback?

Thanks in advance
zonko
Posted: Mon Nov 08, 2010 12:19 pm Post subject:
Obviously it does not get 2059 initially, but some other reason code, which you have failed to mention and which, I suspect, the app has failed to cater for; probably 2003, although I suppose 2102 is a possibility.
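The reason codes mentioned here are the standard MQRC_* constants (2003 MQRC_BACKED_OUT, 2059 MQRC_Q_MGR_NOT_AVAILABLE, 2102 MQRC_RESOURCE_PROBLEM). A minimal sketch of how a client could classify them instead of crashing; the recovery policy itself is an illustrative assumption, not IBM-documented advice:

```python
# Hypothetical classifier for the MQ reason codes discussed above.
# The numeric values are the standard MQRC_* constants; the retry
# policy is an assumption for illustration.

MQRC_BACKED_OUT = 2003           # unit of work was backed out
MQRC_Q_MGR_NOT_AVAILABLE = 2059  # connection attempt failed
MQRC_RESOURCE_PROBLEM = 2102     # insufficient system resources (e.g. log space)

def classify(reason_code: int) -> str:
    """Map an MQI reason code to a coarse recovery action."""
    if reason_code == MQRC_BACKED_OUT:
        return "redrive"      # re-put the messages from application state
    if reason_code == MQRC_RESOURCE_PROBLEM:
        return "backoff"      # wait for log space to free up, then retry
    if reason_code == MQRC_Q_MGR_NOT_AVAILABLE:
        return "reconnect"    # rebuild the connection before retrying
    return "fail"             # anything else: surface the error

print(classify(2003))  # redrive
```

An app that only handles the "happy path" will typically fall through to a reconnect attempt and then surface 2059, which matches the symptom described.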
bruce2359
Posted: Mon Nov 08, 2010 2:20 pm Post subject:
"the log space is getting filled" A more direct solution would be to provide more log space.
How large are your message size? How many messages comprise a single transaction?
5GB may sound like a lot of log space, but for large numbers of concurrent UofWs with large messages, 5GB may be insufficient. How did you calculate log file size for this queue manager? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
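The sizing question can be made concrete with rough arithmetic, using figures from this thread (~100 KB persistent messages, and each message logged twice: once on the application put, once on the MCA get from the transmit queue, as noted later in the thread). The overhead factor and backlog size are illustrative assumptions:

```python
# Back-of-the-envelope log sizing, using figures from this thread.
# The 2x factor (put + MCA get) is discussed in this thread; the
# per-message overhead allowance is an illustrative assumption.

msg_size_bytes = 100 * 1024   # ~100 KB persistent messages
overhead = 1.2                # assumed allowance for headers/log records
log_writes_per_msg = 2        # logged on put, logged again on MCA get

def log_bytes_for_backlog(messages_in_flight: int) -> int:
    """Rough lower bound on log space consumed by in-flight traffic."""
    return int(messages_in_flight * msg_size_bytes * log_writes_per_msg * overhead)

# A backlog of 20,000 messages on the transmit queue:
gb = log_bytes_for_backlog(20_000) / (1024 ** 3)
print(f"{gb:.1f} GB")  # 4.6 GB - already close to the 5 GB configured
```

The point of the sketch: a backlog in the tens of thousands of 100 KB persistent messages can plausibly consume the entire configured log on its own, which is why simply growing the log kept hitting the same wall.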
PeterPotkay
Posted: Mon Nov 08, 2010 5:36 pm Post subject:
The application isn't committing enough, if at all.
_________________
Peter Potkay
Keep Calm and MQ On
mqjava
Posted: Tue Nov 09, 2010 9:55 am Post subject:
Hi All,

First we had 2 GB of log space; we increased it to 3 GB and got the same problem: the log space fills, MQ rolls back, and the client crashes. We then increased it to 5 GB, and then to 8 GB, and hit the same problem each time.

Message size is around 100 KB and the messages are persistent.

These messages are fire-and-forget: the client puts the message on the queue and forgets it; it is not involved in a transaction.

I checked which connections take the most log space during heavy traffic. The cluster channels that pick up messages from the SCTQ fill up the log space and roll back. All the messages on the SCTQ are the ones the client application put on the alias queue. Because a large number of messages are put on the alias queue at the same time, many messages wait on the SCTQ to be sent to the destination queue, and the log space fills up.

After MQ rolls back, messages flow fine, but slowly. The main concern is that we have to recycle the client every time MQ rolls back so that the client can connect again.

How are MQ rollback and client connectivity related? After MQ rolls back the transaction because the log space filled, the MQ client crashes and cannot connect to the queue manager again; only after we recycle the client can it connect.
Vitor
Posted: Tue Nov 09, 2010 10:09 am Post subject:
mqjava wrote:
Client puts the message in the queue and forgets, its not involed in trascation.

That's as may be. The fact that it's a fire-and-forget design doesn't mean the application hasn't been incorrectly coded to put messages under syncpoint.
_________________
Honesty is the best policy.
Insanity is the best defence.
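One way to audit this is to look at the put-message options the app actually passes. The MQPMO flag values below are the standard CMQC constants; the check itself is a sketch, and the platform-default note reflects documented MQ behavior (no syncpoint by default on distributed platforms, syncpoint by default on z/OS):

```python
# Standard MQPMO option values (CMQC constants).
MQPMO_SYNCPOINT = 0x00000002
MQPMO_NO_SYNCPOINT = 0x00000004

def put_is_fire_and_forget(pmo_options: int) -> bool:
    """True only when the put explicitly opts out of a unit of work.

    If neither flag is set, the default differs by platform (no
    syncpoint on distributed, syncpoint on z/OS), so an app that
    claims to be fire-and-forget should set MQPMO_NO_SYNCPOINT
    explicitly rather than rely on the default."""
    if pmo_options & MQPMO_SYNCPOINT:
        return False
    return bool(pmo_options & MQPMO_NO_SYNCPOINT)

print(put_is_fire_and_forget(MQPMO_NO_SYNCPOINT))  # True
print(put_is_fire_and_forget(MQPMO_SYNCPOINT))     # False
print(put_is_fire_and_forget(0))                   # False - relies on default
```

If the client's put options turn out to include MQPMO_SYNCPOINT (or rely on a syncpoint default), every "fire-and-forget" put is in fact opening a unit of work that holds log space until something commits.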
bruce2359
Posted: Tue Nov 09, 2010 10:11 am Post subject:
Quote:
First we had 2GB log space, then we increased it to 3 GB and then we got the same problem

For clarity: increasing log space did not cause the out-of-log-space problem, as your narrative suggests. What you are saying is that you responded to the out-of-space condition by increasing log space, yes? Then increased it again in the hope of resolving the condition? Then again?

Keep in mind that when your app puts a message, it is logged (log entry 1). When an MCA gets the message from the xmit queue, it is logged again (log entry 2). So heavy workload results in more log space used.

Quote:
How MQ rollback and Client connectivity is related.

Not related, beyond the application failing to respond to the log-full ReasonCode with some business logic/code that tries to redrive the puts.

If an application doesn't decide (with MQCMIT or MQBACK) what to do with the messages put in UofWs, what do you think the qmgr will do with in-doubt UofWs? The answer is well documented in the APR and APG.

Did the application explicitly disconnect from the qmgr (as has been suggested by colleagues)?

Back to my earlier question: how was the size of the logs calculated? Did you just make more/bigger logs, hoping to resolve the out-of-log-space problem?

There are some MQSC script commands for interrogating (DISPLAY) the qmgr as to which log segments are needed for currently in-flight UofWs. Have you tried the appropriate DISPLAY commands?
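The displays alluded to here can be sketched in runmqsc as follows. Attribute names are from the MQSC command reference, but verify them against your MQ version; the WHERE filter is illustrative:

```
* Uncommitted message count on the transmit queue:
DISPLAY QSTATUS(SYSTEM.CLUSTER.TRANSMIT.QUEUE) TYPE(QUEUE) UNCOM

* Connections with an active unit of work, and when that UoW first
* wrote to the log (the oldest such entry pins log space):
DISPLAY CONN(*) WHERE(UOWSTATE EQ ACTIVE) APPLTAG UOWLOGDA UOWLOGTI
```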
What have you done, other than increasing the log size and number of logs?
bruce2359
Posted: Tue Nov 09, 2010 10:24 am Post subject:
Fire and forget is an application design and coding choice.

If the messages are of little value, and/or can be recovered by application code, then ensure that the app is not creating messages in UofWs; then ensure that the app is creating non-persistent messages. These two changes will stop the logging.

Are you a programmer/developer? Have you inspected the app code? Or are you relying on a developer telling you that the app doesn't use UofWs?
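One way to act on the persistence half of this suggestion, sketched in MQSC. The queue name is hypothetical, and this only takes effect for applications that put with MQPER_PERSISTENCE_AS_Q_DEF rather than setting persistence explicitly on each message:

```
* Hypothetical target queue; lowers default persistence so that
* fire-and-forget puts stop generating log records. No effect if
* the app sets persistence explicitly per message.
ALTER QLOCAL(APP.TARGET.QUEUE) DEFPSIST(NO)
```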
One other thing: for clarity and readability, please break your one very large paragraph into smaller paragraphs, where the subject seems to change. Somewhere on the keyboard is a key that will do this for you. Perhaps it's the Enter key... Look at examples from others who post here.
fjb_saper
Posted: Tue Nov 09, 2010 11:23 am Post subject:
Quote:
because of large number of messages put in the alias queue at the same time, lot of messages are waiting in the SCTQ to be sent to the destination queue and log space is getting filled,

This is indicative of two problems; either could be true, or both at the same time. A large number of messages on the SCTQ that do not flow through the channel can be a symptom of distinct causes:

- Uncommitted messages on the SCTQ: the putting application did not commit the messages, so they cannot flow through the channel.
- The destination queue is full, and messages can only travel as fast as they are consumed on the other side, put to the DLQ, or expired. This behavior can be tweaked somewhat by channel attributes such as retry wait time and retry count.
- Last, but the least likely: the channel sequence numbers between the auto-defined cluster sender and the cluster receiver may no longer match. Use standard channel sequence-number resolution (see the Intercommunication manual). The qmgr may try to find an alternative route to the destination, which might account for the long transit times. (This happened to me only once in x many years.)

From the other comment saying that the messages flow very slowly after the crash, I suspect that you are hitting a queue-full condition on the target queue manager. Either use better load balancing or scale the consumer application that is not keeping up. Alternatively, increase the destination's maximum queue depth.
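The retry attributes mentioned above map to the message-retry settings on the receiving channel definition. A sketch, with a hypothetical channel name; MRRTY is the retry count and MRTMR the wait in milliseconds:

```
* Hypothetical cluster-receiver channel on the destination qmgr.
* The MCA retries a failing put (e.g. destination queue full)
* MRRTY times, waiting MRTMR milliseconds between attempts,
* before dead-lettering the message.
ALTER CHANNEL(TO.QM3) CHLTYPE(CLUSRCVR) MRRTY(20) MRTMR(2000)
```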
Have fun
_________________
MQ & Broker admin
bruce2359
Posted: Tue Nov 09, 2010 2:45 pm Post subject:
Exactly how does the application crash? A ReasonCode of logs-full is not a crash; rather, it is an indicator of insufficient (log) resources for an MQI call to complete successfully.

All MQI calls return to the application program a CompletionCode and ReasonCode that indicate the success or failure of the MQI call being executed by your application. It is the responsibility of the application to deal with (catch) the CompletionCode and/or ReasonCode and determine what to do next.

Please be very specific as to what your application did and did not do when presented with the log-full ReasonCode.
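The "catch the ReasonCode and decide what to do next" responsibility can be sketched as a reconnect loop with exponential backoff. The `connect` callable below is a stand-in for whatever MQCONN/createConnection wrapper the real client uses; the exception class and delays are illustrative assumptions:

```python
import time

MQRC_Q_MGR_NOT_AVAILABLE = 2059

class MQError(Exception):
    """Stand-in for the MQ client library's exception carrying a reason code."""
    def __init__(self, reason_code):
        super().__init__(f"MQ reason code {reason_code}")
        self.reason_code = reason_code

def connect_with_backoff(connect, max_attempts=5, base_delay=0.01):
    """Retry a connect callable on 2059, doubling the delay each time.

    Any reason code other than 2059 is re-raised immediately; so is
    2059 on the final attempt."""
    delay = base_delay
    for attempt in range(1, max_attempts + 1):
        try:
            return connect()
        except MQError as e:
            if e.reason_code != MQRC_Q_MGR_NOT_AVAILABLE or attempt == max_attempts:
                raise
            time.sleep(delay)
            delay *= 2

# Demo with a stub that fails twice, then succeeds:
attempts = []
def stub():
    attempts.append(1)
    if len(attempts) < 3:
        raise MQError(2059)
    return "connected"

print(connect_with_backoff(stub))  # connected
```

A client with this shape survives the queue manager's rollback window instead of needing a manual recycle; the root cause (the backlog filling the log) still needs fixing separately.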
Mr Butcher
Posted: Tue Nov 09, 2010 11:18 pm Post subject:
We had a "log space problem" before that looked similar... maybe it helps if I describe it.

The application converts data from format A to format B: two processes per customer, many customers, so about 100 processes on that machine/queue manager. One customer managed to send a 0-byte message. The application program did not commit that message and just got stuck. All the other application processes continued to run well. So that one UoW stayed open until MQ tried to reuse that log file (circular logging) and could not, because of the open UoW. The stuck process was then killed with "blablabla to release log space".

Increasing the logs does not help; it only takes longer until MQ reaches the "point of reuse". In the end the application program was fixed to handle these kinds of messages.

However, between the arrival of the 0-byte message and the process kill by MQ to release log space, the situation could be detected by checking the queue status: it showed UNCOM(YES) for the queue in question. Maybe this is something you can try on your system too, to check whether something stuck is preventing log space from being released.

Maybe it's a similar situation in your case... just a guess. Which error do you get exactly when MQ detects the logs being full? (I think there are different ones.)

However, this does not explain the low throughput on the SCTQ...
_________________
Regards, Butcher
bruce2359
Posted: Wed Nov 10, 2010 7:00 am Post subject:
Quote:
however, this does not explain low throughput on the SCTQ ....

I would expect to see an overall slowdown as the qmgr attempts to manage (back out) multiple concurrent UofWs that encounter the out-of-log-space condition. The MCA is one of those UofWs as well.
Mr Butcher
Posted: Wed Nov 10, 2010 7:24 am Post subject:
IMHO, none of the "current" UoWs are killed when MQ encounters the situation I described; only the "hanging" UoW that is blocking the log file from being reused.
bruce2359
Posted: Wed Nov 10, 2010 7:40 am Post subject:
I'll rephrase, due to my lack of clarity: I would expect to see an overall reduction in throughput when the qmgr backs out multiple (a high number of) concurrent UofWs that did not complete successfully and/or explicitly commit (for whatever reason).

As soon as some log space frees up, new UofWs start; soon the logs fill again, and UofWs are backed out, which frees up some log space... repeat.
mqjava
Posted: Thu Nov 11, 2010 10:23 am Post subject:
Quote:
Did the application explicitly disconnect from the qmgr (as has been suggested by colleagues)?

No. Some of the SVRCONN channels are left open. The app team says they are doing a proper close and disconnect, but the channels are not closing. The MQ client is DataPower.

Quote:
There are some MQSC script commands for interrogating (DISPLAY) the qmgr as to which log segments are needed for currently in-flight UofWs. Have you tried the appropriate DISPLAY commands?

Yes. I checked the queue status for uncommitted messages and checked the connections for unresolved UoWs. The SCTQ shows uncommitted messages, and the cluster channel connection that reads messages from the SCTQ has an unresolved UoW.

Quote:
Uncommitted messages on the SCTQ. The putting application did not commit the messages and they cannot flow through the channel

The app team says they are not using UoWs and that they do a proper MQ close and disconnect.

Quote:
The destination queue is full and the messages can only travel as fast as they are being consumed on the other side, being put on the DLQ or expire. This behavior can be somewhat tweaked by channel attributes like retry wait time and retry count.

The queue is not full, but there is a huge backlog. The app processing the messages is very slow, as it does some transformation and a DB insert. I have asked the app team to create more instances of the reading process.

Quote:
Exactly how does the application crash? A ReasonCode of logs-full is not a crash; rather it is an indicator of insufficient (log) resources for an MQI call to complete successfully.

When MQ rolls back, they immediately see 2059 errors. After a few seconds DataPower goes unresponsive; the DataPower admins say they cannot communicate with the DataPower box and it is frozen. They had to reboot it to get it working.