murdeep
Posted: Wed May 12, 2010 8:31 am Post subject: Msg Flow Statistics - phantom messages with errors
Master
Joined: 03 Nov 2004 Posts: 211
Hello, I have started to look into message flow statistics and I've run into something that has me scratching my head.
W2K
WMQ V7
WMB V6.1.0.4
Flow stats enabled with -o xml -t basic -n advanced for my EG.
Subscribed to the topic, getting archive stats and loading them into tables.
What I've noticed is that every once in a while I'll have an interval where TotalNumberOfMessagesWithErrors is greater than zero for some flows, yet the messages are consumed correctly.
We don't wire our MQInput Catch terminals, so any error that causes an exception would send the message to the DLQ, and we never have messages on the DLQ.
Terminal stats show that no error paths are traversed in the flows that show messages with errors.
So I am perplexed about why the count is greater than zero when I don't see any errors or exceptions being thrown elsewhere.
Can anyone comment on what scenarios would cause the messages-with-errors count to be incremented while the message is consumed normally?
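A minimal sketch of the loading step and the check described above, assuming each archive record arrives as XML text and that the per-flow totals sit as attributes on a MessageFlow element. Apart from TotalNumberOfMessagesWithErrors, the element and attribute names are assumptions modelled loosely on the broker's statistics format; the sample record and the table layout are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical, heavily simplified archive record. Real records carry many
# more attributes and child elements (threads, nodes, terminals); only the
# attributes used below are modelled, and their names are assumptions.
SAMPLE_RECORD = """<WMQIStatisticsAccounting RecordType="Archive">
  <MessageFlow MessageFlowName="OrderFlow" TotalInputMessages="120"
               TotalNumberOfMessagesWithErrors="2" TotalNumberOfBackouts="0"/>
  <MessageFlow MessageFlowName="DbFlow" TotalInputMessages="40"
               TotalNumberOfMessagesWithErrors="0" TotalNumberOfBackouts="3"/>
</WMQIStatisticsAccounting>"""


def create_schema(conn):
    """Create a minimal (hypothetical) table for per-interval flow stats."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS flow_stats ("
        " flow TEXT, input_msgs INTEGER,"
        " msgs_with_errors INTEGER, backouts INTEGER)"
    )


def load_record(conn, xml_text):
    """Insert one row per MessageFlow element found in an archive record."""
    root = ET.fromstring(xml_text)
    for flow in root.iter("MessageFlow"):
        conn.execute(
            "INSERT INTO flow_stats VALUES (?, ?, ?, ?)",
            (
                flow.get("MessageFlowName"),
                int(flow.get("TotalInputMessages", "0")),
                int(flow.get("TotalNumberOfMessagesWithErrors", "0")),
                int(flow.get("TotalNumberOfBackouts", "0")),
            ),
        )
    conn.commit()


def phantom_error_flows(conn):
    """Flows that counted errors but recorded no backouts in the interval."""
    return conn.execute(
        "SELECT flow, msgs_with_errors FROM flow_stats"
        " WHERE msgs_with_errors > 0 AND backouts = 0"
    ).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    create_schema(conn)
    load_record(conn, SAMPLE_RECORD)
    print(phantom_error_flows(conn))  # [('OrderFlow', 2)]
```

Filtering on "errors but no backouts" isolates exactly the pattern being asked about: intervals where messages were counted as having errors yet nothing was rolled back.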
 |
mrn
Posted: Wed May 12, 2010 11:46 pm
Newbie
Joined: 12 May 2010 Posts: 5
Hi,
Do you have a backout threshold set?
It's possible that some messages are being backed out due to an error and then reprocessed. These would show as both an error and a successfully processed message in the broker stats.
 |
murdeep
Posted: Thu May 13, 2010 7:35 am
Master
Joined: 03 Nov 2004 Posts: 211
BOTHRESH is set to 0, so if a message were backed out, it would go to the DLQ when subsequently processed. There are no messages on the DLQ.
Can anyone from IBM define exactly what causes TotalNumberOfMessagesWithErrors to be incremented?
 |
Vitor
Posted: Thu May 13, 2010 9:32 am
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
murdeep wrote:
"BOTHresh set to 0 so if backed out when subsequently processed would go to DLQ."
No, it wouldn't; it would go to the backout queue specified. The DLQ is for undeliverable messages.
 |
mrn
Posted: Thu May 13, 2010 11:05 am
Newbie
Joined: 12 May 2010 Posts: 5
I'm assuming that you are using an MQInput node.
The following conditions will cause TotalNumberOfMessagesWithErrors to be incremented:
1. Any recoverable exception downstream in the message flow.
2. A failure in the input node, e.g. being unable to receive a message; basically anything that would be sent to the Catch terminal.
3. A failure during commit or rollback of a message.
If you don't think it's any of the above, then you need to raise a PMR with IBM and provide full details of the scenario and how to recreate it.
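The three conditions in mrn's list can be written as a small classifier, which makes one consequence explicit: a message can be counted as "with errors" even if it eventually completes normally. This is a toy model; the event names are hypothetical labels for the list items, and counting each message at most once, however many errors it hit, is an assumption rather than documented broker behaviour.

```python
from enum import Enum, auto


class FlowEvent(Enum):
    """Hypothetical event categories mirroring mrn's list."""
    DOWNSTREAM_RECOVERABLE_EXCEPTION = auto()  # condition 1
    INPUT_NODE_FAILURE = auto()                # condition 2
    COMMIT_OR_ROLLBACK_FAILURE = auto()        # condition 3
    NORMAL_COMPLETION = auto()


# Events that, according to the list, increment the error count.
ERROR_COUNTING_EVENTS = frozenset({
    FlowEvent.DOWNSTREAM_RECOVERABLE_EXCEPTION,
    FlowEvent.INPUT_NODE_FAILURE,
    FlowEvent.COMMIT_OR_ROLLBACK_FAILURE,
})


def messages_with_errors(events_per_message):
    """Count input messages that hit at least one error-counting event.

    events_per_message holds one list of FlowEvent values per message.
    A message is counted once however many errors it hit (an assumption).
    """
    return sum(
        1
        for events in events_per_message
        if any(event in ERROR_COUNTING_EVENTS for event in events)
    )


if __name__ == "__main__":
    interval = [
        [FlowEvent.NORMAL_COMPLETION],
        # Error raised and handled downstream; the message still completes.
        [FlowEvent.DOWNSTREAM_RECOVERABLE_EXCEPTION, FlowEvent.NORMAL_COMPLETION],
        [FlowEvent.NORMAL_COMPLETION],
    ]
    print(messages_with_errors(interval))  # 1
```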
 |
murdeep
Posted: Thu May 13, 2010 2:13 pm
Master
Joined: 03 Nov 2004 Posts: 211
Vitor wrote:
"murdeep wrote: 'BOTHresh set to 0 so if backed out when subsequently processed would go to DLQ.'
No it wouldn't; it would go to the backout queue specified. The DLQ is for undeliverable messages."
In my last post I should have been clearer and stated that in my environment BOQNAME is also not set, as well as BOTHRESH = 0. The MQInput Catch terminal is not wired, and messages end up on the DLQ with a DLH Reason Code of 65536, which is standard behavior. So in this case I guess the DLQ is for unprocessable messages as well as undeliverable ones.
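The two backout scenarios in this thread (mrn's "backed out, then reprocessed successfully" case and murdeep's BOTHRESH = 0 with no BOQNAME) can be compared with a simplified model. It assumes the message is requeued once its backout count exceeds BOTHRESH, going to BOQNAME if set and otherwise to the dead-letter queue; the exact MQInput node rules should be checked against the product documentation. The DLQ name is an assumed default, and the dead-letter header reason code is not modelled.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class QueueConfig:
    bothresh: int                 # BOTHRESH on the input queue
    boqname: Optional[str]        # BOQNAME on the input queue, None if unset
    dlq: str = "SYSTEM.DEAD.LETTER.QUEUE"  # assumed dead-letter queue name


def handle_message(attempt_outcomes: List[bool], cfg: QueueConfig) -> Tuple[str, int]:
    """Simplified model of repeated delivery of one message to an MQInput
    node whose Catch terminal is not wired.

    attempt_outcomes has one bool per delivery attempt (True = the flow
    succeeded). Returns (where the message ended up, failed attempts).
    """
    backout_count = 0
    failed_attempts = 0
    for ok in attempt_outcomes:
        if ok:
            return "PROCESSED", failed_attempts
        failed_attempts += 1   # each failed attempt would count as an error
        backout_count += 1     # rollback bumps the message's backout count
        # On redelivery, a message whose backout count exceeds BOTHRESH is
        # moved to BOQNAME, or to the DLQ when BOQNAME is not set.
        if backout_count > cfg.bothresh:
            return (cfg.boqname or cfg.dlq), failed_attempts
    return "STILL_ON_INPUT_QUEUE", failed_attempts


if __name__ == "__main__":
    # mrn's case: retried and then succeeds, so one error but no DLQ message.
    print(handle_message([False, True], QueueConfig(bothresh=3, boqname=None)))
    # murdeep's setup: no retry, the message lands on the DLQ.
    print(handle_message([False, True], QueueConfig(bothresh=0, boqname=None)))
```

In this model, with BOTHRESH = 0 and no BOQNAME, any failure that triggers a backout should leave a message on the DLQ, which is why an empty DLQ argues against the retry explanation in murdeep's setup.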
 |
murdeep
Posted: Thu May 13, 2010 2:20 pm
Master
Joined: 03 Nov 2004 Posts: 211
mrn wrote:
"I'm assuming that you are using an MQinput node..
The following conditions will cause the TotalNumberOfMessagesWithErrors to be incremented :
1. Any Recoverable Exception downstream in the message flow
2. A failure in the input node - Eg unable to receive a message - basically anything that would be sent to the Catch Terminal.
3. A failure during commit or rollback of a message.
If you don't feel it's any of the above then you need to raise a PMR with IBM and provide full details of the scenario and how to recreate it."
OK, I think option 1 may be in play here.
Can you define exactly what a recoverable exception would be? Can an ODBC connection auto-recover?
What I have noticed is that I can force TotalNumberOfMessagesWithErrors to increment by redeploying the flow and then sending the next message through, which happens to insert a record into a database. I'm wondering whether the ODBC connection is marked as invalid when I redeploy. The second and subsequent messages don't increment the count. Do certain ODBC exceptions recover automatically in the way I have described, and is that what I am seeing here?
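murdeep's hypothesis can be expressed as a toy model: the redeploy leaves a pooled connection stale, the first insert fails, the connection is re-established and the insert retried, and the message succeeds while still being recorded as having an error. This is only an illustration of the hypothesis; the connection class and its methods are hypothetical and say nothing about how the broker's ODBC layer actually behaves.

```python
class HypotheticalConnection:
    """Toy stand-in for a pooled ODBC connection; not a broker API."""

    def __init__(self):
        self.valid = True

    def invalidate(self):
        # Models the guess that a redeploy leaves the pooled connection stale.
        self.valid = False

    def reconnect(self):
        self.valid = True

    def execute(self, sql):
        if not self.valid:
            raise ConnectionError("stale connection")
        return f"executed: {sql}"


class FlowStats:
    def __init__(self):
        self.input_messages = 0
        self.messages_with_errors = 0


def process_insert(conn, stats, sql):
    """Handle one message whose flow inserts a row.

    A stale connection is retried once, so the message still succeeds,
    but the first failure is recorded against this message.
    """
    stats.input_messages += 1
    try:
        return conn.execute(sql)
    except ConnectionError:
        stats.messages_with_errors += 1  # error noticed even though we recover
        conn.reconnect()
        return conn.execute(sql)


if __name__ == "__main__":
    conn, stats = HypotheticalConnection(), FlowStats()
    conn.invalidate()  # the "redeploy"
    for n in range(3):
        process_insert(conn, stats, f"INSERT INTO t VALUES ({n})")
    print(stats.input_messages, stats.messages_with_errors)  # 3 1
```

The pattern it reproduces (only the first message after a redeploy is counted, later ones are clean) matches what murdeep observed, which is what makes the hypothesis worth testing against the real flow.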
 |
mrn
Posted: Fri May 14, 2010 12:26 am
Newbie
Joined: 12 May 2010 Posts: 5
Statistics are gathered on a per-message basis. Collection starts when an input node is triggered by receiving a message and completes when the flow processing winds back to that input node, at which point the node must decide whether processing was successful or whether errors occurred downstream.
If you see any error message in the console from a downstream node, it will be reported in the statistics.
A recoverable error could be anything a processing node considers important enough to inform the user about. For a node that connects to a database, it will be a failure to connect or an SQL exception; for other nodes, it could be a badly formatted message, and so on.
It's anything that causes a BIPxxxxE message in the system/console log. It's recoverable in the sense that a temporary condition related to one input message has caused the problem; it is not a catastrophic (fatal) error, in which case the execution group needs to be taken down and restarted to preserve data integrity.
I'd be surprised if an ODBC auto-recovery were treated as an error; perhaps the statistics gathering is being too sensitive in this case.
Can you share with us which BIP messages, if any, you are seeing in the log?
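The per-message lifecycle mrn describes can be sketched as an input node that wraps the downstream call: a recoverable error is logged and counted against that one message, while a fatal error is left to escape. This is a model of the explanation above, not the broker's code; the exception classes are hypothetical, and "BIPnnnnE" is a placeholder rather than a real message number.

```python
class RecoverableError(Exception):
    """Per-message problem: bad data, SQL failure, failed connect, ..."""


class FatalError(Exception):
    """Condition that, in this model, requires restarting the execution group."""


class InputNodeStats:
    def __init__(self):
        self.messages = 0
        self.messages_with_errors = 0


def process_one(message, downstream, stats, log):
    """One statistics 'unit': starts when the input node receives a message
    and ends when control winds back to it."""
    stats.messages += 1
    try:
        downstream(message)
        return "committed"
    except RecoverableError as exc:
        # Per the description above: reported in the log, counted as an error.
        stats.messages_with_errors += 1
        log.append(f"BIPnnnnE {exc}")  # placeholder id, not a real BIP message
        return "rolled back"
    # FatalError is deliberately not caught here; in this model it escapes
    # the input node and would take the execution group down.


if __name__ == "__main__":
    def failing_db_node(msg):
        raise RecoverableError(f"SQL exception for {msg}")

    stats, log = InputNodeStats(), []
    print(process_one("m1", failing_db_node, stats, log))  # rolled back
    print(process_one("m2", lambda msg: None, stats, log))  # committed
    print(stats.messages, stats.messages_with_errors, log)
```

In this model every counted error also produces a log entry, which is the assumption murdeep goes on to test against his own logs.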
 |
murdeep
Posted: Fri May 14, 2010 6:06 am
Master
Joined: 03 Nov 2004 Posts: 211
mrn wrote:
"Statistics are gathered on a per message basis which starts when an input node is triggered by receiving a message - and completes when the flow processing winds back to that input node - at which the node must decide if processing was successful or whether errors occurred downstream.
If you see any error message in the console from a downstream node this will be reported in the statistics.
A recoverable error could be anything a processing node considers important enough to inform the user about. So for a node that connects to a database it will be a failure to connect, or an SQL exception...; for other nodes it could be a badly formatted message .. and so on ...
It's anything that causes a BIPxxxxE in the system / console log. It's recoverable in the sense that a temporary condition related to one input message has caused the problem, and it's not a catastrophic (fatal) error in which case the execution group needs to be taken down and restarted to preserve data integrity.
I'd be surprised if an ODBC auto recover would be treated as an error and perhaps the statistics gathering is being too sensitive in this case ...
Can you share with us what bip messages, if any, you are seeing in the log."
I'm not seeing what you describe regarding the BIP error messages. I have a flow where the database was unavailable and the messages were backed out. The archive stats entry for that flow for that interval had the backout count incremented, which matched the BIPxxxxE messages in the error log, yet the message error count was zero.
I also have a stats record for a flow that shows a message error count of 2 with no corresponding BIPxxxxE messages in the log.
So a message with an error does not necessarily result in a BIP event message being cut.
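To check this across many intervals rather than by eye, the per-flow error counts from the stats tables can be compared with error-level BIP messages in the log. A rough sketch, assuming the log is available as text lines that mention the flow name; that matching rule is a crude assumption about the log format, and the sample lines, BIP numbers, and counts are made up to mirror the two cases described above.

```python
import re
from collections import Counter

# Broker message identifiers look like BIP followed by four digits and a
# severity letter; E marks an error.
BIP_ERROR = re.compile(r"\bBIP\d{4}E\b")


def bip_errors_by_flow(log_lines, flow_names):
    """Count error-level BIP lines that mention each flow name.

    Matching on the flow name as a substring is a crude assumption about
    the log format; adjust it to whatever your log lines actually contain.
    """
    counts = Counter()
    for line in log_lines:
        if BIP_ERROR.search(line):
            for name in flow_names:
                if name in line:
                    counts[name] += 1
    return counts


def mismatches(stats_errors, log_lines):
    """Flows whose stats error count differs from the BIP error count.

    Returns {flow: (stats_error_count, bip_error_count)}.
    """
    bip = bip_errors_by_flow(log_lines, list(stats_errors))
    return {
        name: (errors, bip[name])
        for name, errors in stats_errors.items()
        if errors != bip[name]
    }


if __name__ == "__main__":
    # Made-up log lines; the BIP numbers are placeholders.
    log = [
        "BIP1234E: database unavailable in flow DbFlow",
        "BIP5678I: flow OrderFlow started",
    ]
    print(mismatches({"DbFlow": 0, "OrderFlow": 2}, log))
```

Both directions of mismatch show up: a flow with BIP errors but a zero error count, and a flow with a non-zero error count but no BIP errors.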
 |
mrn
Posted: Fri May 14, 2010 8:55 am
Newbie
Joined: 12 May 2010 Posts: 5
Yes, you are right: some errors are not reported as BIP messages but are still visible to the input node, which updates the error count in the statistics.
So I suppose the question for you is: would you rather not know about those types of errors in the statistics?