|
RSS Feed - WebSphere MQ Support
|
RSS Feed - Message Broker Support
|
msgs getting stuck in IBMi queue after processing few msgs |
« View previous topic :: View next topic » |
Author |
Message
|
Vitor |
Posted: Wed Aug 17, 2016 5:51 am Post subject: |
|
|
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
vishBroker wrote: |
The message does not even enter debug - prior messages enter debug and get processed. |
User trace. The debugger will affect your processing.
What is the channel status of the client connection when the messages get "stuck"? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Back to top |
|
|
mqjeff |
Posted: Wed Aug 17, 2016 6:02 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
A message that is "stuck" is not committed. _________________ chmod -R ugo-wx / |
|
Back to top |
|
|
vishBroker |
Posted: Wed Aug 17, 2016 6:17 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
Quote: |
A message that is "stuck" is not committed. |
The PUT is commited, seems so - as another application (RFHUtilc) can read the 'stuck'ed message. |
|
Back to top |
|
|
smdavies99 |
Posted: Wed Aug 17, 2016 6:20 am Post subject: |
|
|
Jedi Council
Joined: 10 Feb 2003 Posts: 6076 Location: Somewhere over the Rainbow this side of Never-never land.
|
This is clearly the case where debugger is not the right tool for the job.
Using usertrace will let you trace the path of a message through the flow in a timely manner and without waiting for the blob of mostly water that is trying to operate the debugger in real time.
If the 'stuck' message never gets into the flow then it is just not ready to be read.
Normally, this means that it is uncomitted. _________________ WMQ User since 1999
MQSI/WBI/WMB/'Thingy' User since 2002
Linux user since 1995
Every time you reinvent the wheel the more square it gets (anon). If in doubt think and investigate before you ask silly questions. |
|
Back to top |
|
|
vishBroker |
Posted: Wed Aug 17, 2016 7:17 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
Here are excerpts from userTrace.
++++
2016-08-17 10:26:30.750740 22711 UserTrace BIP11506I: Committed a local transaction.
2016-08-17 10:26:30.750806 22711 UserTrace BIP11504I: Waiting for data from input node ''CDCIn_MQInput''.
2016-08-17 10:26:35.867408 22711 UserTrace BIP11507W: Rolled back a local transaction.
2016-08-17 10:27:30.868978 22711 UserTrace BIP13034I: Rolled back a locally-coordinated transaction on WebSphere MQ queue manager ''CDCQMD''.
2016-08-17 10:27:30.869078 22711 UserTrace BIP13031I: Closed WebSphere MQ queue ''CDC_IGNORED_MSGS.QUEUE''.
2016-08-17 10:27:30.869152 22711 UserTrace BIP13024I: Disconnected from WebSphere MQ queue manager ''CDCQMD''.
2016-08-17 10:27:30.899962 22711 UserTrace BIP13034I: Rolled back a locally-coordinated transaction on WebSphere MQ queue manager ''PTL.SUPPORT''.
2016-08-17 10:27:30.930376 22711 UserTrace BIP13031I: Closed WebSphere MQ queue ''RIPTIDE.OTR.ROAD_CALL''.
2016-08-17 10:27:30.961586 22711 UserTrace BIP13024I: Disconnected from WebSphere MQ queue manager ''PTL.SUPPORT''.
22083 22407 22711 3361 3424 3425
++++
PTL.SUPPORT => This is remote QM, connected from MQInput node using 'clientConnection'.
CDCQMD - is a local QM.
I do see, messages are being rolled back - but reason is not sure. Because - when we use the SAME code , but read the message from local queue (instead of remote QM queue) - we DO NOT see this behaviour for SAME messages. |
|
Back to top |
|
|
vishBroker |
Posted: Wed Aug 17, 2016 7:53 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
Is there a way to upload a file here?
here is the trace for one of the thread. If needed, I can copy/upload complete userTrace.
Code: |
2016-08-17 10:26:25.888580 22711 UserTrace BIP13026I: Read message from WebSphere MQ queue ''RIPTIDE.OTR.ROAD_CALL''.
2016-08-17 10:26:25.888638 22711 UserTrace BIP11501I: Received data from input node ''CDCIn_MQInput''.
2016-08-17 10:26:25.888706 22711 UserTrace BIP3907I: Message received and propagated to 'out' terminal of input node 'com.ptl.cdc.PTL_CDCPattern_MF.CDCIn_MQInput'.
2016-08-17 10:26:25.888844 22711 UserTrace BIP6060I: Node 'com.ptl.cdc.PTL_CDCPattern_MF.CDCIn_MQInput' used parser type ''Properties'' to process a portion of the incoming message of length '0' bytes beginning at offset '0'.
2016-08-17 10:26:25.888880 22711 UserTrace BIP6061I: Node 'com.ptl.cdc.PTL_CDCPattern_MF.CDCIn_MQInput' used parser type ''MQMD'' to process a portion of the incoming message of length '364' bytes beginning at offset '0'. The parser type was selected based on value ''MQHMD'' from the previous parser.
2016-08-17 10:26:25.888942 22711 UserTrace BIP6061I: Node 'com.ptl.cdc.PTL_CDCPattern_MF.CDCIn_MQInput' used parser type ''DFDL'' to process a portion of the incoming message of length '1429' bytes beginning at offset '364'. The parser type was selected based on value ''DFDL'' from the previous parser.
2016-08-17 10:26:25.889280 22711 UserTrace BIP2537I: Node 'com.ptl.cdc.PTL_CDCPattern_MF.CheckMessageValidity_Compute': Executing statement ''BEGIN ... END;'' at ('com.ptl.cdc.PTL_CDCPattern_CheckMessageValidity_Compute.Main', '2.2').
2016-08-17 10:26:25.889312 22711 UserTrace BIP2537I: Node 'com.ptl.cdc.PTL_CDCPattern_MF.CheckMessageValidity_Compute': Executing statement ''CALL CopyEntireMessage();'' at ('com.ptl.cdc.PTL_CDCPattern_CheckMessageValidity_Compute.Main', '3.3').
....
2016-08-17 10:26:25.917926 22711 UserTrace BIP13030I: Opened WebSphere MQ queue ''CDC_IGNORED_MSGS.QUEUE''.
2016-08-17 10:26:25.918132 22711 UserTrace BIP13027I: Wrote message to WebSphere MQ queue ''CDC_IGNORED_MSGS.QUEUE''.
2016-08-17 10:26:25.918140 22711 UserTrace BIP2622I: Message successfully output by ''com.ptl.cdc.PTL_CDCPattern_MF.IGNORED_MSGS_QUEUE'' to queue ''CDC_IGNORED_MSGS.QUEUE'' on queue manager ''CDCQMD''.
2016-08-17 10:26:25.918206 22711 UserTrace BIP2537I: Node 'com.ptl.cdc.PTL_CDCPattern_MF.CheckMessageValidity_Compute': Executing statement ''RETURN FALSE;'' at ('com.ptl.cdc.PTL_CDCPattern_CheckMessageValidity_Compute.Main', '29.8').
2016-08-17 10:26:26.289974 22711 UserTrace BIP13033I: Committed a locally-coordinated transaction on WebSphere MQ queue manager ''PTL.SUPPORT''.
2016-08-17 10:26:26.292414 22711 UserTrace BIP13033I: Committed a locally-coordinated transaction on WebSphere MQ queue manager ''CDCQMD''.
2016-08-17 10:26:26.292428 22711 UserTrace BIP11506I: Committed a local transaction.
2016-08-17 10:26:26.526078 22711 UserTrace BIP11507W: Rolled back a local transaction.
2016-08-17 10:26:26.855080 22711 UserTrace BIP11504I: Waiting for data from input node ''CDCIn_MQInput''.
2016-08-17 10:26:30.683046 22711 UserTrace BIP13026I: Read message from WebSphere MQ queue ''RIPTIDE.OTR.ROAD_CALL''.
2016-08-17 10:26:30.683096 22711 UserTrace BIP11501I: Received data from input node ''CDCIn_MQInput''.
2016-08-17 10:26:30.683164 22711 UserTrace BIP3907I: Message received and propagated to 'out' terminal of input node 'com.ptl.cdc.PTL_CDCPattern_MF.CDCIn_MQInput'.
2016-08-17 10:26:30.683284 22711 UserTrace BIP6060I: Node 'com.ptl.cdc.PTL_CDCPattern_MF.CDCIn_MQInput' used parser type ''Properties'' to process a portion of the incoming message of length '0' bytes beginning at offset '0'.
.....
2016-08-17 10:26:30.712790 22711 UserTrace BIP2622I: Message successfully output by ''com.ptl.cdc.PTL_CDCPattern_MF.IGNORED_MSGS_QUEUE'' to queue ''CDC_IGNORED_MSGS.QUEUE'' on queue manager ''CDCQMD''.
2016-08-17 10:26:30.712840 22711 UserTrace BIP2537I: Node 'com.ptl.cdc.PTL_CDCPattern_MF.CheckMessageValidity_Compute': Executing statement ''RETURN FALSE;'' at ('com.ptl.cdc.PTL_CDCPattern_CheckMessageValidity_Compute.Main', '29.8').
2016-08-17 10:26:30.749544 22711 UserTrace BIP13033I: Committed a locally-coordinated transaction on WebSphere MQ queue manager ''PTL.SUPPORT''.
2016-08-17 10:26:30.750730 22711 UserTrace BIP13033I: Committed a locally-coordinated transaction on WebSphere MQ queue manager ''CDCQMD''.
2016-08-17 10:26:30.750740 22711 UserTrace BIP11506I: Committed a local transaction.
2016-08-17 10:26:30.750806 22711 UserTrace BIP11504I: Waiting for data from input node ''CDCIn_MQInput''.
2016-08-17 10:26:35.867408 22711 UserTrace BIP11507W: Rolled back a local transaction.
2016-08-17 10:27:30.868978 22711 UserTrace BIP13034I: Rolled back a locally-coordinated transaction on WebSphere MQ queue manager ''CDCQMD''.
2016-08-17 10:27:30.869078 22711 UserTrace BIP13031I: Closed WebSphere MQ queue ''CDC_IGNORED_MSGS.QUEUE''.
2016-08-17 10:27:30.869152 22711 UserTrace BIP13024I: Disconnected from WebSphere MQ queue manager ''CDCQMD''.
2016-08-17 10:27:30.899962 22711 UserTrace BIP13034I: Rolled back a locally-coordinated transaction on WebSphere MQ queue manager ''PTL.SUPPORT''.
2016-08-17 10:27:30.930376 22711 UserTrace BIP13031I: Closed WebSphere MQ queue ''RIPTIDE.OTR.ROAD_CALL''.
2016-08-17 10:27:30.961586 22711 UserTrace BIP13024I: Disconnected from WebSphere MQ queue manager ''PTL.SUPPORT''.
|
[/code] |
|
Back to top |
|
|
vishBroker |
Posted: Thu Aug 18, 2016 8:13 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
Trying to summarize the issue and observations -
Details –
1. IIB flow has MQInput node connecting to remote QM using client Connections and has additional instances = 2.
2. Issue – Message(s) get stuck on to queue after processing few messages. (Or sometimes even for first message itself)
3. We see QDEPTH is greater than 0 even if IPPROCs is 2 (or more)
4. Sometimes, The ‘stuck’ed message gets processed after sometime. [This happens after ‘MQConnIdleTimeOut’ occurred - IPPROCS drop to 1 and new connection is created IPPROCS becomes 2 again and processes the message]
5. Sometime – messages get stuck for long.
6. And, sometimes, if we put one more message on the queue – [ IPPROCS increases and] new message and stuck message get processed.
7. The solution uses GC and interacts with DB in the same flow.
Things done to narrow down the issue
8. To prove issue is not because of GC – We removed the GC logic and we were doing DB interactions directly. In this case also, we saw above behavior.
9. To prove that the issue is not because of DB interaction code – We created simple flow [MQInput -> MQOutput] reading from remoteQ using clientConnection and writing into local queue. => We still observed same behavior.
We tend to believe that the issue is in the way IIB flow is reading the messages from queue on remote QM using client connection properties. - Please correct/guide.
NOTE – we change the # of instances to 0 [meaning only single thread is reading from the queue] => we DO NOT SEE the issue.
(we do not see the issue for simple flow[#9] as well as original solution[#7] |
|
Back to top |
|
|
Vitor |
Posted: Thu Aug 18, 2016 8:30 am Post subject: |
|
|
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
What is SHARECONV set to on the channel? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
Back to top |
|
|
mqjeff |
Posted: Thu Aug 18, 2016 8:33 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
When you see that qdepth >0 but messages aren't being processed, what is the UNCOM value when you get the qstatus? _________________ chmod -R ugo-wx / |
|
Back to top |
|
|
vishBroker |
Posted: Thu Aug 18, 2016 8:53 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
SHARED connections are 10 on the server channel.
The uncommited messages were NO.
I do not have access to remote QM. I am getting these details from MQExplorer. [I wish, I could share the screenshot, but it seems here screenshots/images are not allowed.]
WIll try to get these details by asking IBMi support person.
Well, using MS0P to monitor remote queue - could capture few things.
Here is .csv file from MQ Activity viewer when message got stuck.
Code: |
"Queue name","App name","Process","Thread","App type","App description","Browse","Inquire","Input","Output","Set","Open options","User ID","Channel name","Conn name","UOW type","Handle state","Asynchronous state","QM UOW ID","External UOW ID"
"RIPTIDE.OTR.ROAD_CALL","DataFlowEngine","3047","0","User","WebSphere MQ Channel","No","Yes","Shared","No","No","Save all context, Inquire, Fail if quiescing, Input shared","MQUSER","RIPTIDE.SVRCONN","xx.xxx.xxx.xxx","Queue manager","Active","Suspended","0.0","XA_FORMATID[00000000] XA_GTRID[] XA_BQUAL[]"
"RIPTIDE.OTR.ROAD_CALL","DataFlowEngine","3047","0","User","WebSphere MQ Channel","No","Yes","Shared","No","No","Save all context, Inquire, Fail if quiescing, Input shared","MQUSER","RIPTIDE.SVRCONN","xxx.xxx.xx.xx","Queue manager","Inactive","Active","0.0","XA_FORMATID[00000000] XA_GTRID[] XA_BQUAL[]"
"RIPTIDE.OTR.ROAD_CALL","803530/PPPP/ROAD_CALL","75942","1","User","","No","No","No","Yes","No","Output, Fail if quiescing","PPP","","","Queue manager","Inactive","None","0.0","XA_FORMATID[00000000] XA_GTRID[] XA_BQUAL[]"
|
Last edited by vishBroker on Thu Aug 18, 2016 9:05 am; edited 2 times in total |
|
Back to top |
|
|
vishBroker |
Posted: Thu Aug 18, 2016 8:58 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
One DFE thread is in suspended state.
Quote: |
"RIPTIDE.OTR.ROAD_CALL","DataFlowEngine","3047","0","User","WebSphere MQ Channel","No","Yes","Shared","No","No","Save all context, Inquire, Fail if quiescing, Input shared","MQUSER","RIPTIDE.SVRCONN","x.xx.xx.xxx","Queue manager","Active","Suspended","0.0","XA_FORMATID[00000000] XA_GTRID[] XA_BQUAL[]" |
Kindly point - where should I start looking.
Have ran user trace - do not see any errors/exceptions.
But, I do see 'local transactions are rolled back'. Here in the flow - we have only MQInput node and MQOutput node - no compute node. I could see previous messages(before getting stuck) are properly PUT on the target queue (which is local for IIB NODE).
Seems like the MQconn/handle is getting closed by the server but client still holds it {or other way round}.
What should I be looking for?
kindly help. |
|
Back to top |
|
|
vishBroker |
Posted: Thu Aug 18, 2016 9:08 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
FYI - edited the CSV file to mask few things.
When, this is captured - 2 threads were reading from the queue (IPPROC = 2) and QDepth was 1.
The message got picked up after 1 minute or so. |
|
Back to top |
|
|
vishBroker |
Posted: Thu Aug 18, 2016 11:59 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
I looked into infocenter for the meaning of 'Asynchronous state' = suspended. This is what I got - but not able to understand it completely.
Quote: |
Suspended: The asynchronous consumption callback has been suspended so that asynchronous message consumption cannot currently proceed on this object handle. This can be either because an MQCB call with Operation MQOP_SUSPEND has been issued against this object handle by the application, or because it has been suspended by the system. If it has been suspended by the system, as part of the process of suspending asynchronous message consumption the callback function will be called with the reason code that describes the problem resulting in suspension. This will be reported in the Reason field in the MQCBC structure that is passed to the callback function. For asynchronous message consumption to proceed, the application must issue an MQCB call with the Operation parameter set to MQOP_RESUME. |
THe consumer application in this case is IIB. We are NOT coding anything out-of-ordinary for MQ Input node.
Transaction Mode - YES
We are using MQ client connection properties to read to remote queue.
Additional instances = 2
OrderMode = default
So, MQ connections are made and handled by IIB itself.
(IIB Node does has Local QM configured.)
And we are NOT doing anything thing in the code - it is just MQInput ->MQOutput. (for debug purpsoe)
I don't see any place to provide MOOP_RESUME option. (And hoping, we are not required to provide it as well) |
|
Back to top |
|
|
mqjeff |
Posted: Fri Aug 19, 2016 3:58 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
All that stuff about MQCB fields is, indeed, handled by IIB itself.
It sounds like something is going wrong with the client connection - either on the IIB side or the MQ side.
I haven't worked with the MQ*Nodes using client connections, so I'm not sure if there are parameters for things like timeouts or loop intervals or whatever.
It sounds like you've done enough troubleshooting to identify the next place to troubleshoot and also figured out how to replicate the issue. These are both very good things!
... look at the AMQERR logs for the qmgr in question, and look at the system log on the broker side.
If you don't see anything, enable broker service trace, and very carefully examine the output for any mq events that look like they might involve the suspend.
If it looks like the broker is receiving a message to suspend the connection, then look at the qmgr side and run a trace on it. You can limit the trace to specific things - possibly specific enough to only show the broker client connections... but I don't remember if it gets that tight. You should at least be able to identify the amq process that's running the thread that's talking to broker. Then you can only look at that trace file. (their names include the pid.)
If that still gets you nowhere, or you want help going through this process... then open a PMR. And expect to be told to upgrade your system to the latest FP if it's not there. _________________ chmod -R ugo-wx / |
|
Back to top |
|
|
vishBroker |
Posted: Mon Aug 22, 2016 2:23 am Post subject: |
|
|
Centurion
Joined: 08 Dec 2010 Posts: 135
|
Thank you, MQJeff!
We have opened a PMR. Will keep updated.
Thanks all for the inputs. |
|
Back to top |
|
|
|
|
|
You cannot post new topics in this forum You cannot reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
|
|
|