ALCSALCS
Posted: Fri Oct 14, 2005 2:42 am Post subject: MQ Trigger question
Acolyte
Joined: 14 Apr 2002 Posts: 61

We are running MQ V5.3.1 in a z/OS environment.
We have an application-triggered input queue (trigger type FIRST, triggering set to yes). Queue 'INPUT' receives messages from an external system.
Application 'A' reads the queue.
Normally queue 'INPUT' receives one message every two seconds.
When a trigger message arrives, application 'A' reads all the messages from the queue and exits.
This is a standard implementation of a triggered queue.
If for some reason queue 'INPUT' reaches a certain threshold (120 messages), an offline job 'OFFLINE' runs and clears the queue.
The other day we reached this condition (120 messages) and the offline job 'OFFLINE' was called, but because this job had low priority and z/OS was very busy, it didn't start to run until 1 hour later.
Looking at the contents of the 120 messages, we saw that they were put in the queue over an interval of 15 minutes, but for some unknown reason application 'A' didn't read queue 'INPUT'.
Question 1) Any idea or reason why application 'A' suddenly didn't read queue 'INPUT'?
a) A problem with the trigger message? Unlikely.
b) A problem with application 'A'? Unlikely.
c) Something different in the messages sent by the external system? Difficult to know.
During the hour that job 'OFFLINE' was waiting to run, we saw that application 'A' was reading messages from queue 'INPUT', but on top of this, application 'A' was called at an average rate of 200 times/sec to read queue 'INPUT'.
This produced high activity in z/OS, because application 'A' was performing the sequence MQOPEN, MQGET (finding the queue empty), MQCLOSE 200 times a second.
Question 2)
How is it possible that application 'A' was reading the queue correctly after the problem started? Application 'A' is called by the trigger message, and the trigger message is generated when the queue goes from 0 to 1 message.
Question 3)
Because job 'OFFLINE' didn't run for an hour, queue 'INPUT' theoretically contained at least 120 messages, so how is it possible that a trigger message was produced to call application 'A'?
Question 4)
How is it possible that so many trigger messages were produced (200 in a second)?
Question 5)
Until we can discover the origin of the problem, is there anything we can implement in case we receive this rate of 200 trigger messages/sec?
Any comment or possible explanation of this problem will be welcome.
Thanks

jefflowrey
Posted: Fri Oct 14, 2005 3:09 am Post subject:
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981

I am betting your problem lies solely and completely with the code in Application A.
Probably, it decided to back out a message, and decided NOT to check the retry count.
And therefore, it spent the entire time trying to reprocess a message it could not handle (a "poison message" in the lingo). _________________ I am *not* the model of the modern major general.
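To illustrate the poison-message loop described above, here is a minimal Python simulation. It is not MQI code. It assumes that a failed message goes back to the front of the queue with its count incremented, which is what a backout does to MQMD.BackoutCount. The names `Message`, `drain` and `backout_threshold` are hypothetical, and the threshold stands in for the queue's BOTHRESH attribute. The count check is the part that stops the endless reprocessing.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Message:
    body: str
    backout_count: int = 0   # plays the role of MQMD.BackoutCount


def drain(queue, process, backout_threshold, backout_queue):
    """Get messages until the queue is empty.

    A message whose processing fails is put back at the front of the queue
    with its backout count incremented (simulating a backout). Once the count
    reaches backout_threshold, the message is moved to backout_queue instead
    of being retried, so one poison message cannot loop forever.
    """
    processed = []
    while queue:
        msg = queue.popleft()
        if msg.backout_count >= backout_threshold:
            backout_queue.append(msg)   # park the poison message elsewhere
            continue
        try:
            process(msg)
            processed.append(msg.body)
        except ValueError:
            msg.backout_count += 1      # a backout increments the count
            queue.appendleft(msg)       # and makes the message available again
    return processed


def handle(msg):
    if msg.body == "bad":
        raise ValueError("cannot process this message")


parked = []
done = drain(deque([Message("ok1"), Message("bad"), Message("ok2")]), handle, 3, parked)
print(done, [(m.body, m.backout_count) for m in parked])
# ['ok1', 'ok2'] [('bad', 3)]
```

If the `backout_count >= backout_threshold` test is removed, the "bad" message cycles forever, which is the behaviour jefflowrey suspects.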

Mr Butcher
Posted: Fri Oct 14, 2005 3:18 am Post subject:
Padawan
Joined: 23 May 2005 Posts: 1716

If the queue reaches more than 120 messages and the offline job is started, are both (application A and the job) reading the queue?
How far did the job get? Was the queue opened by application A only, or also by the job?
Q1 - Maybe the messages were uncommitted (already read by the job, but not yet committed?!?), but more likely application A had a problem.
Q2 - Yes, but this is not the only condition; there are many others. A trigger can be generated even when there are already messages in the queue. This also applies to Q3.
Q3 - One possible scenario: the batch job was running and application A stopped processing. Because the queue was open for input by the job, there was no re-triggering (a queue with trigger type FIRST is triggered when the last process reading the queue closes it and the queue is not empty).
Now, for whatever reason, the batch job ended or was cancelled without reading all the messages; the queue is then triggered as explained above and application A is restarted.
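The two trigger conditions described in Q2 and Q3 can be sketched as a toy Python model. This is a simplification that assumes only the two rules from this thread. A trigger message is generated when the depth goes from 0 to 1, or when the last handle open for input is closed on a non-empty queue. In both cases this happens only if no handle is open for input. The class and method names are hypothetical. Real MQ checks further conditions, such as a trigger monitor having the initiation queue open.

```python
class TriggerFirstQueue:
    """Toy model of a TRIGTYPE(FIRST) queue with only the two rules
    discussed in this thread; it is not the full MQ trigger logic."""

    def __init__(self):
        self.depth = 0
        self.input_handles = 0      # handles currently open for input
        self.trigger_messages = 0   # trigger messages written to the initiation queue

    def _maybe_trigger(self):
        # Trigger only if the queue has messages and nobody has it open for input.
        if self.depth > 0 and self.input_handles == 0:
            self.trigger_messages += 1

    def put(self):
        self.depth += 1
        if self.depth == 1:         # rule 1: depth went from 0 to 1
            self._maybe_trigger()

    def open_input(self):
        self.input_handles += 1

    def get(self):
        if self.depth == 0:
            return 2033             # MQRC_NO_MSG_AVAILABLE
        self.depth -= 1
        return 0

    def close_input(self):
        self.input_handles -= 1
        if self.input_handles == 0: # rule 2: last reader closed a non-empty queue
            self._maybe_trigger()


# Replay of the Q3 scenario.
q = TriggerFirstQueue()
q.open_input()                      # job OFFLINE opens INPUT for input
for _ in range(3):
    q.put()                         # no trigger: the queue is open for input
q.get()                             # the job reads one message...
q.close_input()                     # ...and ends with two messages left
print(q.depth, q.trigger_messages)  # 2 1
```

The run mirrors Q3. Nothing is triggered while the job holds the queue open, and a single trigger message appears when it ends with messages still on the queue.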
Q4 - Are you sure this was caused by triggering? If application A was (at that moment) the only application processing the queue, and it does not process the message (MQOPEN, MQGET, MQBACK, MQCLOSE) or abends after the open, then the queue is triggered again. This causes a loop, because the trigger type is FIRST and the queue is not empty, as described above. But this only happens if A is the only one reading the queue. Did you see any backouts or transaction abends?
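The loop in Q4 can be shown with a small Python simulation. It assumes application A is the only reader, and that every close of a non-empty queue produces a new trigger message (the rule described in Q3). `available` is a hypothetical parameter for how many of the messages an MQGET can actually retrieve. When it is 0, each start ends with 2033 and the queue is triggered again. That could happen, for example, because the messages are uncommitted or do not match the get. `max_starts` exists only so that the simulation ends.

```python
def run_triggered(depth, available, max_starts):
    """Simulate application A being started once per trigger message.

    depth      -- messages on queue INPUT
    available  -- how many of them an MQGET can actually retrieve
    max_starts -- safety cap so the simulation itself terminates
    Returns (number of starts, messages left on the queue).
    """
    pending_triggers = 1 if depth > 0 else 0
    starts = 0
    while pending_triggers and starts < max_starts:
        pending_triggers -= 1
        starts += 1
        # MQOPEN, then MQGET until 2033
        while available > 0:
            available -= 1
            depth -= 1
        # MQCLOSE by the only reader on a non-empty queue: triggered again
        if depth > 0:
            pending_triggers += 1
    return starts, depth


print(run_triggered(depth=5, available=5, max_starts=1000))    # (1, 0)
print(run_triggered(depth=120, available=0, max_starts=1000))  # (1000, 120)
```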
Also check the TRIGINT of the queue manager (it only applies when messages arrive, but it can also cause a trigger).
Q5 - Does the application check the backout count? Can you restrict application A (e.g. if it is a CICS transaction, you can limit its transaction class)?
It is hard to find answers to that from here.
The best thing is to look at the queue status and statistics (qstats) at the moment the error occurs. They show whether any process has the queue open, which one, and whether it is processing the queue. A RESET QSTATS may also help you investigate how many messages have been got and put (especially trigger messages). Other logs may help too (CICS, IMS, batch, or whatever environment application A runs in).
Switch triggering off next time to see what happens. If the loop stops, then of course it is the trigger (or rather, it is the triggered application A, and the trigger is working as designed). _________________ Regards, Butcher

ALCSALCS
Posted: Fri Oct 14, 2005 5:00 am Post subject:
Acolyte
Joined: 14 Apr 2002 Posts: 61

Answer to jefflowrey
Application A does the following:
1) Opens the queue.
2) Gets a message.
3) If no message is available, closes the queue.
4) If a message is available, processes it and reads the queue again until no message is available, then closes the queue.
Options: MQGMO_NO_SYNCPOINT
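Here is a minimal Python sketch of steps 1 to 4. It uses a hypothetical `FakeQueue` in place of queue INPUT so that the sequence of calls can be recorded. It is not MQI code, but the completion and reason code values are the real MQI constants. With an empty queue, the recorded sequence is exactly MQOPEN, MQGET, MQCLOSE, which is the pattern reported from the dump.

```python
MQCC_OK = 0
MQCC_FAILED = 2
MQRC_NO_MSG_AVAILABLE = 2033


class FakeQueue:
    """Hypothetical stand-in for queue INPUT that records each call."""

    def __init__(self, messages):
        self.messages = list(messages)
        self.calls = []

    def open(self):
        self.calls.append("MQOPEN")

    def get(self):
        self.calls.append("MQGET")
        if not self.messages:
            return MQCC_FAILED, MQRC_NO_MSG_AVAILABLE, None
        # MQGMO_NO_SYNCPOINT: the message is removed at once, no commit needed
        return MQCC_OK, 0, self.messages.pop(0)

    def close(self):
        self.calls.append("MQCLOSE")


def application_a(queue, process):
    queue.open()                          # 1) open the queue
    while True:
        cc, rc, msg = queue.get()         # 2) get a message
        if cc != MQCC_OK:                 # 3) 2033 (or any other failure): stop
            break
        process(msg)                      # 4) process it, then read again
    queue.close()                         # close the queue


handled = []
busy = FakeQueue(["m1", "m2"])
application_a(busy, handled.append)
empty = FakeQueue([])
application_a(empty, handled.append)
print(busy.calls)   # ['MQOPEN', 'MQGET', 'MQGET', 'MQGET', 'MQCLOSE']
print(empty.calls)  # ['MQOPEN', 'MQGET', 'MQCLOSE']
```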
Answer to Mr Butcher
Job 'OFFLINE' did not execute until 1 hour later. It was waiting, so queue 'INPUT' was not open by job 'OFFLINE'.
Q1) Application A does what I indicated above.
Q2 and Q3) During the hour, only application A opened the queue; job 'OFFLINE' was waiting to be executed.
Q4) A dump taken when the problem occurred shows that application 'A' was called by a trigger message at a rate of 200 times/sec, and the sequence of MQ commands was MQOPEN, MQGET and MQCLOSE.
TRIGINT for the queue manager is TRIGINT(999999999).
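For scale, TRIGINT is specified in milliseconds, so this value is roughly 11.6 days. Below is a simplified Python version of the TRIGINT rule for trigger type FIRST. Under that rule, a message arriving on a queue that is already non-empty can produce another trigger message only after TRIGINT has elapsed since the last one. Under that simplification, the rule could not have fired within the 15-minute window, and certainly not 200 times a second. The function name is hypothetical, and all other trigger conditions are ignored.

```python
TRIGINT_MS = 999_999_999   # queue manager TRIGINT from the post (milliseconds)


def retrigger_allowed(ms_since_last_trigger, trigint_ms=TRIGINT_MS):
    """Simplified TRIGINT rule for TRIGTYPE(FIRST): a message arriving on a
    queue that is already non-empty may produce another trigger message only
    once TRIGINT has elapsed since the previous trigger message."""
    return ms_since_last_trigger >= trigint_ms


days = TRIGINT_MS / 1000 / 86_400
print(f"TRIGINT is about {days:.1f} days")   # TRIGINT is about 11.6 days
print(retrigger_allowed(15 * 60 * 1000))     # False (the 15-minute window)
print(retrigger_allowed(1000 // 200))        # False (5 ms apart, 200/sec)
```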
Q5) Application A doesn't issue backouts.
What do you mean by 'can you restrict application A'?
Many thanks for the information provided.

jefflowrey
Posted: Fri Oct 14, 2005 5:03 am Post subject:
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981

What does the application do if the message can't be processed?
What happens if the application abends?

Mr Butcher
Posted: Fri Oct 14, 2005 5:11 am Post subject:
Padawan
Joined: 23 May 2005 Posts: 1716

Okay, so application A was the only one processing the queue.
I agree with jefflowrey: this looks like the typical "queue-not-empty-but-trigger-first" loop, and it is most likely caused by the application and not by the MQ trigger mechanism.
By "restricting" I mean that you try to prevent application A from being started that many times within such a short interval. I don't know your environment or what application A is (batch, CICS, IMS, ...), so I can't give more hints on this one. Anyway, restricting does not solve the problem.
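One generic way to "restrict" A is a start limiter in the application's own startup logic. The sketch below is hypothetical Python using a fixed one-second window. It is not a CICS or MQ facility; limiting the transaction class, as suggested earlier, would be the environment-specific route. It only limits the damage. With trigger type FIRST, a refused start may leave messages on the queue until something else triggers A again.

```python
class StartLimiter:
    """Hypothetical fixed-window limiter: at most `limit` starts of
    application A per one-second window."""

    def __init__(self, limit):
        self.limit = limit
        self.window_start = None
        self.count = 0

    def allow(self, now):
        """Return True if a start at time `now` (seconds) may proceed."""
        if self.window_start is None or now - self.window_start >= 1.0:
            self.window_start = now     # open a new one-second window
            self.count = 0
        if self.count < self.limit:
            self.count += 1
            return True
        return False                    # refuse: end at once without MQOPEN


limiter = StartLimiter(limit=5)
allowed = sum(limiter.allow(i / 200) for i in range(200))  # 200 starts in 1 s
print(allowed)                 # 5
print(limiter.allow(1.0))      # True: a new window has started
```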
When application A was triggered 200 times a second and did MQGET, these MQGETs returned 2033, you said? And there were messages in the queue. Did you see uncommitted messages? If so, okay; if not, why 2033?
Application error?

ALCSALCS
Posted: Fri Oct 14, 2005 6:41 am Post subject:
Acolyte
Joined: 14 Apr 2002 Posts: 61

The application does MQGET.
It checks for a completion code other than 0 together with a reason code other than 2033; if that is the case, it sends an error message.
If the completion code is 0, it processes the message and continues to read the next message until the reason code is 2033, when it does MQCLOSE.
If the application abends, it issues a dump (there was no dump during the time of the problem).
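The checks described above can be written as a small decision function. This is Python with a hypothetical name, `next_action`, and the constant values are the real MQI ones. Following this logic, the only path that runs MQOPEN, MQGET and MQCLOSE with nothing in between is a first MQGET that returns 2033. As described, a warning (completion code 1) also falls into the error branch.

```python
MQCC_OK = 0
MQRC_NO_MSG_AVAILABLE = 2033


def next_action(cc, rc):
    """What application A does after an MQGET, per the checks in the post."""
    if cc == MQCC_OK:
        return "process"    # process the message, then MQGET again
    if rc == MQRC_NO_MSG_AVAILABLE:
        return "close"      # queue empty: MQCLOSE and end
    return "error"          # any other result: send an error message


print(next_action(0, 0))      # process
print(next_action(2, 2033))   # close
print(next_action(2, 2016))   # error (2016 is MQRC_GET_INHIBITED)
print(next_action(1, 2079))   # error (a warning also lands here)
```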
We need to think about a way to restrict application 'A' to prevent the consequences of the problem until we can find its origin.
Looking at the code for application A, the only way the code executes the sequence MQOPEN, MQGET, MQCLOSE is if the reason code was 2033, and this is what the dump we took shows.
Unfortunately, the information on queue 'INPUT' produced by job 'OFFLINE' after the hour is no longer available.
So it is difficult to explain why 2033.
Many thanks

kevinf2349
Posted: Fri Oct 14, 2005 7:05 am Post subject:
Grand Master
Joined: 28 Feb 2003 Posts: 1311 Location: USA

Are you clearing out the CorrelId and MsgId fields to make all messages eligible?
Is the code re-entrant?

ALCSALCS
Posted: Mon Oct 17, 2005 2:20 am Post subject:
Acolyte
Joined: 14 Apr 2002 Posts: 61

Application A reads the messages without specifying any CorrelId or MsgId, making any message in the queue eligible.
The code is not re-entrant.
Thanks

Mr Butcher
Posted: Mon Oct 17, 2005 5:26 am Post subject:
Padawan
Joined: 23 May 2005 Posts: 1716

If you read a message, MsgId and CorrelId are filled in from the message you get. If you reuse the storage/fields/variables (depending on your programming language), the second get is performed with the values returned by the first get and will probably return 2033.

ALCSALCS
Posted: Mon Oct 17, 2005 6:45 am Post subject:
Acolyte
Joined: 14 Apr 2002 Posts: 61

Application A reads the message from the queue INPUT without any selection criteria (any message is acceptable). It doesn't use either msgID or CorrelID.
Many Thanks

kevinf2349
Posted: Mon Oct 17, 2005 7:15 am Post subject:
Grand Master
Joined: 28 Feb 2003 Posts: 1311 Location: USA

Quote:
Application A reads the message from the queue INPUT without any selection criteria (any message is acceptable). It doesn't use either msgID or CorrelID.
And then you clear the fields again before the next GET, right?
If not, then you need to.

Mr Butcher
Posted: Mon Oct 17, 2005 7:16 am Post subject:
Padawan
Joined: 23 May 2005 Posts: 1716

Yes. My English is bad, and sometimes my explanations are bad. I'll try again:
msg1 in queue: MsgId ABC, CorrelId 123
msg2 in queue: MsgId DEF, CorrelId 456
program A's MQMD data buffer: MsgId '', CorrelId ''
first MQGET: gets the first message; afterwards MsgId is ABC and CorrelId is 123
second MQGET: returns 2033, because it reads with the values from the first MQGET (if the same storage is used and not reset).
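Mr Butcher's example can be replayed with a simplified Python model of MQGET's default matching. This is not the real MQI. In the model, a non-blank MsgId or CorrelId in the descriptor selects only messages with those values, and a successful get writes the message's IDs back into the descriptor. `NONE_ID` stands in for MQMI_NONE/MQCI_NONE, and `MQMD` here is a hypothetical two-field class.

```python
from dataclasses import dataclass

NONE_ID = ""   # stands in for MQMI_NONE / MQCI_NONE: "match any message"


@dataclass
class MQMD:
    """Hypothetical two-field message descriptor."""
    msg_id: str = NONE_ID
    correl_id: str = NONE_ID


def mqget(queue, md):
    """Simplified MQGET: non-blank IDs in `md` select matching messages only;
    on success the message's IDs are written back into `md`."""
    for i, (msg_id, correl_id, body) in enumerate(queue):
        if md.msg_id not in (NONE_ID, msg_id):
            continue
        if md.correl_id not in (NONE_ID, correl_id):
            continue
        del queue[i]                       # safe: we return right away
        md.msg_id, md.correl_id = msg_id, correl_id
        return 0, body
    return 2033, None                      # MQRC_NO_MSG_AVAILABLE


queue = [("ABC", "123", "msg1"), ("DEF", "456", "msg2")]
md = MQMD()
r1 = mqget(queue, md)    # (0, 'msg1'); md now holds ABC / 123
r2 = mqget(queue, md)    # (2033, None): still selecting on ABC / 123
md.msg_id = md.correl_id = NONE_ID        # the fix: reset before every MQGET
r3 = mqget(queue, md)    # (0, 'msg2')
print(r1, r2, r3)
```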
Maybe it is best if you post the MQGET loop code snippet.

ALCSALCS
Posted: Mon Oct 17, 2005 7:38 am Post subject:
Acolyte
Joined: 14 Apr 2002 Posts: 61

No, my answer was not clear enough.
The fields are reset.
But when the 2033 happened, I saw in the dump that the sequence of MQ macros was MQOPEN, MQGET and MQCLOSE. This means that application A started (as a consequence of a trigger message), opened queue 'INPUT', read it for the first time, found it empty (2033) and closed it.
That was the case for all 200 calls of application A in the dump.
Many thanks

Mr Butcher
Posted: Mon Oct 17, 2005 10:42 pm Post subject:
Padawan
Joined: 23 May 2005 Posts: 1716

Does the dump show who started application A?
If it was by triggering, do you see the trigger monitor in the dump? It should also have lots of MQGETs (reading the trigger messages).
What environment does application A run in? CICS? IMS? ?!?
If it is a trigger problem, you should maybe involve IBM.