Newbie desperate for help! QM not receiving messages |
gstephen |
Posted: Thu Jun 16, 2005 5:00 am Post subject: Newbie desperate for help! QM not receiving messages |
|
|
Newbie
Joined: 16 Jun 2005 Posts: 7 Location: Toronto, Canada
|
Hi all,
I'm new to MQ so be gentle please!
I have a cluster with 3 QMs. Messages originate from QM0 and get sent to QM1 and QM2 in round-robin fashion to share the load. After applying some patches to Solaris, QM1 no longer seems to be receiving messages; they all keep going to QM2. That's good in the sense that the system still works at 50% and QM2 can handle the full load, but it's not supposed to work like this. I've had QM1 turned off for weeks, and I turn it on some evenings to try to get it working, but no matter what I've tried, messages keep going only to QM2.
Is there some generic reason why this might be happening? I bet it's something really simple that I'm just not seeing because I'm new at this. All the channels seem to be running.
Any help would be greatly appreciated!
Graham |
|
mq_crazy |
Posted: Thu Jun 16, 2005 6:08 am Post subject: |
|
|
 Master
Joined: 30 Jun 2004 Posts: 295
|
Did you check whether QM1 is still in the cluster? Maybe you can do a REFRESH CLUSTER and see. |
|
gstephen |
Posted: Thu Jun 16, 2005 10:38 am Post subject: |
|
|
Newbie
Joined: 16 Jun 2005 Posts: 7 Location: Toronto, Canada
|
mq_crazy wrote: |
Did you check whether QM1 is still in the cluster? Maybe you can do a REFRESH CLUSTER and see. |
Yep, tried that. It refreshed it but here's the message I got:
------------------------------------------------------------------------
06/15/05 19:33:54
AMQ9418: Only one repository for cluster EQFX.LH.BELL.PRD.
EXPLANATION:
The queue manager has received information about a cluster for which it is the only repository.
ACTION:
Alter the REPOS or REPOSNL attribute of the queue manager, that is to have the second full repository for the cluster, to specify the cluster name.
-------------------------------------------------------------------------------
06/15/05 19:33:54
AMQ9442: Refresh Cluster command processed.
EXPLANATION:
The Refresh Cluster(EQFX.LH.BELL.PRD) command caused 9 objects to be refreshed and 3 objects to be republished.
ACTION:
None.
-------------------------------------------------------------------------------
Does this mean something with the repository is messed up? QM1 and QM0 are full repositories and QM2 is partial.
The other weird thing is this message:
06/15/05 19:36:03
AMQ9202: Remote host '204.19.XXX.YY (1414)' not available, retry later.
(Note that I put the X's and Y's in there just now for posting purposes only.) QM1 isn't supposed to know about this IP address, QM2 is. QM1 is supposed to be using 207.107.xxx.yy ...could this be an alias problem? |
|
mq_crazy |
Posted: Thu Jun 16, 2005 11:07 am Post subject: |
|
|
 Master
Joined: 30 Jun 2004 Posts: 295
|
I think you have only one full repository. According to the message, there is no second full repository. Check which queue managers are meant to be full repositories, make the second one a full repository by altering its REPOS (or REPOSNL) attribute, and try again. |
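A minimal MQSC sketch of that check, assuming you run it in runmqsc on each queue manager that is meant to be a full repository. The cluster name is taken from the AMQ9418 message above. Whether the ALTER is appropriate depends on which queue managers you actually want as full repositories, so treat it as an illustration only.

```mqsc
* Show whether this queue manager is a full repository for any cluster.
* A blank REPOS and a blank REPOSNL mean it is a partial repository.
DISPLAY QMGR REPOS REPOSNL

* Only if this queue manager is supposed to be the second full repository
* and REPOS came back blank:
ALTER QMGR REPOS('EQFX.LH.BELL.PRD')
```

REPOS names a single cluster; REPOSNL points at a namelist of clusters instead. Only one of the two should be set on a given queue manager.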
|
EddieA |
Posted: Thu Jun 16, 2005 12:58 pm Post subject: |
|
|
 Jedi
Joined: 28 Jun 2001 Posts: 2453 Location: Los Angeles
|
Quote: |
QM1 isn't supposed to know about this IP address, QM2 is. QM1 is supposed to be using 207.107.xxx.yy |
Well, that depends. Without more information about which IP belongs to which QM, and which QM the message appeared on, it's difficult to say whether this is valid or not. Don't forget, in a cluster, MQ will define channels between QMs as needed.
Cheers, _________________ Eddie Atherton
IBM Certified Solution Developer - WebSphere Message Broker V6.1
IBM Certified Solution Developer - WebSphere Message Broker V7.0 |
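To see the channels MQ has defined for itself, something like the following MQSC sketch may help. It assumes you run it in runmqsc on the queue manager that logged the AMQ9202 message. Auto-defined cluster-sender channels are not listed by DISPLAY CHANNEL, but they do appear in the cluster queue manager records and in channel status.

```mqsc
* List every queue manager this one knows about in the cluster, with the
* connection name it would use and how the channel came to be defined.
* DEFTYPE(CLUSSDRA) marks a cluster-sender channel that MQ defined
* automatically from the remote CLUSRCVR definition.
DISPLAY CLUSQMGR(*) CONNAME DEFTYPE

* Show the channels that currently have status, including auto-defined ones.
DISPLAY CHSTATUS(*) STATUS
```

The CONNAME shown for an auto-defined channel comes from the CLUSRCVR definition on the remote queue manager, so an unexpected IP address usually has to be corrected there.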
|
fjb_saper |
Posted: Thu Jun 16, 2005 1:45 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
go to your official full repository qm0
do runmqsc
dis clusqmgr(*)
This will give you the official list of qmgrs in the cluster.
If qmgr1 does not appear in that list you may have to have it join the cluster again.
Make sure you get 2 full repositories in your cluster...
Enjoy  |
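A slightly more explicit form of that command, as a sketch: asking for a few named attributes makes the output easier to read. Run it in runmqsc on the full repository.

```mqsc
* One record per queue manager known to the cluster.
* QMTYPE(REPOS) is a full repository, QMTYPE(NORMAL) a partial one.
* SUSPEND(YES) means the queue manager has been suspended from the cluster.
* STATUS is the state of the cluster channel to that queue manager.
DISPLAY CLUSQMGR(*) QMTYPE STATUS SUSPEND
```

If a queue manager is missing from this list, or appears only once when two full repositories are expected, that matches the AMQ9418 warning quoted earlier in the thread.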
|
PeterPotkay |
Posted: Thu Jun 16, 2005 3:21 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Most cluster problems are just plain old channel problems. Make sure the CLUSRCVR channel on QM1 is correct. Make sure the listener is running.
On QM0, do you see any retrying channels to QM1?
Try this little test. I assume your cluster name is CLUSTERA:
On QM1 and QM2, create a queue called PETER. On both QMs, cluster it to CLUSTERA.
Connect to QM0, and do an MQPUT to PETER. Do 10. If they round robin, then there is a problem with your original queue on QM1. Is it PUT_INHIBITED? Does it even exist? Is it still clustered to CLUSTERA?
If all 10 go to only PETER on QM2, then try this. Connect to QM0, and put 10 messages to PETER / QM1 (specify both the destination queue and destination QM on the MQOPEN / MQPUT1). These should all go to QM1 only. If they get stuck in the SYSTEM.CLUSTER.TRANSMIT.QUEUE on QM0, you know you have a channel problem from QM0 to QM1. If all 10 do make it to PETER on QM1, then I bet QM1 is SUSPENDED from CLUSTERA. In that case on QM1, issue the RESUME command.
Try the above tests in the exact order I mentioned, and I am sure it will point you to the problem.
-Peter _________________ Peter Potkay
Keep Calm and MQ On |
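The MQSC side of the test above could look roughly like this. It is a sketch only: CLUSTERA is Peter's assumed cluster name, and QM0, QM1, QM2 are the placeholder names used in this thread, so substitute the real ones. Putting the 10 messages is left to whatever test program you normally use.

```mqsc
* On QM1 and again on QM2: create the test queue and share it in the cluster.
DEFINE QLOCAL(PETER) CLUSTER(CLUSTERA)

* On QM0: check which queue managers advertise PETER.
DISPLAY QCLUSTER(PETER) CLUSQMGR

* On QM1 and QM2, after putting 10 messages from QM0: see where they landed.
DISPLAY QLOCAL(PETER) CURDEPTH

* On QM0, if messages addressed to QM1 never arrive: are they stuck on the
* cluster transmit queue, and what state are the channels in?
DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH
DISPLAY CHSTATUS(*) STATUS

* On QM1, if addressed messages arrive but round robin skips it:
DISPLAY CLUSQMGR(QM1) SUSPEND
RESUME QMGR CLUSTER(CLUSTERA)
```

Running the steps in this order keeps the diagnosis the same as in the post: a queue problem first, then a channel problem, then a suspended queue manager.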
|
gstephen |
Posted: Fri Jun 17, 2005 5:29 am Post subject: |
|
|
Newbie
Joined: 16 Jun 2005 Posts: 7 Location: Toronto, Canada
|
PeterPotkay wrote: |
Most cluster problems are just plain old channel problems. Make sure the CLUSRCVR channel on QM1 is correct. Make sure the listener is running.
On QM0, do you see any retrying channels to QM1?
Try this little test. I assume your cluster name is CLUSTERA:
On QM1 and QM2, create a queue called PETER. On both QMs, cluster it to CLUSTERA.
Connect to QM0, and do an MQPUT to PETER. Do 10. If they round robin, then there is a problem with your original queue on QM1. Is it PUT_INHIBITED? Does it even exist? Is it still clustered to CLUSTERA?
If all 10 go to only PETER on QM2, then try this. Connect to QM0, and put 10 messages to PETER / QM1 (specify both the destination queue and destination QM on the MQOPEN / MQPUT1). These should all go to QM1 only. If they get stuck in the SYSTEM.CLUSTER.TRANSMIT.QUEUE on QM0, you know you have a channel problem from QM0 to QM1. If all 10 do make it to PETER on QM1, then I bet QM1 is SUSPENDED from CLUSTERA. In that case on QM1, issue the RESUME command.
Try the above tests in the exact order I mentioned, and I am sure it will point you to the problem.
-Peter |
Wow, thanks Peter. I will give this a try. The only thing is that I'm on vacation starting next week so I will have to wait until I return. QM2 will handle everything until then. Thanks again, I will follow up in a couple of weeks. |
|
hguapluas |
Posted: Fri Jun 17, 2005 9:04 am Post subject: |
|
|
Centurion
Joined: 05 Aug 2004 Posts: 105 Location: San Diego
|
FYI, immediately after you do a cluster refresh, you will frequently get the Only One Repository message if you check right away. It sometimes takes a few moments or minutes after the refresh for the second repository to be acknowledged as available in the cluster. I find this happens a lot, and you just have to wait a few minutes for the refresh and the cluster repositories to do their job before confirming FR status on both FRs.
The other option is to use your AMQ commands and check each FR from the command line. You will always get an accurate report of whether the QMs are FRs or PRs from the command line.
Cheers, |
|
PeterPotkay |
Posted: Fri Jun 17, 2005 2:34 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
If you have a tool that displays all your cluster system queue depths in real time, you will see that if you issue a REFRESH CLUSTER on a QM, it takes 10-20 seconds before all the command queues and all the transmit queues empty. It takes a while and causes a lot of messages to start moving around.
Having said that, in my opinion, REFRESH CLUSTER is recommended / used 100 times more than it should be. REFRESH CLUSTER will not fix bad channel defs, or RESUME QMs, or start listeners, or fix network problems, or PUT enable queues, etc. 99% of the time you will get more results slapping the side of your monitor.
All REFRESH does is (in simplified terms) tag every entry in the Cluster Repository queue as obsolete, repopulate the local repository queue with ONLY the local clustered definitions that that QM owns, and send this info to the FR. If you think this will fix the problem, go for it, but if you carefully read what it actually does, you will see that 99% of the time this will do absolutely nothing for you. _________________ Peter Potkay
Keep Calm and MQ On |
|
gstephen |
Posted: Wed Jul 06, 2005 6:37 am Post subject: |
|
|
Newbie
Joined: 16 Jun 2005 Posts: 7 Location: Toronto, Canada
|
PeterPotkay wrote: |
If you have a tool that displays all your cluster system queue depths in real time, you will see that if you issue a REFRESH CLUSTER on a QM, it takes 10-20 seconds before all the command queues and all the transmit queues empty. It takes a while and causes a lot of messages to start moving around.
Having said that, in my opinion, REFRESH CLUSTER is recommended / used 100 times more than it should be. REFRESH CLUSTER will not fix bad channel defs, or RESUME QMs, or start listeners, or fix network problems, or PUT enable queues, etc. 99% of the time you will get more results slapping the side of your monitor.
All REFRESH does is (in simplified terms) tag every entry in the Cluster Repository queue as obsolete, repopulate the local repository queue with ONLY the local clustered definitions that that QM owns, and send this info to the FR. If you think this will fix the problem, go for it, but if you carefully read what it actually does, you will see that 99% of the time this will do absolutely nothing for you. |
Hi all. I'm back from vacation and trying again to get this thing working. Thanks to you, I discovered that QM1 was indeed suspended, as you pointed out. I was fooled by the refresh command, which does not take it out of suspend mode; that's why I was getting no messages.
Now I'm back to the problem I had in the first place (months ago): when I send a message I get "MQJE001: Completion Code 2, Reason 2085" in my application log. Here's the strange part: in the MQ error log I get a message saying "AMQ9202: Remote host '204.a.b.c (1414)' not available, retry later." (I put the a.b.c in myself, but you get the idea). Now, QM1 is supposed to send messages to 207.e.f.g, so is it possible that it's using the wrong channel? How do I tell which channel it's actually using? When I do a 'dis channel(*) all', the 207 channel is listed but the 204 channel is not. How does it even know about the 204 channel? It's supposed to be using 207.
Help?!
Graham |
|
fjb_saper |
Posted: Wed Jul 06, 2005 12:37 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
The definition you need to check is in the cluster receiver channel.
You should also make sure that the IP/port are the same throughout the cluster network. Maybe replace the IP with the host name.
I do not know how the network / channels would react if
qm0 to qm2 ip=207.a.b.c
qm1 to qm2 ip=204.f.g.h
This could throw a serious wrench into the works...
Enjoy  |
|
PeterPotkay |
Posted: Wed Jul 06, 2005 2:46 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
How about listing the QMs in question, who is a PR or a FR, the ports they are listening on, their IPs, their CLUSRCVR defs, their CLUSSNDR defs, exactly which QM your app is connected to, and what q is it opening and getting a 2085 on. Also, post the def of that target q. _________________ Peter Potkay
Keep Calm and MQ On |
|
gstephen |
Posted: Thu Jul 07, 2005 6:38 am Post subject: |
|
|
Newbie
Joined: 16 Jun 2005 Posts: 7 Location: Toronto, Canada
|
PeterPotkay wrote: |
How about listing the QMs in question, who is a PR or a FR, the ports they are listening on, their IPs, their CLUSRCVR defs, their CLUSSNDR defs, exactly which QM your app is connected to, and what q is it opening and getting a 2085 on. Also, post the def of that target q. |
Ok, the short of it is like this:
QM0 (name is actually MEMP) is FR, I think - this one is not under my control, but I can call the guy who looks after it.
QM1 is FR (142.117.a.b) - sends requests to QM0 on 207.a.b.c. Currently getting 2085 error. I have QM1 currently suspended.
QM2 is PR (142.182.x.y) - sends requests to QM0 on 204.e.f.g. Working fine.
Here are some details taken from our buildbook for QM1:
DEFINE CHANNEL(TO.GTWAY.EQFX.QMP1) CHLTYPE(CLUSRCVR) CLUSTER(EQFX.LH.BELL.PRD) CONNAME('142.117.a.b(1417)') TRPTYPE(TCP) DESCR('Cluster-receiver channel')
DEFINE CHANNEL(TO.GTWAY.EQFX.QMP2) CHLTYPE(CLUSSDR) CLUSTER(EQFX.LH.BELL.PRD) CONNAME('142.182.x.y(1418)') TRPTYPE(TCP) DESCR('Cluster-sender channel')
DEFINE CHANNEL(TO.MEMP.01) CHLTYPE(CLUSSDR) CLUSTER(EQFX.LH.BELL.PRD) CONNAME('207.a.b.c(1414)') TRPTYPE(TCP) DESCR('Cluster-sender channel to Equifax')
There is an instance of the same application on both QM1 and QM2, and they work independently (Weblogic).
Taken from one of the application properties files:
EQFX.Equifax.request.queue=PK.CICS.BELL.CDTSCR.RQST.01.ALIAS
EQFX.Gateway.queueManager=GTWAY.EQFX.QMP1
EQFX.Gateway.channel=SYSTEM.DEF.SVRCONN
EQFX.Gateway.host=142.117.a.b
EQFX.Gateway.port=1417
EQFX.Equifax.user=MQPROD
EQFX.Gateway.deadletter.queue=LH.EQFX.DEAD.QUEUE
EQFX.Gateway.reply.queue=EQFX.GTWAY.QUEUE.PRD
What is the alias for? We haven't made any config changes whatsoever, either to this file or elsewhere.
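Reason 2085 is MQRC_UNKNOWN_OBJECT_NAME, meaning the queue manager could not resolve the name the application opened. A hedged MQSC sketch for checking how the request queue name resolves, assuming it is run in runmqsc on GTWAY.EQFX.QMP1 (the queue manager named in the properties above). Whether the name is a local alias or a queue advertised by another cluster member is not known from this thread, so both are checked.

```mqsc
* Is the name a locally defined alias queue, and what does it point at?
* (TARGQ is the attribute name in MQ releases of this era.)
DISPLAY QALIAS(PK.CICS.BELL.CDTSCR.RQST.01.ALIAS) TARGQ

* Is the name, or anything like it, visible as a cluster queue, and
* which queue manager hosts it?
DISPLAY QCLUSTER(PK.CICS.*) CLUSQMGR
```

If neither command finds the name on this queue manager, that would be consistent with the 2085 the application reports.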
From QM1:
dis clusqmgr(*) status qmtype
2 : dis clusqmgr(*) status qmtype
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(GTWAY.EQFX.QMP1) CLUSTER(EQFX.LH.BELL.PRD)
CHANNEL(TO.GTWAY.EQFX.QMP1) QMTYPE(REPOS)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(GTWAY.EQFX.QMP2) CLUSTER(EQFX.LH.BELL.PRD)
CHANNEL(TO.GTWAY.EQFX.QMP2) QMTYPE(REPOS)
STATUS(INACTIVE)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(MEMP) CLUSTER(EQFX.LH.BELL.PRD)
CHANNEL(TO.MEMP.01) QMTYPE(NORMAL)
STATUS(INACTIVE)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(MEMP) CLUSTER(EQFX.LH.BELL.PRD)
CHANNEL(TO.MEMP.02) QMTYPE(NORMAL)
STATUS(STOPPED)
From QM1:
dis chstatus(*) all
10 : dis chstatus(*) all
AMQ8417: Display Channel Status details.
CHANNEL(TO.MEMP.02) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
CONNAME(204.19.233.65(1414)) CURRENT
CHLTYPE(CLUSSDR) INDOUBT(NO)
LSTSEQNO(0) LSTLUWID(0000000000000000)
CURMSGS(0) CURSEQNO(0)
CURLUWID(0000000000000000) STATUS(STOPPED)
LSTMSGTI( ) LSTMSGDA( )
MSGS(0) BYTSSENT(0)
BYTSRCVD(0) BATCHES(0)
BATCHSZ(10) HBINT(180)
NPMSPEED(FAST) CHSTATI(05.35.05)
CHSTADA(2005-07-06) BUFSSENT(0)
BUFSRCVD(0) LONGRTS(0)
SHORTRTS(0) JOBNAME(00006062000000B6)
MCASTAT(NOT RUNNING) STOPREQ(NO)
LOCLADDR() SSLPEER()
RQMNAME()
Notice that each QM thinks one of the channels is stopped, although QM2 is working fine?! The firewall is set to allow traffic between QM1 and 207.e.f.g, and between QM2 and 204.a.b.c. I can ping 204.a.b.c from QM2, but I get no response when I ping 207 from QM1 even though TO.MEMP.01 is connected?!
From QM2:
>> dis chstatus(*)
AMQ8417: Display Channel Status details.
CHANNEL(TO.MEMP.01) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
CONNAME(207.107.68.65(1414)) CURRENT
CHLTYPE(CLUSSDR) INDOUBT(NO)
LSTSEQNO(0) LSTLUWID(0000000000000000)
CURMSGS(0) CURSEQNO(0)
CURLUWID(0000000000000000) STATUS(STOPPED)
LSTMSGTI( ) LSTMSGDA( )
MSGS(0) BYTSSENT(0)
BYTSRCVD(0) BATCHES(0)
BATCHSZ(10) HBINT(180)
NPMSPEED(FAST) CHSTATI(05.53.41)
CHSTADA(2005-06-14) BUFSSENT(0)
BUFSRCVD(0) LONGRTS(0)
SHORTRTS(0) JOBNAME(0000443000001219)
MCASTAT(NOT RUNNING) STOPREQ(NO)
LOCLADDR() SSLPEER()
RQMNAME()
From QM1:
dis qcluster(*) all
AMQ8409: Display Queue details.
DESCR(WebSphere MQ Default Local Queue)
CLUSTER(EQFX.LH.BELL.PRD) QUEUE(EQFX.GTWAY.QUEUE.PRD)
CLUSQMGR(GTWAY.EQFX.QMP2)
QMID(GTWAY.EQFX.QMP2_2004-04-13_16.05.55)
CLUSDATE(2005-07-05) CLUSTIME(19.24.57)
ALTDATE(2004-04-13) ALTTIME(16.10.07)
CLUSQT(QLOCAL) TYPE(QCLUSTER)
PUT(ENABLED) DEFPRTY(0)
DEFPSIST(YES) DEFBIND(NOTFIXED)
AMQ8409: Display Queue details.
DESCR(WebSphere MQ Default Local Queue)
CLUSTER(EQFX.LH.BELL.PRD) QUEUE(EQFX.GTWAY.QUEUE.PRD)
CLUSQMGR(GTWAY.EQFX.QMP1)
QMID(GTWAY.EQFX.QMP1_2004-04-13_15.40.49)
CLUSDATE(2004-04-13) CLUSTIME(16.04.44)
ALTDATE(2004-04-13) ALTTIME(16.04.43)
CLUSQT(QLOCAL) TYPE(QCLUSTER)
PUT(ENABLED) DEFPRTY(0)
DEFPSIST(YES) |
|
fjb_saper |
Posted: Thu Jul 07, 2005 1:23 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
To tell you the truth, I don't like using IP addresses too much, especially if you traverse firewalls and NAT...
I'd rather use the IP name (host name). This way you can have the following:
qmgr0 ipname cl0
qmgr1 ipname cl1
qmgr2 ipname cl2
on the cluster receiver specify cl1(port)...
now
from cl0 cl1 = 204.xx.xx.xx
from cl2 cl1 = 147.xx.xx.xx
from cl1 cl1 = 192.xx.xx.xx
This would resolve NAT traversal and other problems, but everybody has to play nice with their name servers (hosts files are not recommended).
Enjoy  |
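As a sketch of what that would mean for QM1, the cluster-receiver definition from the buildbook could carry a host name instead of an IP address. The channel name and port 1417 come from the earlier post; cl1 is the placeholder host name from the example above, not a real DNS entry.

```mqsc
* On QM1: advertise the cluster-receiver channel by host name so that each
* cluster member resolves cl1 through its own name server.
ALTER CHANNEL(TO.GTWAY.EQFX.QMP1) CHLTYPE(CLUSRCVR) +
      CONNAME('cl1(1417)')
```

Other queue managers build their auto-defined cluster-sender channels from this definition, so the name has to resolve correctly from every member of the cluster.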
|