Newbie desperate for help! QM not receiving messages

gstephen
Posted: Thu Jun 16, 2005 5:00 am    Post subject: Newbie desperate for help! QM not receiving messages

Newbie

Joined: 16 Jun 2005
Posts: 7
Location: Toronto, Canada

Hi all,

I'm new to MQ so be gentle please!

I have a cluster with 3 QMs. Messages originate from QM0 and get sent to QM1 and QM2 in round-robin fashion to share the load. After applying some patches to Solaris, QM1 no longer seems to be receiving messages; they all go to QM2. That's good in that the system still works at 50% capacity and QM2 can handle the full load, but it's not supposed to work like this. I've had QM1 turned off for weeks, and I turn it on some evenings to try to get it working, but no matter what I try, messages go only to QM2.

Is there some generic reason why this might be happening? I bet it's something really simple that I'm just not seeing because I'm new at this. All the channels seem to be running.

Any help would be greatly appreciated!

Graham

mq_crazy
Posted: Thu Jun 16, 2005 6:08 am

Master

Joined: 30 Jun 2004
Posts: 295

Did u check whether the QM1 is still in the cluster?? Maybe you can do a REFRESH CLUSTER and see.

gstephen
Posted: Thu Jun 16, 2005 10:38 am

Newbie

Joined: 16 Jun 2005
Posts: 7
Location: Toronto, Canada

mq_crazy wrote:
Did u check whether the QM1 is still in the cluster?? Maybe you can do a REFRESH CLUSTER and see.


Yep, tried that. It refreshed it but here's the message I got:

------------------------------------------------------------------------
06/15/05 19:33:54
AMQ9418: Only one repository for cluster EQFX.LH.BELL.PRD.

EXPLANATION:
The queue manager has received information about a cluster for which it is the only repository.
ACTION:
Alter the REPOS or REPOSNL attribute of the queue manager, that is to have the second full repository for the cluster, to specify the cluster name.
-------------------------------------------------------------------------------
06/15/05 19:33:54
AMQ9442: Refresh Cluster command processed.

EXPLANATION:
The Refresh Cluster(EQFX.LH.BELL.PRD) command caused 9 objects to be refreshed and 3 objects to be republished.
ACTION:
None.
-------------------------------------------------------------------------------

Does this mean something with the repository is messed up? QM1 and QM0 are full repositories and QM2 is partial.

The other weird thing is this message:

06/15/05 19:36:03
AMQ9202: Remote host '204.19.XXX.YY (1414)' not available, retry later.

(Note that I put the X's and Y's in there just now for posting purposes only.) QM1 isn't supposed to know about this IP address, QM2 is. QM1 is supposed to be using 207.107.xxx.yy ...could this be an alias problem?

mq_crazy
Posted: Thu Jun 16, 2005 11:07 am

Master

Joined: 30 Jun 2004
Posts: 295

I think you have only one repository. According to the message, there is no second full repository. Check the queue managers that are supposed to be full repositories, make the second one a full repository by altering the queue manager, and try again.

EddieA
Posted: Thu Jun 16, 2005 12:58 pm

Jedi

Joined: 28 Jun 2001
Posts: 2453
Location: Los Angeles

Quote:
QM1 isn't supposed to know about this IP address, QM2 is. QM1 is supposed to be using 207.107.xxx.yy

Well, that depends. Without more information about which IP belongs to which QM, and on which QM the message appeared, it's difficult to say if this is valid or not. Don't forget, in a cluster, MQ will define channels between QMs as needed.

Cheers,
_________________
Eddie Atherton
IBM Certified Solution Developer - WebSphere Message Broker V6.1
IBM Certified Solution Developer - WebSphere Message Broker V7.0

fjb_saper
Posted: Thu Jun 16, 2005 1:45 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

go to your official full repository qm0
do runmqsc
dis clusqmgr(*)

This will give you the official list of qmgrs in the cluster.
If qmgr1 does not appear in that list you may have to have it join the cluster again.

Make sure you get 2 full repositories in your cluster...
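
As a rough sketch of that check, the display can ask for the attributes that matter for this problem. This is only an illustration: it assumes the command is run inside runmqsc on the full repository (QM0 in this thread), and the choice of attributes is a suggestion rather than something prescribed above.

```
* Run inside runmqsc on the full repository.
* QMTYPE  : REPOS (full repository) or NORMAL (partial repository)
* SUSPEND : whether that queue manager is suspended from the cluster
* STATUS  : state of the cluster channel to that queue manager
DISPLAY CLUSQMGR(*) QMTYPE STATUS SUSPEND
```

If two queue managers show QMTYPE(REPOS), the cluster has its two full repositories; a missing entry for QM1 suggests it needs to rejoin.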

Enjoy

PeterPotkay
Posted: Thu Jun 16, 2005 3:21 pm

Poobah

Joined: 15 May 2001
Posts: 7722

Most cluster problems are just plain old channel problems. Make sure the CLUSRCVR channel on QM1 is correct. Make sure the listener is running.

On QM0, do you see any retrying channels to QM1?

Try this little test. I assume your cluster name is CLUSTERA:

On QM1 and QM2, create a queue called PETER. On both QMs, cluster it to CLUSTERA.

Connect to QM0, and do an MQPUT to PETER. Do 10. If they round robin, then there is a problem with your original queue on QM1. Is it PUT_INHIBITED? Does it exist even? Is it still clustered to CLUSTERA?

If all 10 go to only PETER on QM2, then try this. Connect to QM0, and put 10 messages to PETER / QM1 (specify both the destination q and destination QM on the MQOPEN / MQPUT1). These should all go to QM1 only. If they get stuck in the SYSTEM.CLUSTER.TRANSMIT.QUEUE on QM0, you know you have a channel problem from QM0 to QM1. If all 10 do make it to PETER on QM1, then I bet QM1 is SUSPENDED from CLUSTERA. In that case on QM1, issue the RESUME command.

Try the above tests in the exact order I mentioned, and I am sure it will point you to the problem.
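
The MQSC side of the test could look roughly like the sketch below. CLUSTERA and PETER are the placeholder names from the text above. DEFBIND(NOTFIXED) is an added assumption, not part of the original instructions: with the default DEFBIND(OPEN), an application that opens the queue once and puts 10 messages on that handle has all of them bound to a single instance, which would hide any round-robin behaviour.

```
* On QM1 and on QM2: a throwaway test queue shared in the cluster.
* DEFBIND(NOTFIXED) is an assumption so one open handle can be workload balanced.
DEFINE QLOCAL(PETER) CLUSTER(CLUSTERA) DEFBIND(NOTFIXED)

* On QM1 and QM2, after the puts: where did the test messages land?
DISPLAY QLOCAL(PETER) CURDEPTH

* On QM0: are messages stuck waiting for a channel to QM1?
DISPLAY QLOCAL(SYSTEM.CLUSTER.TRANSMIT.QUEUE) CURDEPTH

* On QM1, only if it turns out to be suspended:
RESUME QMGR CLUSTER(CLUSTERA)
```

Delete the PETER queues once the test is done so they do not linger in the cluster.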

-Peter
_________________
Peter Potkay
Keep Calm and MQ On

gstephen
Posted: Fri Jun 17, 2005 5:29 am

Newbie

Joined: 16 Jun 2005
Posts: 7
Location: Toronto, Canada

Wow, thanks Peter. I will give this a try. The only thing is that I'm on vacation starting next week so I will have to wait until I return. QM2 will handle everything until then. Thanks again, I will follow up in a couple of weeks.

hguapluas
Posted: Fri Jun 17, 2005 9:04 am

Centurion

Joined: 05 Aug 2004
Posts: 105
Location: San Diego

FYI, immediately after you do a cluster refresh, you will frequently get the Only One Repository message if you check right away. It sometimes takes a few moments/minutes for the second repository to be acknowledged in a cluster as being available after the refresh. I find this happens a lot, and you just have to wait a few minutes to allow the refresh and cluster repositories to do their job before confirming FR status on both FRs.

The other option is to use your AMQ commands and check each FR from command line. You will always get an accurate report of whether or not the QMs are FRs or PRs from the command line.
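
One way to do that kind of command-line check, sketched here under the assumption that it is run in runmqsc on each queue manager that is supposed to be a full repository:

```
* A full repository names the cluster in REPOS, or names a namelist
* containing the cluster in REPOSNL. On a partial repository both are blank.
DISPLAY QMGR REPOS REPOSNL
```

For the cluster in this thread, a full repository would be expected to show REPOS(EQFX.LH.BELL.PRD) unless a namelist is used instead.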

Cheers,

PeterPotkay
Posted: Fri Jun 17, 2005 2:34 pm

Poobah

Joined: 15 May 2001
Posts: 7722

If you have a tool that displays all your cluster system queue depths in real time, you will see that if you issue a REFRESH CLUSTER on a QM, it will take 10-20 seconds before all the command queues empty and all the transmit queues empty. It takes a while and causes a lot of messages to start moving around.

Having said that, in my opinion, REFRESH CLUSTER is recommended / used 100 times more than it should be. REFRESH CLUSTER will not fix bad channel defs, or RESUME QMs, or start listeners, or fix network problems, or PUT enable queues, etc. 99% of the time you will get more results slapping the side of your monitor.

All REFRESH does is (in simplified terms) tag every entry in the Cluster Repository queue as obsolete, repopulate the local repository queue with ONLY the local clustered definitions that that QM owns, and send this info to the FR. If you think this will fix the problem, go for it, but if you carefully read what it actually does, you will see that 99% of the time this will do absolutely nothing for you.
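
For comparison, here is a sketch of commands that target some of the causes listed above directly. The cluster, channel and queue names are only placeholders borrowed from elsewhere in this thread; substitute the objects actually involved.

```
* Suspended queue manager: check, then resume (run on the suspended QM).
DISPLAY CLUSQMGR(*) SUSPEND
RESUME QMGR CLUSTER(EQFX.LH.BELL.PRD)

* Stopped or retrying channel: check, then start.
DISPLAY CHSTATUS(*) STATUS
START CHANNEL(TO.MEMP.01)

* Put-inhibited queue: check, then enable.
DISPLAY QLOCAL(EQFX.GTWAY.QUEUE.PRD) PUT
ALTER QLOCAL(EQFX.GTWAY.QUEUE.PRD) PUT(ENABLED)
```

Each DISPLAY tells you whether the matching fix is needed at all, which is usually quicker than refreshing and waiting.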
_________________
Peter Potkay
Keep Calm and MQ On

gstephen
Posted: Wed Jul 06, 2005 6:37 am

Newbie

Joined: 16 Jun 2005
Posts: 7
Location: Toronto, Canada

Hi all. I'm back from vacation and trying again to get this thing working. Thanks to you, I discovered that QM1 was indeed suspended, as you pointed out. I was fooled by the refresh command, which does not take it out of suspend mode, and that is why I was getting no messages.

Now I'm back to the problem I had in the first place (months ago): when I send a message I get "MQJE001: Completion Code 2, Reason 2085" in my application log. Here's the strange part: in the MQ error log I get a message saying "AMQ9202: Remote host '204.a.b.c (1414)' not available, retry later." (that's me that put the a.b.c in, but you get the idea). Now, QM1 is supposed to send messages to 207.e.f.g ...so is it possible that it's using the wrong channel? How do I tell what channel it's actually using? When I do a 'dis channel(*) all' the 207 channel is listed but the 204 channel is not. How does it even know about the 204 channel? It's supposed to be using 207.

Help?!

Graham

fjb_saper
Posted: Wed Jul 06, 2005 12:37 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

The definition you need to check is in the cluster receiver channel.
You should also make sure that the IP/port are the same throughout the cluster network. Maybe replace the IP with the host name.

I do not know how the network / channels would react if
qm0 to qm2 ip=207.a.b.c
qm1 to qm2 ip=204.f.g.h

This could throw a serious wrench into the works...
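
A sketch of how those addresses could be compared, assuming access to runmqsc on each queue manager. Note that auto-defined cluster-sender channels are not listed by DISPLAY CHANNEL; they appear in DISPLAY CLUSQMGR, which is one way a queue manager can be using an address that nobody defined locally.

```
* On each queue manager: which address does its own cluster-receiver advertise?
DISPLAY CHANNEL(*) TYPE(CLUSRCVR) CONNAME

* On any member: which channel and address has it learned for every other member?
* DEFTYPE(CLUSSDRA) marks a cluster-sender that was auto-defined from the
* remote queue manager's cluster-receiver definition.
DISPLAY CLUSQMGR(*) CHANNEL CONNAME DEFTYPE
```

An unexpected CONNAME in the second display points back to the cluster-receiver definition on the queue manager that owns it.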

Enjoy

PeterPotkay
Posted: Wed Jul 06, 2005 2:46 pm

Poobah

Joined: 15 May 2001
Posts: 7722

How about listing the QMs in question, who is a PR or a FR, the ports they are listening on, their IPs, their CLUSRCVR defs, their CLUSSNDR defs, exactly which QM your app is connected to, and what q is it opening and getting a 2085 on. Also, post the def of that target q.
_________________
Peter Potkay
Keep Calm and MQ On

gstephen
Posted: Thu Jul 07, 2005 6:38 am

Newbie

Joined: 16 Jun 2005
Posts: 7
Location: Toronto, Canada

PeterPotkay wrote:
How about listing the QMs in question, who is a PR or a FR, the ports they are listening on, their IPs, their CLUSRCVR defs, their CLUSSNDR defs, exactly which QM your app is connected to, and what q is it opening and getting a 2085 on. Also, post the def of that target q.


Ok, the short of it is like this:

QM0 (name is actually MEMP) is FR I think - this one is not under my control but I can call the guy that looks after it.
QM1 is FR (142.117.a.b) - sends requests to QM0 on 207.a.b.c. Currently getting 2085 error. I have QM1 currently suspended.
QM2 is PR (142.182.x.y) - sends requests to QM0 on 204.e.f.g. Working fine.

Here's some details taken from our buildbook for QM1:

DEFINE CHANNEL(TO.GTWAY.EQFX.QMP1) CHLTYPE(CLUSRCVR) CLUSTER(EQFX.LH.BELL.PRD) CONNAME('142.117.a.b(1417)') TRPTYPE(TCP) DESCR('Cluster-receiver channel')

DEFINE CHANNEL(TO.GTWAY.EQFX.QMP2) CHLTYPE(CLUSSDR) CLUSTER(EQFX.LH.BELL.PRD) CONNAME('142.182.x.y(1418)') TRPTYPE(TCP) DESCR('Cluster-sender channel')

DEFINE CHANNEL(TO.MEMP.01) CHLTYPE(CLUSSDR) CLUSTER(EQFX.LH.BELL.PRD) CONNAME('207.a.b.c(1414)') TRPTYPE(TCP) DESCR('Cluster-sender channel to Equifax')

There is an instance of the same application on both the QM1 and QM2 and they work independently (Weblogic).

Taken from one of the application properties files:

EQFX.Equifax.request.queue=PK.CICS.BELL.CDTSCR.RQST.01.ALIAS
EQFX.Gateway.queueManager=GTWAY.EQFX.QMP1
EQFX.Gateway.channel=SYSTEM.DEF.SVRCONN
EQFX.Gateway.host=142.117.a.b
EQFX.Gateway.port=1417
EQFX.Equifax.user=MQPROD
EQFX.Gateway.deadletter.queue=LH.EQFX.DEAD.QUEUE
EQFX.Gateway.reply.queue=EQFX.GTWAY.QUEUE.PRD

What is the alias for? We haven't made any config changes whatsoever, either to this file or elsewhere.

From QM1:

dis clusqmgr(*) status qmtype
2 : dis clusqmgr(*) status qmtype
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(GTWAY.EQFX.QMP1) CLUSTER(EQFX.LH.BELL.PRD)
CHANNEL(TO.GTWAY.EQFX.QMP1) QMTYPE(REPOS)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(GTWAY.EQFX.QMP2) CLUSTER(EQFX.LH.BELL.PRD)
CHANNEL(TO.GTWAY.EQFX.QMP2) QMTYPE(REPOS)
STATUS(INACTIVE)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(MEMP) CLUSTER(EQFX.LH.BELL.PRD)
CHANNEL(TO.MEMP.01) QMTYPE(NORMAL)
STATUS(INACTIVE)
AMQ8441: Display Cluster Queue Manager details.
CLUSQMGR(MEMP) CLUSTER(EQFX.LH.BELL.PRD)
CHANNEL(TO.MEMP.02) QMTYPE(NORMAL)
STATUS(STOPPED)

From QM1:

dis chstatus(*) all
10 : dis chstatus(*) all
AMQ8417: Display Channel Status details.
CHANNEL(TO.MEMP.02) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
CONNAME(204.19.233.65(1414)) CURRENT
CHLTYPE(CLUSSDR) INDOUBT(NO)
LSTSEQNO(0) LSTLUWID(0000000000000000)
CURMSGS(0) CURSEQNO(0)
CURLUWID(0000000000000000) STATUS(STOPPED)
LSTMSGTI( ) LSTMSGDA( )
MSGS(0) BYTSSENT(0)
BYTSRCVD(0) BATCHES(0)
BATCHSZ(10) HBINT(180)
NPMSPEED(FAST) CHSTATI(05.35.05)
CHSTADA(2005-07-06) BUFSSENT(0)
BUFSRCVD(0) LONGRTS(0)
SHORTRTS(0) JOBNAME(00006062000000B6)
MCASTAT(NOT RUNNING) STOPREQ(NO)
LOCLADDR() SSLPEER()
RQMNAME()


Notice that each QM thinks one of the channels is stopped, although QM2 is working fine?! The firewall is set to allow traffic between QM1 and 207.e.f.g, and between QM2 and 204.a.b.c. I can ping 204.a.b.c from QM2, but I get no response when I ping 207 from QM1 even though TO.MEMP.01 is connected?!


From QM2:

>> dis chstatus(*)
AMQ8417: Display Channel Status details.
CHANNEL(TO.MEMP.01) XMITQ(SYSTEM.CLUSTER.TRANSMIT.QUEUE)
CONNAME(207.107.68.65(1414)) CURRENT
CHLTYPE(CLUSSDR) INDOUBT(NO)
LSTSEQNO(0) LSTLUWID(0000000000000000)
CURMSGS(0) CURSEQNO(0)
CURLUWID(0000000000000000) STATUS(STOPPED)
LSTMSGTI( ) LSTMSGDA( )
MSGS(0) BYTSSENT(0)
BYTSRCVD(0) BATCHES(0)
BATCHSZ(10) HBINT(180)
NPMSPEED(FAST) CHSTATI(05.53.41)
CHSTADA(2005-06-14) BUFSSENT(0)
BUFSRCVD(0) LONGRTS(0)
SHORTRTS(0) JOBNAME(0000443000001219)
MCASTAT(NOT RUNNING) STOPREQ(NO)
LOCLADDR() SSLPEER()
RQMNAME()

From QM1:

dis qcluster(*) all
AMQ8409: Display Queue details.
DESCR(WebSphere MQ Default Local Queue)
CLUSTER(EQFX.LH.BELL.PRD) QUEUE(EQFX.GTWAY.QUEUE.PRD)
CLUSQMGR(GTWAY.EQFX.QMP2)
QMID(GTWAY.EQFX.QMP2_2004-04-13_16.05.55)
CLUSDATE(2005-07-05) CLUSTIME(19.24.57)
ALTDATE(2004-04-13) ALTTIME(16.10.07)
CLUSQT(QLOCAL) TYPE(QCLUSTER)
PUT(ENABLED) DEFPRTY(0)
DEFPSIST(YES) DEFBIND(NOTFIXED)
AMQ8409: Display Queue details.
DESCR(WebSphere MQ Default Local Queue)
CLUSTER(EQFX.LH.BELL.PRD) QUEUE(EQFX.GTWAY.QUEUE.PRD)
CLUSQMGR(GTWAY.EQFX.QMP1)
QMID(GTWAY.EQFX.QMP1_2004-04-13_15.40.49)
CLUSDATE(2004-04-13) CLUSTIME(16.04.44)
ALTDATE(2004-04-13) ALTTIME(16.04.43)
CLUSQT(QLOCAL) TYPE(QCLUSTER)
PUT(ENABLED) DEFPRTY(0)
DEFPSIST(YES)

fjb_saper
Posted: Thu Jul 07, 2005 1:23 pm

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

To tell you the truth, I don't like using IP addresses too much, especially if you traverse firewalls and NAT...
I'd rather use the IP name (host name). This way you can have the following:

qmgr0 ipname cl0
qmgr1 ipname cl1
qmgr2 ipname cl2

on ClusterReceiver specify cl1(port)...
now
from cl0 cl1 = 204.xx.xx.xx
from cl2 cl1 = 147.xx.xx.xx
from cl1 cl1 = 192.xx.xx.xx

This would resolve NAT traversal and other problems, but everybody has to play nice with their name servers (hosts files are not recommended).
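
As an illustration only, the cluster-receiver on qmgr1 could be changed along these lines. cl1 is the made-up host name from the example above, and the channel name and port 1417 are taken from the QM1 definitions posted earlier in the thread; every other member would need cl1 to resolve to an address it can actually reach.

```
* On qmgr1: advertise a host name instead of a literal IP address.
* cl1 is a hypothetical host name; each member resolves it through its own name server.
ALTER CHANNEL(TO.GTWAY.EQFX.QMP1) CHLTYPE(CLUSRCVR) CONNAME('cl1(1417)')
```

The changed definition is published through the full repositories, so the other queue managers may take a little while to pick it up.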

Enjoy