Author |
Message
|
Mehrdad |
Posted: Tue Jan 08, 2008 10:28 am Post subject: Challenge Question - 01 / 2008 |
|
|
Master
Joined: 27 Feb 2004 Posts: 219 Location: Europe
|
We have had our first 2 winners for December '07 announced and a poll is being run for the last December Challenge category.
Now here comes the January '08 Challenge Question as submitted by a well trusted member of the greater Jedi family.
' In an MQ Cluster, channels *from* one of the Queue Managers in the cluster are all retrying. There are no issues with network connectivity between the servers - this is a hard fact.
Telnet to the receiving QMs to port 1414 works fine from this box. The problem QM seems just fine. So do the receiving QMs. Listeners are up and running on 1414 on all the receiving QMs. Max Channels has not been hit on any of them. In fact, the receiving QMs' error logs are 100% void of any errors! There is no SSL or Security Exits involved. The problem sending QM is running on Windows and listening on port 1415.
The error log of the sending QM is complaining that it can’t get a connection to "123.45.65.789(1415)" for the retrying channels, where 123.45.65.789 is the correct hostname/IP address of the destination servers.
You have double checked, triple checked, QUADRUPLE checked your CLUSSDR and CLUSRCVR channel definitions. They all have the correct hostname/IP address in the CONNAME fields. Remember, there are no issues with network connectivity between the servers. There is no firewall involved. Just a few hours ago this cluster was just fine. Something changed in the MQ environment to cause this. '
What was that change made by a well meaning MQ Admin?
Answers are encouraged to be posted here, yet the one(s) who would like to remain discrete can send their answers to challengejan2008@cressida.info . |
|
iceage |
Posted: Tue Jan 08, 2008 8:32 pm Post subject: |
|
|
 Acolyte
Joined: 12 Apr 2006 Posts: 68
|
This problem happens when the CLUSRCVR was defined without a port in its CONNAME and, on the sending queue manager (where the CLUSSDR/CLUSSDRA channels are RETRYING), the default MQ port is overridden to a value other than 1414, in this case 1415. That is why there are no errors on the receiving QMs: the connection attempts never reach their listeners on 1414.
The answer is that the MQ admin overrode the default port for the SENDING qmgr, like this:
TCP:
port=1415
Hope no "discrete" ones creeped in ..  |
|
atheek |
Posted: Tue Jan 08, 2008 9:41 pm Post subject: |
|
|
 Partisan
Joined: 01 Jun 2006 Posts: 327 Location: Sydney
|
In the sender qmgr, the MQ Admin might have changed only the port part of the CLUSSDR's CONNAME from 1414 to 1415, and there were no listeners running on port 1415 on the destination server specified in the hostname part of the CONNAME. This change won't take effect on the fly, so a channel restart must have happened after the change was made. |
|
fjb_saper |
Posted: Wed Jan 09, 2008 4:55 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
The well meaning MQAdmin set up the Cluster receiver with the port of the sending qmgr and not the port of the receiving qmgr.
So if the sending and receiving qmgrs are Full Repositories, the manually defined cluster sender channels would be fine.
However the auto-defined channel that the cluster is trying to use has the wrong port and as such any connection to the target qmgr will ultimately fail...
Now this would be a minor problem in the cluster.
Worst case scenario:
The well meaning MQ Admin created another qmgr and had it join the cluster by running the MS03 script from an existing qmgr. He did not change (prior to loading) the cluster receiver channel name but may have changed its definition (port)...
Thus you have 2 qmgrs with the same cluster receiver channel name in the cluster.
When a qmgr tries to send a message to either of the recipients, it gets confused as to which definition to use...
This would be a major problem in the cluster... _________________ MQ & Broker admin |
|
jefflowrey |
Posted: Wed Jan 09, 2008 4:57 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Remember that this is a problem with creating or starting cluster sender channels to every other queue manager in the cluster.
Quote: |
channels *from* one of the Queue Managers |
_________________ I am *not* the model of the modern major general. |
|
fjb_saper |
Posted: Wed Jan 09, 2008 5:08 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Good point Jeff.
This brings me to NAT traversal rules and firewall rules.
The "one" qmgr is in a different network and even though the host name gets resolved correctly the ip/port either doesn't get translated properly (NAT) or the port is not open (firewall) for that one destination qmgr...
And the culprit is:
The channel (cluster receiver) has the IP in it. This IP is not valid for the one qmgr in the different network. Use the hostname! _________________ MQ & Broker admin
Last edited by fjb_saper on Wed Jan 09, 2008 5:14 am; edited 1 time in total |
|
jefflowrey |
Posted: Wed Jan 09, 2008 5:11 am Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
Remember that network connectivity has been eliminated as a potential source of the problem. _________________ I am *not* the model of the modern major general. |
|
fjb_saper |
Posted: Wed Jan 09, 2008 5:22 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Network connectivity is fine.
The IP however does not point to the same host.
The IP for the qmgr in the other network is subject to NAT.
So all the other qmgrs in the SAME network are fine and communicate.
The host with the IP of the target manager may exist in the other network and may even have a program listening on the right port, or it may not exist at all.
But the IP in the other network needs to be different to reach the right qmgr.
You HAVE to use the hostname and DNS resolution to communicate correctly across networks. You CANNOT use the IP number. This is what the well meaning admin did when he defined the cluster receiver... and it has nothing to do with network connectivity... _________________ MQ & Broker admin |
|
fjb_saper |
Posted: Sun Jan 20, 2008 2:24 pm Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
Quote: |
There is no firewall involved. Just a few hours ago this cluster was just fine. Something changed in the MQ environment to cause this. |
It would be much easier to diagnose if it were Unix....
I'll go out on a limb and state that the listener is running under a user that has no authority / access to the qmgr... or that the cluster receiver MCA has a user without authority...
Extreme case: the username looks right but the userid is different. (User was deleted and recreated). _________________ MQ & Broker admin |
|
sunny_30 |
Posted: Sat Jan 26, 2008 7:09 pm Post subject: |
|
|
 Master
Joined: 03 Oct 2005 Posts: 258
|
The question says that the destination server/IP that hosts the receiving Qmgr is '123.45.65.789', that the Rcvr-qmgr listener is running on port 1414, and that the channels carry the right host-names, but it doesn't say whether the port#s were also verified on the cluster channels.
My guess is that the MQ-admin has changed the CLUSSDR definition on the sending Qmgr from port 1414 to 1415.
If the port# on the CLUSSDR channel of the sending-qmgr is reverted back to 1414, then the channel should come out of the 'retrying' mode. |
|
PeterPotkay |
Posted: Fri Feb 01, 2008 3:35 pm Post subject: |
|
|
 Poobah
Joined: 15 May 2001 Posts: 7722
|
Turns out iceage had the answer we were looking for. Poking around MQ Explorer in MQ 5.3, the MQ Admin saw a QM property called port # on the TCP/IP tab. "Ah," he said to himself, "this QM is listening on port 1415, so I should set this to 1415 as well. Now that that's done, it's Miller time!"
A few hours later, as the channels started going INACTIVE and then restarting due to new message traffic, they all started bombing out. None of the cluster channel definitions had a port # appended to the hostname in the CONNAME field because all were aiming at QMs that were using the default port # of 1414. With the sending QM's default port now set to 1415, each of those CONNAMEs resolved to port 1415 instead.
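To make the failure concrete, here is a minimal Python sketch of how a CONNAME without a port ends up on the overridden default. It is a simplified model of the behaviour described in this thread, not MQ's actual code; `effective_target`, `WELL_KNOWN_PORT` and the regular expression are names made up for illustration, and only the simple `host` or `host(port)` form of CONNAME is handled.

```python
import re

# MQ's well-known default port, used when nothing overrides it.
WELL_KNOWN_PORT = 1414

# Accepts "host" or "host(port)"; illustrative only, not MQ's real parser.
_CONNAME_RE = re.compile(r"^\s*(?P<host>[^()\s]+)\s*(?:\(\s*(?P<port>\d+)\s*\))?\s*$")


def effective_target(conname, qmgr_default_port=WELL_KNOWN_PORT):
    """Return (host, port) that a sending QM would try for this CONNAME.

    If the CONNAME carries no port, the sending QM's default port is used.
    """
    match = _CONNAME_RE.match(conname)
    if match is None:
        raise ValueError(f"unrecognised CONNAME: {conname!r}")
    port_text = match.group("port")
    port = int(port_text) if port_text else qmgr_default_port
    return match.group("host"), port


# Before the change: no port in CONNAME, default still 1414.
print(effective_target("123.45.65.789"))
# After the admin set the sending QM's default port to 1415.
print(effective_target("123.45.65.789", qmgr_default_port=1415))
# An explicit port is not affected by the default.
print(effective_target("123.45.65.789(1414)", qmgr_default_port=1415))
```

The second call is the situation in the challenge: the hostname is right, but the attempt goes to 1415, where the receiving QMs have no listener.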
(For those who suggested that the CLUSSDR had its channel def changed: you forget that the manually defined CLUSSDR channel is only ever used once, when the QM initially introduces itself to the cluster, or if you issue REFRESH CLUSTER REPOS(YES). Once a QM is running in a cluster, you could delete its manual CLUSSDR channel and it would keep working just fine. You shouldn't do that, by the way, because sooner or later you might need to issue REFRESH CLUSTER REPOS(YES).)
Lesson learned: Always include the port # in your channel definitions, even if it's 1414.
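As a rough illustration of that lesson, a small audit in Python could flag any CONNAME that has no explicit port. This is only a sketch: `channels_missing_port`, the channel names and the dictionary layout are all hypothetical, and in practice the values would have to come from your own channel definitions (for example, from DISPLAY CHANNEL output).

```python
import re

# Matches a CONNAME that ends in an explicit "(port)" suffix.
_HAS_PORT = re.compile(r"\(\s*\d+\s*\)\s*$")


def channels_missing_port(connames):
    """Return the sorted names of channels whose CONNAME has no explicit port.

    `connames` maps a channel name to its CONNAME string (hypothetical layout).
    """
    return sorted(
        name for name, conname in connames.items()
        if not _HAS_PORT.search(conname)
    )


# Hypothetical channel names; the host is the placeholder from the question.
example = {
    "TO.QM1": "123.45.65.789",
    "TO.QM2": "123.45.65.789(1414)",
}
print(channels_missing_port(example))
```

Any channel this reports is relying on the sending QM's default port, which is exactly the dependency that broke here.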
Still wondering: Why did IBM build this Port# parm? It's not that big a deal to append a port # on those relatively infrequent times that you define a new channel for a QM. _________________ Peter Potkay
Keep Calm and MQ On |
|
Mehrdad |
Posted: Sat Feb 02, 2008 12:47 am Post subject: |
|
|
Master
Joined: 27 Feb 2004 Posts: 219 Location: Europe
|
iceage wrote: |
This problem happens when the CLUSRCVR was defined without a port in its CONNAME and, on the sending queue manager (where the CLUSSDR/CLUSSDRA channels are RETRYING), the default MQ port is overridden to a value other than 1414, in this case 1415. That is why there are no errors on the receiving QMs: the connection attempts never reach their listeners on 1414.
The answer is that the MQ admin overrode the default port for the SENDING qmgr, like this:
TCP:
port=1415
Hope no "discrete" ones creeped in ..  |
iceage: you are the declared winner. Please email your mailing address details to the admin id and the winning mug will be sent to you. |
|