MQSeries.net :: View topic - Challenge Question

Challenge Question - 01 / 2008

Mehrdad

Posted: Tue Jan 08, 2008 10:28 am Post subject: Challenge Question - 01 / 2008

Master

Joined: 27 Feb 2004
Posts: 219
Location: Europe

We have had our first 2 winners for December '07 announced and a poll is being run for the last December Challenge category.

Now here comes the January '08 Challenge Question, as submitted by a well-trusted member of the greater Jedi family.

' In an MQ Cluster, channels *from* one of the Queue Managers in the cluster are all retrying. There are no issues with network connectivity between the servers - this is a hard fact.

Telnet to the receiving QMs on port 1414 works fine from this box. The problem QM seems just fine. So do the receiving QMs. Listeners are up and running on 1414 on all the receiving QMs. Max Channels has not been hit on any of them. In fact, the receiving QMs' error logs are 100% void of any errors! There is no SSL or Security Exits involved. The problem sending QM is running on Windows and listening on port 1415.

The error log of the sending QM is complaining that it can't get a connection to "123.45.65.789(1415)" for the retrying channels, where 123.45.65.789 is the correct hostname/IP address of the destination servers.

You have double checked, triple checked, QUADRUPLE checked your CLUSSDR and CLUSRCVR channel definitions. They all have the correct hostname/IP address in the CONNAME fields. Remember, there are no issues with network connectivity between the servers. There is no firewall involved. Just a few hours ago this cluster was just fine. Something changed in the MQ environment to cause this. '

What was that change made by a well meaning MQ Admin?

You are encouraged to post answers here, but those who would like to remain discreet can send them to challengejan2008@cressida.info .

iceage

Posted: Tue Jan 08, 2008 8:32 pm Post subject:

Acolyte

Joined: 12 Apr 2006
Posts: 68

This problem happens when the CLUSRCVR channels were defined without a port in CONNAME, and on the sending queue manager (where the CLUSSDR/CLUSSDRA channels are RETRYING) the default MQ port has been overridden to a value other than 1414, in this case 1415. The sender then tries port 1415 on every destination, which is why there are no errors on the receiving QMs.

The answer is that the MQ admin overrode the default port for the SENDING qmgr, like this:
TCP:
port=1415

Hope no "discreet" ones crept in...

atheek

Posted: Tue Jan 08, 2008 9:41 pm Post subject:

Partisan

Joined: 01 Jun 2006
Posts: 327
Location: Sydney

In the sender qmgr, the MQ Admin might have changed just the port part of the CLUSSDR's CONNAME from 1414 to 1415, and there were no listeners running on port 1415 on the destination server specified in the hostname part of the CONNAME. This change won't take effect on the fly, so a channel restart must have happened after the change was made.

fjb_saper

Posted: Wed Jan 09, 2008 4:55 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20771
Location: LI,NY

The well meaning MQAdmin set up the Cluster receiver with the port of the sending qmgr and not the port of the receiving qmgr.

So if the sending and receiving qmgrs are Full Repositories, the manually defined cluster sender channels would be fine.
However, the auto-defined channel that the cluster is trying to use has the wrong port, and as such any connection to the target qmgr will ultimately fail...

Now this would be a minor problem in the cluster.

Worst case scenario:
The well meaning MQ Admin created another qmgr and had it join the cluster by running the MS03 script from an existing qmgr. He did not change (prior to loading) the cluster receiver channel name but may have changed its definition (port)...
Thus you have 2 qmgrs with the same cluster receiver channel name in the cluster.
When a qmgr tries to send a message to either of the recipients, it gets confused as to which definition to use...

This would be a major problem in the cluster...
_________________
MQ & Broker admin

jefflowrey

Posted: Wed Jan 09, 2008 4:57 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Remember that this is a problem with creating or starting cluster sender channels to every other queue manager in the cluster.

Quote:

channels *from* one of the Queue Managers

_________________
I am *not* the model of the modern major general.

fjb_saper

Posted: Wed Jan 09, 2008 5:08 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20771
Location: LI,NY

Good point Jeff.

This brings me to NAT traversal rules and firewall rules.

The "one" qmgr is in a different network and even though the host name gets resolved correctly the ip/port either doesn't get translated properly (NAT) or the port is not open (firewall) for that one destination qmgr...

And the culprit is:
The channel (cluster receiver) has the IP in it. This IP is not valid for the one qmgr in the different network. Use the hostname!

Last edited by fjb_saper on Wed Jan 09, 2008 5:14 am; edited 1 time in total

jefflowrey

Posted: Wed Jan 09, 2008 5:11 am Post subject:

Grand Poobah

Joined: 16 Oct 2002
Posts: 19981

Remember that network connectivity has been eliminated as a potential source of the problem.

fjb_saper

Posted: Wed Jan 09, 2008 5:22 am Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20771
Location: LI,NY

Network connectivity is fine.
The IP however does not point to the same host.
The IP for the qmgr in the other network is subject to NAT.

So all the other qmgrs in the SAME network are fine and communicate.

The host with the IP of the target manager may exist in the other network and may even have a program listening on the right port, or it may not exist at all.

But the IP in the other network needs to be different to reach the right qmgr.

You HAVE to use the hostname and DNS resolution to communicate correctly across networks. You CANNOT use the IP number. This is what the well meaning admin did when he defined the cluster receiver... and it has nothing to do with network connectivity...

fjb_saper

Posted: Sun Jan 20, 2008 2:24 pm Post subject:

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20771
Location: LI,NY

Quote:

There is no firewall involved. Just a few hours ago this cluster was just fine. Something changed in the MQ environment to cause this.

It would be much easier to diagnose if it were Unix....
I'll go out on a limb and state that the listener is running under a user that has no authority / access to the qmgr... or that the cluster receiver mca has a user without authority...

Extreme case: the username looks right but the userid is different. (User was deleted and recreated).

sunny_30

Posted: Sat Jan 26, 2008 7:09 pm Post subject:

Master

Joined: 03 Oct 2005
Posts: 258

The question says that the destination server/IP that hosts the receiving Qmgr is '123.45.65.789', that the receiving qmgr's listener is running on port 1414, and that the channels carry the right hostnames, but it doesn't say whether the port numbers were also verified on the cluster channels.

My guess is that the MQ admin changed the CLUSSDR definition on the sending Qmgr from port 1414 to 1415.

If the port number on the CLUSSDR channel of the sending qmgr is reverted to 1414, then the channel should come out of the 'retrying' state.

PeterPotkay

Posted: Fri Feb 01, 2008 3:35 pm Post subject:

Poobah

Joined: 15 May 2001
Posts: 7723

Turns out iceage had the answer we were looking for. Poking around MQ Explorer in MQ 5.3, the MQ Admin saw a QM property called port # on the TCP/IP tab. "Ah," he said to himself, "this QM is listening on port 1415, so I should set this to 1415 as well. Now that that's done, it's Miller time!"

A few hours later, as the channels started going INACTIVE and then restarting due to new message traffic, they all started bombing out. None of the cluster channel definitions had a port # appended to the hostname in the CONNAME field, because all were aiming at QMs that were using the default port # of 1414. With the sending QM's default port now set to 1415, every CONNAME without a port was resolved to port 1415.
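To make the failure mode concrete, here is a minimal Python sketch of the fallback described above. It is a model for illustration, not MQ's actual code: the helper name `resolve_conname` and its parameters are invented here, and only the 1414 default and the 1415 override come from this thread.

```python
import re

MQ_WELL_KNOWN_PORT = 1414  # port assumed when nothing overrides the default

# Hypothetical pattern for "host" or "host(port)"; not MQ's real parser.
_CONNAME = re.compile(r"\s*([^()\s]+)\s*(?:\(\s*(\d+)\s*\))?\s*")


def resolve_conname(conname, qmgr_default_port=MQ_WELL_KNOWN_PORT):
    """Return the (host, port) pair a sending QM would try for a CONNAME.

    If the CONNAME carries no explicit port, fall back to the sending
    queue manager's default port (the setting the admin changed).
    """
    match = _CONNAME.fullmatch(conname)
    if match is None:
        raise ValueError(f"unrecognised CONNAME: {conname!r}")
    host, port = match.groups()
    return host, (int(port) if port else qmgr_default_port)


# Before the change: no port in CONNAME, default still 1414.
print(resolve_conname("123.45.65.789"))                # ('123.45.65.789', 1414)
# After the change: same CONNAME, default overridden to 1415.
print(resolve_conname("123.45.65.789", 1415))          # ('123.45.65.789', 1415)
# An explicit port is unaffected by the override.
print(resolve_conname("123.45.65.789(1414)", 1415))    # ('123.45.65.789', 1414)
```

The second call reproduces the "123.45.65.789(1415)" seen in the sending QM's error log, even though no channel definition mentions 1415.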

(For those who suggested that the CLUSSDR had its channel def changed: you forget that the manually defined CLUSSDR channel is only ever used once, when the QM initially introduces itself to the cluster, or if you issue REFRESH CLUSTER REPOS(YES). Once a QM is running in a cluster, you could delete its manual CLUSSDR channel and it would keep working just fine. You shouldn't do that, by the way, because sooner or later you might need to issue REFRESH CLUSTER REPOS(YES).)

Lesson learned: Always include the port # in your channel definitions, even if it's 1414.
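One way to act on that lesson is to audit existing definitions for a missing port. The Python sketch below assumes you have already collected channel names and CONNAME values into a dictionary (for example by hand from your channel definitions); the function name `channels_missing_port` and the sample channel names and hosts are made up for illustration.

```python
import re

# Matches a trailing "(port)" such as "(1414)" at the end of a CONNAME.
_EXPLICIT_PORT = re.compile(r"\(\s*\d+\s*\)\s*$")


def channels_missing_port(connames):
    """Return the sorted names of channels whose CONNAME has no explicit port."""
    return sorted(
        name
        for name, conname in connames.items()
        if not _EXPLICIT_PORT.search(conname)
    )


# Hypothetical channel names and hosts, for illustration only.
definitions = {
    "TO.QM1": "host1.example.com(1414)",
    "TO.QM2": "host2.example.com",
    "TO.QM3": "123.45.65.789",
}
print(channels_missing_port(definitions))  # ['TO.QM2', 'TO.QM3']
```

Any channel this reports depends on the sending QM's default port, which is exactly the dependency that broke the cluster in this challenge.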

Still wondering: Why did IBM build this Port# parm? It's not that big a deal to append a port # on those relatively infrequent times that you define a new channel for a QM.
_________________
Peter Potkay
Keep Calm and MQ On

Mehrdad

Posted: Sat Feb 02, 2008 12:47 am Post subject:

Master

Joined: 27 Feb 2004
Posts: 219
Location: Europe

iceage: you are the declared winner. Please email your mailing address details to the admin ID and the winning mug will be sent to you.
