MQSeries.net Forum Index » General IBM MQ Support » Delay connecting when using Host Names in AMQCLCHL.TAB

Delay connecting when using Host Names in AMQCLCHL.TAB
splaaat
Posted: Thu Nov 19, 2009 4:10 pm    Post subject: Delay connecting when using Host Names in AMQCLCHL.TAB

Novice

Joined: 28 Apr 2007
Posts: 10
Location: UK

Hi Folks,

This concerns MQ Client 6.0.2.5 (transactional client libraries are loaded but I don't think they're used) and MQ Server 6.0.2.5, both on Windows 2003 RC2 SP2.

I'm fairly new to MQ client although I have some experience of MQ Server.

The company I work for has recently implemented an app that uses MQ client to connect to our QMs across the network and I'm at the edge of my knowledge and experience when dealing with it. Hopefully this august band of fellows will be able to shed some light on a problem we are experiencing.

At low throughputs, there is quite a large lag between the app making a client call to put a message to a QM and the message actually being sent and arriving on the queue. After looking at timings from the app and the QM, we decided to trace one of these client calls and found multiple occurrences of the following 'interesting' entries:

00000481 17:32:57.697331 4604.8 ----------{ cciTcpResolveHostname
00000482 17:32:57.697334 4604.8 Hostname: 'host1.mq.addr'
00000483 17:32:57.697343 4604.8 getaddrinfo(): AF_INET6 & AI_NUMERICHOST: rc=11001 errno=0
00000484 17:32:57.697348 4604.8 getaddrinfo(): AF_INET & AI_NUMERICHOST: rc=11001 errno=0
00000491 17:33:02.693084 4604.8 getaddrinfo(): AF_INET6: rc=11001 errno=0
00000492 17:33:02.693202 4604.8 serv_addr4
00000493 17:33:02.693208 4604.8 Data:-
00000493 17:33:02.693208 4604.8 0x0085CF64 02 00 11 63 0A E0 99 26 00 00 00 00 00 00 00 00 : ...c.à.&........
00000494 17:33:02.693213 4604.8 serv_addr6
00000495 17:33:02.693216 4604.8 Data:-
00000495 17:33:02.693216 4604.8 0x0085CFCC 00 00 11 63 00 00 00 00 00 00 00 00 00 00 00 00 : ...c............
00000495 17:33:02.693216 4604.8 0x0085CFDC 00 00 00 00 00 00 00 00 00 00 00 00 : ............
00000496 17:33:02.693222 4604.8 ----------} cciTcpResolveHostname (rc=OK)


00000497 17:33:02.693232 4604.8 ----------{ cciTcpResolveHostname
00000498 17:33:02.693235 4604.8 Hostname: 'local.mq.addr'
00000499 17:33:02.693242 4604.8 getaddrinfo(): AF_INET6 & AI_NUMERICHOST: rc=11001 errno=0
0000049A 17:33:02.693250 4604.8 getaddrinfo(): AF_INET & AI_NUMERICHOST: rc=11001 errno=0
000004A3 17:33:07.692912 4604.8 getaddrinfo(): AF_INET6: rc=11001 errno=0
000004A4 17:33:07.693024 4604.8 serv_addr4
000004A5 17:33:07.693036 4604.8 Data:-
000004A5 17:33:07.693036 4604.8 0x0085CF78 02 00 00 00 0A E0 E8 C8 00 00 00 00 00 00 00 00 : .....àèÈ........
000004A6 17:33:07.693044 4604.8 serv_addr6
000004A7 17:33:07.693049 4604.8 Data:-
000004A7 17:33:07.693049 4604.8 0x0085CF98 17 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 : ................
000004A7 17:33:07.693049 4604.8 0x0085CFA8 00 00 00 00 00 00 00 00 00 00 00 00 : ............
000004A8 17:33:07.693062 4604.8 ----------} cciTcpResolveHostname (rc=OK)

I think they are interesting for a couple of reasons. If you look at the timings between lines starting with 00000484 and 00000491 there is a 5 second 'gap' with nothing going on. Likewise between the lines starting 0000049A and 000004A3.

All the other actions captured by the trace take milliseconds to complete.

10 seconds is more or less the exact difference between the application saying it has made the put and the QM saying it put the message on the queue.
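For anyone wanting to find gaps like these in a longer trace without scanning by eye, here is a minimal sketch in Python. It assumes every trace line starts with the 8-digit hex counter followed by an HH:MM:SS.ffffff timestamp, as in the extract above; the function name and the 1-second threshold are just illustrative choices, and it does not handle a trace that crosses midnight.

```python
import re
from datetime import datetime, timedelta

# Matches the leading "counter  HH:MM:SS.ffffff" columns of a trace line.
LINE_RE = re.compile(r"^([0-9A-Fa-f]{8})\s+(\d{2}:\d{2}:\d{2}\.\d{6})\s")


def find_gaps(lines, threshold=timedelta(seconds=1)):
    """Return (previous_counter, counter, gap) for each jump above threshold."""
    gaps = []
    prev = None  # (counter, timestamp) of the last parsed line
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip blank lines and anything that is not a trace entry
        stamp = datetime.strptime(match.group(2), "%H:%M:%S.%f")
        if prev is not None and stamp - prev[1] > threshold:
            gaps.append((prev[0], match.group(1), stamp - prev[1]))
        prev = (match.group(1), stamp)
    return gaps


# Three lines copied from the trace extract above.
sample = [
    "00000484 17:32:57.697348 4604.8 getaddrinfo(): AF_INET & AI_NUMERICHOST: rc=11001 errno=0",
    "00000491 17:33:02.693084 4604.8 getaddrinfo(): AF_INET6: rc=11001 errno=0",
    "00000492 17:33:02.693202 4604.8 serv_addr4",
]

for before, after, gap in find_gaps(sample):
    print(before, "->", after, gap.total_seconds(), "seconds")
```

Run against the full trace file (for example `find_gaps(open(path))`), this should list each pause together with the counter values on either side of it.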


I have several questions I hope someone may be able to help with:

We use host names in the conn names of the channel definition files and entries in hosts files on the clients that resolve those host names. In addition, host names are used for the LOCALADDR or IPADDR entries for Channels and Listeners respectively to restrict the interfaces they use.

This is done to prevent us having to re-generate the channel tab files when we move the application through different staging environments or alter the QManagers the clients will connect to for testing purposes. A few changes to the hosts files on the clients and QM boxes and we are all set. Is this practice normal, abnormal or downright wrong?

In the trace above, the 5 second gap appears to happen during the lookup of the host names. I wouldn't expect MQ (or the OS) to take 5 seconds to find the entry in the hosts file, especially since the contents of the hosts file are loaded into memory and held ready. Are there any settings on the client I can 'tweak' that will make this process go faster?

Again in the trace, the left hand column is designated as the 'counter' column. The time gap coincides with a break in the numbering of the counter values. Does that mean nothing is actually going on or that something is going on but not being recorded in the trace?

Even though this is a small part of the trace, can anyone familiar with MQTrace notice anything wrong with the name resolution? It looks ok to me, but I'm not an expert with trace files.
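One way to sanity-check the result of the resolution is to decode the serv_addr4 dumps by hand. The sketch below assumes the bytes follow the usual Windows sockaddr_in layout: a 2-byte little-endian address family, a 2-byte port in network byte order, then the 4-byte IPv4 address. The function name is made up for illustration.

```python
import socket
import struct


def decode_sockaddr_in(hex_bytes):
    """Decode a hex dump of a sockaddr_in into (family, port, address)."""
    raw = bytes.fromhex(hex_bytes)
    family = struct.unpack_from("<H", raw, 0)[0]  # little-endian on x86 Windows
    port = struct.unpack_from(">H", raw, 2)[0]    # network byte order
    address = socket.inet_ntoa(raw[4:8])          # dotted-quad IPv4 address
    return family, port, address


# The two serv_addr4 dumps from the trace extract above.
remote = decode_sockaddr_in("02 00 11 63 0A E0 99 26 00 00 00 00 00 00 00 00")
local = decode_sockaddr_in("02 00 00 00 0A E0 E8 C8 00 00 00 00 00 00 00 00")

print("host1.mq.addr ->", remote)  # (2, 4451, '10.224.153.38')
print("local.mq.addr ->", local)   # (2, 0, '10.224.232.200')
```

Family 2 is AF_INET, so under that assumption both names did resolve to IPv4 addresses; the lookups succeed, they just take 5 seconds each to get there.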

Any help would be greatly appreciated. If I need to provide more detail please let me know.

Thanks in anticipation
bruce2359
Posted: Thu Nov 19, 2009 4:55 pm

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

Quote:
At low throughputs...

Are you saying that this slow-down only occurs during times of low transaction volume?

How many CLNTCONN entries are in your client channel table?

Anything interesting or relevant in the MQ error logs on BOTH client and server?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.
splaaat
Posted: Thu Nov 19, 2009 5:12 pm

Novice

Joined: 28 Apr 2007
Posts: 10
Location: UK

Hi Bruce,

Wow that was quick

This only came to light recently during some specific testing scenarios that had us injecting a single message at intervals varying from 4 seconds to 4 minutes. Previously, testing had been done at rates from 5 messages/sec to 20 messages/sec and this phenomenon was not seen.

We have several sets of TABs that are specific to an account/function combination (we have particular security and segregation requirements), but the maximum number of connections in a tab is 8 and the minimum 3.

I didn't check them myself, but I was told there was nothing of any note in the MQ error logs.

I'm not sure if it's relevant, but we use SSL on the connections. There didn't seem to be any major hiccups in the trace during handshaking.

I hope that gives you some more insight.
bruce2359
Posted: Thu Nov 19, 2009 5:26 pm

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

Another question or two for you:

Is this slow-down new behavior? If so, what has changed?

Is this behavior confined to one client? One qmgr?

Are all of the CLNTCONN definitions in the table pointing to active qmgrs (SVRCONN channels, listeners)?

You mentioned SSL. Is SSL the new item for this app?

Quote:
Wow that was quick.

I was passing by my home office, and noticed a new email.
bruce2359
Posted: Thu Nov 19, 2009 5:28 pm

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

Good gawd, it's 1:30am in the UK! What the heck are you doing up at this hour?
splaaat
Posted: Thu Nov 19, 2009 5:53 pm

Novice

Joined: 28 Apr 2007
Posts: 10
Location: UK

Hi Bruce,

Passing again huh?

The behaviour is newly observed; I personally don't believe it is newly occurring. I think we have just been testing at profiles that haven't shown the behaviour up 'til now. We'll need to re-run some earlier tests to check though. Unfortunately that's not possible at the moment as the testers are on a tight schedule.

Nothing in the actual MQ part of the environment has changed. However, these are proving runs, so the application is getting tweaked to test the various scenarios. All the settings that might affect this behaviour have been checked (batch sizing, polling intervals, transactionality etc.) and nothing seems amiss.

We only traced one client, but the App runs any number between 2 and 6. In order to narrow down the trace we restricted the app to a single message type on a single host (client). The long response times were previously noticed within the app's reporting database when 2 clients were running. So until we can retest we have to assume it is with any number of clients running.

The CLNTCONN definitions point to 2 active QMs, the app biases its connections towards 1 QM and fails across to another definition in the TAB if it can't get a primary connection. The app has primary and secondary transport settings defined and the corresponding entries are in the TAB file.

We did preliminary tests without SSL in place, but SSL has been in the environment for a while now, so it's not really 'new'.

Thanks for the reply, I need to get some shut eye now. I'll answer any additional questions in the morning.

Keep passing and thinking
splaaat
Posted: Thu Nov 19, 2009 6:00 pm

Novice

Joined: 28 Apr 2007
Posts: 10
Location: UK

Quote:
Good gawd, it's 1:30am in the UK! What the heck are you doing up at this hour?


lol I have a live implementation over the weekend and am trying to shift my body clock so I don't fall asleep in the middle of the night.

And I'm dedicated of course

I gotta go sleep
bruce2359
Posted: Thu Nov 19, 2009 6:09 pm

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

Quote:
The CLNTCONN definitions point to 2 active QMs...

But how many total CLNTCONN entries are in the table in question? Just 2? Or more, and only 2 of x are active?

(Keep in mind I haven't looked at/into your trace at all...)

I'm pondering if it's the table...

The client software tries to MQCONNect (CLNTCONN entries in alphabetical order, IMS), and takes its bloody time trying each CLNTCONN until each one fails, OR one connects successfully. This can take 5-10 seconds locally, even with a small table.

There is a post on the Hursley site that gave some additional details...

Happy dreams.
PeterPotkay
Posted: Thu Nov 19, 2009 6:48 pm    Post subject: Re: Delay connecting when using Host Names in AMQCLCHL.TAB

Poobah

Joined: 15 May 2001
Posts: 7722

splaaat wrote:
We use host names in the conn names of the channel definition files and entries in hosts files on the clients that resolve those host names. In addition, host names are used for the LOCALADDR or IPADDR entries for Channels and Listeners respectively to restrict the interfaces they use.

This is done to prevent us having to re-generate the channel tab files when we move the application through different staging environments or alter the QManagers the clients will connect to for testing purposes. A few changes to the hosts files on the clients and QM boxes and we are all set. Is this practice normal, abnormal or downright wrong?

I would say this is abnormal. It certainly adds a layer of complexity. And if it turns out to be the source of the problem, then you could say it's downright wrong.

The trace shows the delay is in the lookup of the hostname. Without anything better to go on, I would try the following tests and study the results to see if they give a clue.

Use the amqscnxc test program to see if it shows the same delay. Run it with the -x and -c flags and again without, to see if using the channel tables or not makes a difference.

Does ping show the same delay?

Does trying to open a telnet session from the client to the MQ server on that port show a delay?

Does turning SSL off on the channel make a difference for this problem?

For all the above tests, does it work any better if you specify the IP address rather than the hokey host name translation?
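To get a number for the name-versus-IP comparison outside of MQ, something like the following Python sketch could be run on the client box. It only times the operating system's own name resolution through socket.getaddrinfo, so it says nothing about MQ or SSL. The targets and port 1414 (the default MQ listener port) are placeholders; substitute the host names from the channel table and their IP addresses.

```python
import socket
import time


def time_resolution(host, port=1414):
    """Time one OS-level lookup; return (seconds, addresses or error text)."""
    start = time.perf_counter()
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
        # Each entry ends with a sockaddr tuple whose first item is the address.
        result = sorted({info[4][0] for info in infos})
    except socket.gaierror as exc:
        result = "lookup failed: {}".format(exc)
    return time.perf_counter() - start, result


if __name__ == "__main__":
    # Placeholder targets: replace with a channel-table host name and its IP.
    for target in ("localhost", "127.0.0.1"):
        elapsed, result = time_resolution(target)
        print("{:<20} {:8.3f}s  {}".format(target, elapsed, result))
```

If the host name takes seconds and the literal IP address takes next to nothing, the delay is in name resolution on the client rather than anywhere in MQ.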


Personally I would advise you to use channel tables with the correct fully qualified hostnames. You say you do this to avoid having to regenerate the channel table files. I would think you create those files once and you are done, other than when you need to add some new entries for new channels / QMs. Something's gotta change and I would rather it be the Channel Table versus the host names file. The MO72 Support Pack makes it very easy to manage Channel Tables.
_________________
Peter Potkay
Keep Calm and MQ On
splaaat
Posted: Fri Nov 20, 2009 2:14 am

Novice

Joined: 28 Apr 2007
Posts: 10
Location: UK

Hi Guys,

Thanks for the great questions and suggestions, they're just what I needed to get the grey matter stirred.

Bruce

Quote:
But how many total CLNTCONN entries are in the table in question? Just 2? Or more, and only 2 of x are active?


The TAB in question has 4 entries: 2 channels to each of 2 QManagers.

Peter

Quote:
Personally I would advise you to use channel tables with the correct fully qualified hostnames


I think you've hit the nail on the head there. Additional tracing at the OS level has shown 3 DNS lookup attempts (we use 3 suffixes on the IP config of the NIC), each using the host value from the TAB with one of the box's DNS suffixes appended, before the hosts file is checked. Quite why it is happening in that order is a mystery, as the hosts file should already be in the DNS resolver cache in memory; it's loaded as the OS starts.

We are going to amend the hosts file to include the suffixes on the host values and see if the system picks up the names right away.
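As a rough sketch of what the amended entries would look like, the Python below builds one hosts line for the bare name and one for each suffixed form. The address and the three suffixes are made-up placeholders, not our real values, and the helper name is only for illustration.

```python
def hosts_entries(address, name, suffixes):
    """Return hosts-file lines for a name and each DNS-suffixed variant."""
    names = [name] + [name + "." + suffix.strip(".") for suffix in suffixes]
    # One name per line keeps the file easy to diff between environments.
    return ["{:<16}{}".format(address, entry) for entry in names]


# Hypothetical address and suffix list, for illustration only.
entries = hosts_entries(
    "192.0.2.10",
    "host1.mq.addr",
    ["corp.example", "test.example", "dev.example"],
)
print("\n".join(entries))
```

The idea is that whichever suffixed form the resolver tries first already has a matching hosts entry, so no query needs to leave the box.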

Quote:
I would say this is abnormal. It certainly adds a layer of complexity. And if it turns out to be the source of the problem, then you could say it's downright wrong.


I agree totally that this adds complexity, but it is done out of the need to add flexibility to our environment.

For pre-live and test we have a set of client machines that connect to 2 distinct processing areas. Those processing areas have multiple setups (different hosts, test tools, reply stubs and actual processing routines) which means we would need to have 12 sets of channel tabs to cover all the different possible combinations. Add to this the fact we don't just have a single table, we have a set of tables that will only allow connections for a client running under a specific security context and contacting a specific QM/queue combination, so we actually have 24 TAB files making up a set. Now, if I told you we have 5 distinct testing systems which all need TABs and 'several' individual development machines, you'd probably faint (I do regularly).

So managing all the TABs for all the possible testing scenarios is (actually was) a nightmare. Hence the use of the hosts entries to deal with the different end processing areas. Now all we have to do is change hosts entries and restart/reconnect. This also has benefits for progression of the application to live, as the TABs don't change as you progress through unit, system and performance environments and finally to live: you have kept the same TABs all the way up and just changed hosts entries.

I can hear you thinking... why is this so complicated! Unfortunately we are driven by external influences that require a certain level of testing under certain conditions to ensure the systems operate to prescribed SLAs. Add to that the fact some processing areas are shared and not always available so you need to shift to alternatives and the support/maintenance of testing is challenging. I don't think we'd be able to do it without the use of tokenisation and the hosts files.

So, getting this right is fairly crucial for us and it's great to get the help of you guys to point us in the right direction.

I'll try the entry tweak to hosts and, if that doesn't work, run through the recommendations made by Peter to get a better picture of what is going on.


Thanks for the help guys, I'll let you know how it goes.
bruce2359
Posted: Fri Nov 20, 2009 5:20 pm    Post subject: Re: Delay connecting when using Host Names in AMQCLCHL.TAB

Poobah

Joined: 05 Jan 2008
Posts: 9469
Location: US: west coast, almost. Otherwise, enroute.

PeterPotkay wrote:

Use the amqscnxc test program to see if it shows the same delay. Run it with the -x and -c flags and again without, to see if using the channel tables or not makes a difference.

Does ping show the same delay?

Does trying to open a telnet session from the client to the MQ server on that port show a delay?

Does turning SSL off on the channel make a difference for this problem?

For all the above tests, does it work any better if you specify the IP address rather than the hokey host name translation?

Out of curiosity, did you do these tests to get a benchmark? What were the results?
splaaat
Posted: Sat Nov 21, 2009 3:26 am

Novice

Joined: 28 Apr 2007
Posts: 10
Location: UK

Hi Bruce,

I won't be in a position to get 'hands on' until Tuesday now. I'll do the tests and post the outcome as soon as I can.
splaaat
Posted: Wed Nov 25, 2009 12:28 pm

Novice

Joined: 28 Apr 2007
Posts: 10
Location: UK

Hi Bruce,

I managed to get back to this problem yesterday and the solution is very simple. Peter was on the right track with the fully qualified host names, but we went in a slightly different direction to fix the issue.

For these servers all MQ traffic goes over an isolated VLAN. The delay was being caused by DNS lookups to our domain for the host entries in the channel tabs, because the domain suffixes had been appended to the host names.

On the NICs that were configured in the MQ VLAN, we removed the 'append Domain name suffixes' setting in the advanced section of TCP/IP configuration and the delays disappeared. Client connection is now instantaneous.

This issue was not spotted before because the testing scenarios used blocks of high volume constant message flow through the application and MQ. The application has a 'warm up' period and this was masking the delay in the first host name look-ups from the channel tabs. When we moved to low volume sporadic message flow, the DNS look-up issue was highlighted.

Thanks very much for all the help on this matter, I'm sure it would have taken us longer to find if we didn't have help from this forum.