
MQSeries.net Forum Index » IBM MQ API Support » Broken connection : recovering faster

Broken connection : recovering faster
jga
Posted: Fri Oct 20, 2017 2:25 am    Post subject: Broken connection : recovering faster

Newbie

Joined: 20 Oct 2017
Posts: 4

NB: I get an error message "You must have 1 posts before you can post URL's/Links" so I am citing other threads from this forum only by their topic id.

Hi,

We have been running MQSeries clients in production for several years, in various versions, mainly on Windows and Solaris. We just migrated a Solaris install to a Linux machine and all hell broke loose: we get broken connections (2009) all the time when doing an MQPUT after "some inactivity" in the client.

All the code is ANSI-C. We make an MQPUT call and wait for the answer.

The real problem we face is that it takes SIX MINUTES for MQ to answer with (2009) connection broken. In the meantime, the internal watchdogs on our other processes have long since flagged the data as invalid, and have automatically killed and restarted a few processes.

I have read topic 11375 and I shall ask the admins to have a look at HBINT but this will take some time to be done.
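The six minutes may not be a coincidence. Below is a minimal sketch of the arithmetic, under two assumptions that should be checked against the documentation for the MQ version in use: first, the rule as recalled from IBM's channel documentation (a receive is abandoned after twice the negotiated HBINT when HBINT is below 60 seconds, and after HBINT plus 60 seconds otherwise); second, that the channel still runs with the default HBINT of 300 seconds. The function name is hypothetical and is not part of the MQ API.

```c
#include <stdio.h>

/*
 * Upper bound, in seconds, on how long a client receive may block before
 * the channel is declared broken.
 * ASSUMPTION (rule recalled from IBM's HBINT documentation, to be verified):
 *   negotiated HBINT <  60 s  ->  timeout = 2 * HBINT
 *   negotiated HBINT >= 60 s  ->  timeout = HBINT + 60
 * Returns -1 when HBINT <= 0 (heartbeats disabled, so this rule gives no bound).
 */
int hbint_receive_timeout(int hbint_seconds)
{
    if (hbint_seconds <= 0)
        return -1;
    if (hbint_seconds < 60)
        return 2 * hbint_seconds;
    return hbint_seconds + 60;
}

/* Print the bound for one candidate HBINT value. */
void print_hbint_bound(int hbint_seconds)
{
    printf("HBINT=%d s -> give up after about %d s\n",
           hbint_seconds, hbint_receive_timeout(hbint_seconds));
}
```

Under those assumptions, hbint_receive_timeout(300) gives 360 seconds, i.e. the six minutes observed, and an HBINT of 15 would bring the bound down to 30 seconds.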

Is there a way we could programmatically (in our C code) set a channel or queue property, or whatever, to change the heartbeat interval? I did not find any. The same goes for KAINT.

I have read topic 71845 and I agree that as a general rule, it is stupid to spend more time checking that we can work instead of doing the job.

In my case, if using MQINQ answered me immediately "hey you lost connection", I would be perfectly happy to add a config option to my application to "ping before sending". And I would be perfectly happy to have twice in a blue moon a "ping ok" followed by an error when actually writing.

Actually, I do not care if I get an error from MQ, I know how to handle that. My problem is that I **really** need to get it "in a reasonable time". SIX MINUTES is NOT a reasonable time in our time scale. Thirty seconds might be ok, one minute is a maximum.

We had the exact same problem on various other occasions with MQPUT, but using WaitInterval in MQGMO did exactly what we needed, whereas TimeOut param in MQPMO is described as "reserved" and "ignored".

Most probably the root cause is a badly configured firewall or network device silently dropping the connection without sending the proper FIN or RST, but there is no way I shall be able to get this fixed (unfortunately).

Any help appreciated.
Sincerely,
John

bruce2359
Posted: Fri Oct 20, 2017 5:06 am    Post subject: Re: Broken connection : recovering faster

Poobah

Joined: 05 Jan 2008
Posts: 9392
Location: US: west coast, almost. Otherwise, enroute.

jga wrote:
NB: I get an error message "You must have 1 posts before you can post URL's/Links" so I am citing other threads from this forum only by their topic id.

This requirement helps keep robots from posting scams.

jga wrote:

We had the exact same problem on various other occasions with MQPUT, but using WaitInterval in MQGMO did exactly what we needed, whereas TimeOut param in MQPMO is described as "reserved" and "ignored"

What exactly did you need MQGMO WaitInterval for?
_________________
I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live.

jga
Posted: Fri Oct 20, 2017 6:24 am    Post subject: Re: Broken connection : recovering faster

Newbie

Joined: 20 Oct 2017
Posts: 4

bruce2359 wrote:
What exactly did you need MQGMO WaitInterval for?


Control the maximum time we allow MQ Series to tell us if there is something to read or not.

In general, we use zero so that we have an "immediate" return (either with the message itself or "nothing to read"), sometimes we allow for one or two seconds. It also depends if we are using correlation Ids or if we read anything available on the queue. We definitely never use an infinite wait for something to be available to read.

So far we have been happy with it because the one thing we can NOT afford in this application is a blocking call that waits "too long" for a resource to answer: our programs are single-threaded. If we set WaitInterval to 2 seconds and it actually takes 3 seconds, we honestly do not care; we just need a "reasonable" time-out. Actually, we need the real definition of "real-time":
1) an answer is guaranteed 2) it is guaranteed within so many seconds.
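One generic way to get "an answer within so many seconds" around a call that has no time-out of its own is to run it in a child process and wait on a pipe with a deadline. The sketch below is plain POSIX, not MQ API: call_with_deadline, demo_quick_call and demo_slow_call are hypothetical names, and it assumes the blocking work can run in a separate process (an MQ connection handle opened in the parent should not be assumed usable in the child, so the child would have to do its own connect).

```c
#include <poll.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

typedef int (*blocking_fn)(void *arg);

/*
 * Run fn(arg) in a child process and wait at most timeout_ms for it.
 * Returns fn's result (truncated to 0..255) if it finished in time,
 * or -1 on time-out or on any failure (an interrupted poll also counts
 * as a failure here). A child that is still running gets killed.
 */
int call_with_deadline(blocking_fn fn, void *arg, int timeout_ms)
{
    int pipefd[2];
    int result = -1;
    unsigned char rc;
    struct pollfd p;
    pid_t pid;

    if (pipe(pipefd) != 0)
        return -1;
    pid = fork();
    if (pid < 0) {
        close(pipefd[0]);
        close(pipefd[1]);
        return -1;
    }
    if (pid == 0) {                      /* child: do the blocking work */
        close(pipefd[0]);
        rc = (unsigned char)fn(arg);
        if (write(pipefd[1], &rc, 1) != 1)
            _exit(1);
        _exit(0);
    }
    close(pipefd[1]);                    /* parent: wait with a deadline */
    p.fd = pipefd[0];
    p.events = POLLIN;
    p.revents = 0;
    if (poll(&p, 1, timeout_ms) == 1 && read(pipefd[0], &rc, 1) == 1)
        result = rc;
    else
        kill(pid, SIGKILL);              /* too slow or failed: give up */
    close(pipefd[0]);
    waitpid(pid, NULL, 0);               /* reap the child in every case */
    return result;
}

/* Hypothetical stand-ins for a fast call and a stuck call. */
int demo_quick_call(void *arg) { (void)arg; return 7; }
int demo_slow_call(void *arg)  { (void)arg; sleep(5); return 0; }
```

With these stand-ins, call_with_deadline(demo_quick_call, NULL, 2000) returns the callee's result, while call_with_deadline(demo_slow_call, NULL, 100) gives up after about a tenth of a second and returns -1. The price is one fork per call, which is only reasonable for the bursty write pattern described here.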

We do not really care if it takes up to 30 seconds, anything above might become a problem. We send/receive messages around 2 to 3 thousand chars, so on a production system, these do not seem crazy expectations to me on a normal day (I can live with a temporary slowdown twice a year).

Which does not mean this parameter would always work in the case I am trying to solve; maybe it would be ignored and also take six minutes to give back an answer. But as far as I know we have only seen (2009) errors on the writing processes, not on the reading processes, so I cannot say.

I guess we poll the queues we are reading often enough to have some activity, whereas the write queues are used in "burst" mode (i.e. we write a lot, then it can be a few minutes to a few hours before writing again), but this might be wrong.

John

fjb_saper
Posted: Fri Oct 20, 2017 7:29 am

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20695
Location: LI,NY

You might want to have a look into TCP/IP Keep alive.
The wait time there is set at the machine level. But it could be set to less than the six minutes it seems to be at right now... Mind you, the default wait time is 2 hours...
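To make the keepalive numbers concrete, here is a minimal Linux-only sketch of the per-socket equivalents of the machine-level settings. It is an illustration under assumptions: an application using the MQ client library does not own the channel's socket, so this cannot simply be applied to an MQ connection from application code; the function names are hypothetical, and the detection time is an approximation (idle time plus the unanswered probes).

```c
#include <netinet/in.h>
#include <netinet/tcp.h>
#include <sys/socket.h>
#include <unistd.h>

/*
 * Enable TCP keepalive on one socket and override the machine-wide
 * defaults for it (TCP_KEEPIDLE, TCP_KEEPINTVL and TCP_KEEPCNT are
 * Linux-specific options). Returns 0 on success, -1 on failure.
 */
int enable_keepalive(int fd, int idle_s, int interval_s, int probes)
{
    int on = 1;

    if (setsockopt(fd, SOL_SOCKET, SO_KEEPALIVE, &on, sizeof on) != 0)
        return -1;
    /* Seconds of silence before the first probe is sent. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPIDLE, &idle_s, sizeof idle_s) != 0)
        return -1;
    /* Seconds between two consecutive probes. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPINTVL, &interval_s, sizeof interval_s) != 0)
        return -1;
    /* Unanswered probes tolerated before the connection is dropped. */
    if (setsockopt(fd, IPPROTO_TCP, TCP_KEEPCNT, &probes, sizeof probes) != 0)
        return -1;
    return 0;
}

/* Approximate seconds needed to notice a silently dropped peer. */
int keepalive_detection_seconds(int idle_s, int interval_s, int probes)
{
    return idle_s + interval_s * probes;
}
```

With the usual Linux defaults (7200 seconds idle, 75 seconds between probes, 9 probes) the estimate is 7875 seconds, a bit over two hours. Something like 20 seconds idle, 5 seconds between probes and 2 probes would bring it down to about 30 seconds.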
_________________
MQ & Broker admin

tczielke
Posted: Fri Oct 20, 2017 9:11 am    Post subject: Re: Broken connection : recovering faster

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

jga wrote:
We just migrated a solaris install to a linux machine and all hell broke loose with broken connections (2009) all the time when doing an MQPUT after "some inactivity" in the client.

All the code is ANSI-C. W


Is the platform change here Solaris SPARC to Linux x86?
_________________
Working with MQ since 2010.

jga
Posted: Mon Oct 23, 2017 12:18 am

Newbie

Joined: 20 Oct 2017
Posts: 4

fjb_saper wrote:
You might want to have a look into TCP/IP Keep alive.
The wait time there is set at the machine level. But it could be set to less than the six minutes it seems to be at right now...


Thanks for pointing it out. I had already asked the admins there to check HBINT, but I forgot to ask about KAINT as well; I just did.

Do you know if there is any way to "force" these two parameters, or at least to try to set them programmatically (i.e. in our C program)?

fjb_saper wrote:
Mind you default wait time is 2 hours...

Excellent, just what we need :)

jga
Posted: Mon Oct 23, 2017 12:27 am    Post subject: Re: Broken connection : recovering faster

Newbie

Joined: 20 Oct 2017
Posts: 4

tczielke wrote:
Is the platform change here Solaris SPARC to Linux x86?


Yes. I cannot vouch for whether the MQSeries client version changed or not.

As far as I know, we have other Linux x86 instances running without a glitch, but that does not mean we should not change our code and use specific Linux or x86 flags I am not aware of.

The Linux machine is definitely not on the same network as the sparkie was, either.

Actually the only thing that did not change is our C code (just recompiled). I think the MQ queue manager we are connecting to is the same as before, but I cannot guarantee it.

All this is not information we have easy access to; our clients do not tell us all this, but I can always ask if there is anything you think is relevant.

tczielke
Posted: Mon Oct 23, 2017 4:49 am

Guardian

Joined: 08 Jul 2010
Posts: 939
Location: Illinois, USA

This might not be the issue, but I did run into a problem during a Solaris SPARC to Linux x86 effort where a C++ program had made some endianness coding assumptions about binary integers (e.g. int, long) that were causing the program to not work under Linux x86. It would be helpful to ask the C programmers if they have reviewed their code and if they took into account the endianness difference between the SPARC processor (big endian) and x86 (little endian). If they have no idea what you are talking about, they should research it and at least understand why the following program works differently when compiled and run on Solaris SPARC and Linux x86.

Code:

#include <stdio.h>

int main (void)
{
   /* Overlay a long with a byte array so its individual bytes can be inspected. */
   union endianness {
      long s;
      char c[sizeof(long)];
   }  un;

   un.s = 0x0001;
   /* c[0] is the byte stored at the lowest address of the long. */
   if (un.c[0] == 1)
       printf("lowest address of multi-byte long contains 0x01. Little endian processor\n");
   else
       printf("lowest address of multi-byte long contains 0x00. Big endian processor\n");

   return 0;
}
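If such an assumption does turn up in the code, one common remedy is to stop copying integers to and from byte buffers with memcpy or casts and to fix the byte order explicitly instead. A minimal sketch, with hypothetical helper names, which behaves the same on SPARC and on x86 because it only uses shifts on the value and never looks at the in-memory layout:

```c
#include <stdint.h>

/* Store a 32-bit value into a buffer in big-endian (network) byte order. */
void put_be32(unsigned char out[4], uint32_t value)
{
    out[0] = (unsigned char)(value >> 24);
    out[1] = (unsigned char)(value >> 16);
    out[2] = (unsigned char)(value >> 8);
    out[3] = (unsigned char)(value);
}

/* Read back a 32-bit value that was stored in big-endian byte order. */
uint32_t get_be32(const unsigned char in[4])
{
    return ((uint32_t)in[0] << 24) |
           ((uint32_t)in[1] << 16) |
           ((uint32_t)in[2] << 8)  |
            (uint32_t)in[3];
}
```

Whether the application needs anything like this depends on whether it puts raw binary integers into its own message payloads, which this thread does not say.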

_________________
Working with MQ since 2010.

zpat
Posted: Mon Oct 23, 2017 5:07 am

Jedi Council

Joined: 19 May 2001
Posts: 5849
Location: UK

What is the disconnect interval (DISCINT) on your SVRCONN channel?
_________________
Well, I don't think there is any question about it. It can only be attributable to human error. This sort of thing has cropped up before, and it has always been due to human error.