1300 character Message not transmitted |
Author |
Message
|
HubertKleinmanns |
Posted: Thu Oct 17, 2019 12:57 am Post subject: |
|
|
Shaman
Joined: 24 Feb 2004 Posts: 732 Location: Germany
|
flsb wrote: |
1 thing I noticed from these 2 parties are that they do NAT on their network to the MQ server. |
MQ doesn't care about NATting. It just needs a (mostly) stable network connection.
flsb wrote: |
Receiving end should be the default definition settings |
Again: Could you please verify this?
I come back to what Morag wrote: Did you have a look at the firewalls?
In addition: Are there any load balancers between? Did you try an IP trace?
There is no reason in IBM MQ why a change from AIX MQ v5.3 (physical) to RHEL MQ v9.1 (VM) should prevent the transfer of 1.3k messages. I have worked with both and transferred much larger messages.
So there must be something wrong in the network. This could be:
- local IP stack (AIX vs. RHEL)
- local network cards (e. g. full-duplex vs. half-duplex, packet size, ...)
- router (routing rules, packet size, network settings, ...)
- firewalls (firewall rules, packet size, packet frequency, network settings, ...)
- load balancer (balancing rules, packet size, network settings, ...)
In the past I had a situation where a flag on the network card prevented MQ channels from starting. This was on the same operating system, but on different hardware (I don't remember exactly what went wrong, and I guess the details wouldn't help you anyway).
So you should ask the OS/network/firewall administrators in your local network and on the parties' side to assist you. _________________ Regards
Hubert |
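The "packet size" items in the list above can be explored with simple arithmetic. The following is a minimal sketch, not an MQ tool: it estimates how many TCP segments one transmission needs for a given MTU. The `overhead_bytes` value stands in for MQ's own headers and is a hypothetical placeholder, not a measured figure; the function name and defaults are also assumptions made for illustration.

```python
import math


def tcp_segments_needed(payload_bytes, overhead_bytes, mtu=1500,
                        ip_header=20, tcp_header=20):
    """Estimate how many TCP segments a single transmission occupies.

    overhead_bytes is a placeholder for protocol headers added on top of
    the application payload (hypothetical value, measure it in a trace).
    """
    # Maximum segment size: what is left of the MTU after IP and TCP headers.
    mss = mtu - ip_header - tcp_header
    if mss <= 0:
        raise ValueError("MTU too small for the given header sizes")
    total = payload_bytes + overhead_bytes
    return math.ceil(total / mss)


# A 1300-byte payload with an assumed 500 bytes of headers no longer fits
# into a single 1460-byte segment, while a 600-byte payload does.
large = tcp_segments_needed(1300, 500)
small = tcp_segments_needed(600, 500)
```

If the messages that fail are the ones that need more than one segment, it may be worth looking at MTU and fragmentation settings along the path. That is only one hypothesis, and an IP trace is still needed to confirm or rule it out.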
|
|
|
HubertKleinmanns |
Posted: Thu Oct 17, 2019 1:00 am Post subject: |
|
|
Shaman
Joined: 24 Feb 2004 Posts: 732 Location: Germany
|
flsb wrote: |
as i've mentioned earlier..
the firewall only change on the object
example,
object MQserver from IP 192.168.0.1 (AIX MQ) to 192.168.0.2 (new RHEL MQ)
all firewall rules remain intact.. |
Firewall rules are based on IP addresses. So even if only the address has changed, you could have totally different firewall rules. Don't trust the firewall administrators. _________________ Regards
Hubert |
|
|
|
fjb_saper |
Posted: Thu Oct 17, 2019 7:59 am Post subject: |
|
|
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20700 Location: LI,NY
|
Also remember that the RHEL box has its own firewall
See firewall-cmd
You need to make sure the MQ ports are open on the RHEL firewall in the correct zone... _________________ MQ & Broker admin |
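A quick way to see whether the MQ listener port is reachable at all is a plain TCP connect. This is a minimal sketch under assumptions: the function name is made up for illustration, and it only proves that a connection can be opened, not that larger messages get through.

```python
import socket


def port_is_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the host and applies the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution errors.
        return False
```

For example, `port_is_reachable("mq.example.com", 1414)` checks the default MQ listener port on a hypothetical host name. A `True` result does not contradict the problem in this thread, since the channel here does start and only the larger message fails.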
|
|
|
gbaddeley |
Posted: Thu Oct 17, 2019 2:25 pm Post subject: |
|
|
Jedi Knight
Joined: 25 Mar 2003 Posts: 2501 Location: Melbourne, Australia
|
flsb wrote: |
i know it looks like it is pointing to a network problem instead of an MQ problem but a few factors are contradicting with one another.. |
Don't waste time on band-aid work-arounds. You haven't actually found root cause yet, and the problem is unlikely to go away.
Quote: |
3) Wireshark and firewall logs show packets sent through
The only changes from this is upgrading from AIX MQ5.3 (physical server) to RHEL MQ 9.1 (VM)
And this happened before in one of my other MQ server version 7.5 as well and that we have to route the channel to AIXMQ5.3 to go through. |
A more in-depth analysis of the packet / firewall logs is required. Compare normal operation of the TCP session when the MQ channel is running, against what is observed when the select() timeout error occurs. We have had similar issues and it took quite a while to reveal the cause in the network (under load, network switches were sometimes dropping ACK packets for particular MAC address prefixes). A possible scenario is that a firewall is silently killing a socket session if there are no packets sent during a certain time period. You could try reducing the HBINT or KAINT (if supported on your platform) on the sender channels to a low value (eg. 10 - 60 secs). _________________ Glenn |
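The HBINT/KAINT suggestion above can be written down as an MQSC command. The sketch below only builds the command text; the helper name and the channel name `TO.PARTNER` are hypothetical, and whether KAINT has any effect depends on the platform, as noted above.

```python
def mqsc_set_intervals(channel, chltype="SDR", hbint=30, kaint=60):
    """Build an MQSC ALTER CHANNEL command that sets HBINT and KAINT.

    The channel name is supplied by the caller; nothing is executed here.
    """
    if hbint < 0 or kaint < 0:
        raise ValueError("intervals must not be negative")
    return (f"ALTER CHANNEL('{channel}') CHLTYPE({chltype}) "
            f"HBINT({hbint}) KAINT({kaint})")


# Hypothetical channel name, values inside the 10 - 60 second range.
command = mqsc_set_intervals("TO.PARTNER")
```

The resulting line can be pasted into a `runmqsc` session for the queue manager. The channel usually has to be restarted before a changed attribute takes effect.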
|
|
|
flsb |
Posted: Thu Oct 17, 2019 5:57 pm Post subject: |
|
|
Apprentice
Joined: 01 Apr 2010 Posts: 42
|
gbaddeley wrote: |
A more in-depth analysis of the packet / firewall logs is required. Compare normal operation of the TCP session when the MQ channel is running, against what is observed when the select() timeout error occurs. We have had similar issues and it took quite a while to reveal the cause in the network (under load, network switches were sometimes dropping ACK packets for particular MAC address prefixes). A possible scenario is that a firewall is silently killing a socket session if there are no packets sent during a certain time period. You could try reducing the HBINT or KAINT (if supported on your platform) on the sender channels to a low value (eg. 10 - 60 secs). |
i have tried KAINT(60) HBINT(30) on the channels on both sides but it still failed..
my side is RHEL and the other parties are using Windows
the problem is that other channels with different messages are sending through..
no load balancers between us.
on the firewall side, they have traced the packets against the rule and the packets are sent over..
on the other side, wireshark was installed and it monitored the packets being received..
somehow the whole message cannot be fully committed at the receiver.. |
|
|
|
bruce2359 |
Posted: Thu Oct 17, 2019 8:05 pm Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9406 Location: US: west coast, almost. Otherwise, enroute.
|
flsb wrote: |
...and my side is an RHEL and the other parties are using Windows
the problem is other channels with different messages are sending through... |
Please be precise. The same qmgr in this same RHEL instance has MQ channels to the same Windows instance, AND only this one channel is experiencing this issue? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
|
HubertKleinmanns |
Posted: Thu Oct 17, 2019 10:36 pm Post subject: |
|
|
Shaman
Joined: 24 Feb 2004 Posts: 732 Location: Germany
|
fjb_saper wrote: |
Also remember that the RHEL box has its own firewall
See firewall-cmd
You need to make sure the MQ ports are open on the RHEL firewall in the correct zone... |
This doesn't explain why split messages pass through the channel properly. _________________ Regards
Hubert |
|
|
|
HubertKleinmanns |
Posted: Thu Oct 17, 2019 10:42 pm Post subject: |
|
|
Shaman
Joined: 24 Feb 2004 Posts: 732 Location: Germany
|
bruce2359 wrote: |
flsb wrote: |
...and my side is an RHEL and the other parties are using Windows
the problem is other channels with different messages are sending through... |
Please be precise. The same qmgr in this same RHEL instance has MQ channels to the same Windows instance, AND only this one channel is experiencing this issue? |
I assume flsb used the same channel:
flsb wrote: |
We split the 1.3k characters message to smaller chunks and is able to send through but with altering the channel batchsz to 1. |
_________________ Regards
Hubert |
|
|
|
HubertKleinmanns |
Posted: Thu Oct 17, 2019 10:55 pm Post subject: |
|
|
Shaman
Joined: 24 Feb 2004 Posts: 732 Location: Germany
|
flsb wrote: |
i have tried KAINT(60) HBINT(30) on the channels on both sides but it still failed..
my side is RHEL and the other parties are using Windows
the problem is that other channels with different messages are sending through..
no load balancers between us.
on the firewall side, they have traced the packets against the rule and the packets are sent over..
on the other side, wireshark was installed and it monitored the packets being received..
somehow the whole message cannot be fully committed at the receiver.. |
I know it's a hard job, but you have to organize a conference call with these people:
- MQ admins on RHEL and Windows (to look at MQ and try some message transfers).
- RHEL admins on your side (tracing the traffic on RHEL with tcpdump).
- Windows admins on the parties' side (tracing the traffic on Windows with Wireshark).
- Network admins on your side and on the parties' side (looking at the routers and checking the traffic).
- Firewall admins on your side and on the parties' side (looking at the packets which are passed/dropped and checking the NATting).
All of them have to look at their systems at the same time. Keep at it until the problem has been identified.
And fill a big pot with coffee beforehand, because such tests could take a while. _________________ Regards
Hubert |
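While everybody is tracing at the same time, it helps to have a repeatable test that is independent of MQ. The sketch below sends payloads of different sizes over plain TCP and records which ones come back intact. It assumes that the partner can run a simple echo service on a port that passes the same firewalls and NAT; that service, the function names and the chosen sizes are all hypothetical.

```python
import socket


def recv_exact(sock, n):
    """Read exactly n bytes from sock or raise ConnectionError."""
    buf = bytearray()
    while len(buf) < n:
        part = sock.recv(n - len(buf))
        if not part:
            raise ConnectionError("peer closed before all bytes arrived")
        buf.extend(part)
    return bytes(buf)


def probe_sizes(host, port, sizes, timeout=5.0):
    """Send one payload per size to an echo service.

    Returns a dict mapping each size to True (echoed back intact)
    or False (connection error, timeout or corrupted echo).
    """
    results = {}
    for size in sizes:
        payload = b"x" * size
        try:
            # A fresh connection per size keeps the traces easy to read.
            with socket.create_connection((host, port), timeout=timeout) as sock:
                sock.sendall(payload)
                results[size] = recv_exact(sock, size) == payload
        except OSError:
            results[size] = False
    return results
```

A call such as `probe_sizes("partner.example.com", 7777, [200, 800, 1300, 2000])` uses a made-up host and port. If small sizes succeed and larger ones fail without MQ involved, that points at the network path rather than at the queue managers.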
|
|
|
flsb |
Posted: Thu Oct 17, 2019 11:17 pm Post subject: |
|
|
Apprentice
Joined: 01 Apr 2010 Posts: 42
|
bruce2359 wrote: |
flsb wrote: |
...and my side is an RHEL and the other parties are using Windows
the problem is other channels with different messages are sending through... |
Please be precise. The same qmgr in this same RHEL instance has MQ channels to the same Windows instance, AND only this one channel is experiencing this issue? |
yes, that's what i've been saying.
other channels with shorter messages are transmitting.
and this channel is only transmitting after we split the 1.3k-character message into smaller chunks and set batchsz to 1 |
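The thread does not say how the message was split. Purely as an illustration of what such a workaround involves, here is a minimal sketch that cuts a text into numbered chunks and puts them back together on the other side. The function names and the chunk size of 400 are assumptions, not flsb's actual method.

```python
import math


def split_message(text, chunk_size):
    """Split text into (index, total, part) tuples of at most chunk_size chars."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    total = math.ceil(len(text) / chunk_size)
    return [(i, total, text[i * chunk_size:(i + 1) * chunk_size])
            for i in range(total)]


def reassemble(chunks):
    """Rebuild the original text; raise ValueError if a chunk is missing."""
    if not chunks:
        return ""
    ordered = sorted(chunks, key=lambda c: c[0])
    total = ordered[0][1]
    if [c[0] for c in ordered] != list(range(total)):
        raise ValueError("missing or duplicated chunk")
    return "".join(c[2] for c in ordered)


message = "A" * 1300
parts = split_message(message, 400)   # four chunks: 400, 400, 400, 100
restored = reassemble(reversed(parts))
```

The receiving application has to know about the chunk numbering, which is one reason why this kind of split is a workaround and not a fix.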
|
|
|
flsb |
Posted: Thu Oct 17, 2019 11:22 pm Post subject: |
|
|
Apprentice
Joined: 01 Apr 2010 Posts: 42
|
HubertKleinmanns wrote: |
I know it's a hard job, but you have to organize a conference call with these people:
- MQ admins on RHEL and Windows (to look at MQ and try some message transfers).
- RHEL admins on your side (tracing the traffic on RHEL with tcpdump).
- Windows admins on the parties' side (tracing the traffic on Windows with Wireshark).
- Network admins on your side and on the parties' side (looking at the routers and checking the traffic).
- Firewall admins on your side and on the parties' side (looking at the packets which are passed/dropped and checking the NATting).
All of them have to look at their systems at the same time. Keep at it until the problem has been identified.
And fill a big pot with coffee beforehand, because such tests could take a while. |
wow.. i don't think i can get all these ppl together..
maybe from my side yes, but not the other parties..
they will just be asking for an RCA as currently it is transmitting via splitting message..
another party is asking us to rollback to AIXv5.3 as the migration is from our side and we are affecting them..
while 20+ other parties are working fine after the version upgrade but this party is asking for a rollback since it is not working for them..
*faint* |
|
|
|
hughson |
Posted: Fri Oct 18, 2019 12:25 am Post subject: |
|
|
Padawan
Joined: 09 May 2013 Posts: 1916 Location: Bay of Plenty, New Zealand
|
Do you have a PMR open with IBM for this problem? If not, perhaps you should at this point. _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
|
|
|
HubertKleinmanns |
Posted: Fri Oct 18, 2019 2:56 am Post subject: |
|
|
Shaman
Joined: 24 Feb 2004 Posts: 732 Location: Germany
|
flsb wrote: |
HubertKleinmanns wrote: |
I know it's a hard job, but you have to organize a conference call with these people:
- MQ admins on RHEL and Windows (to look at MQ and try some message transfers).
- RHEL admins on your side (tracing the traffic on RHEL with tcpdump).
- Windows admins on the parties' side (tracing the traffic on Windows with Wireshark).
- Network admins on your side and on the parties' side (looking at the routers and checking the traffic).
- Firewall admins on your side and on the parties' side (looking at the packets which are passed/dropped and checking the NATting).
All of them have to look at their systems at the same time. Keep at it until the problem has been identified.
And fill a big pot with coffee beforehand, because such tests could take a while. |
wow.. i don't think i can get all these ppl together..
maybe from my side yes, but not the other parties..
they will just be asking for an RCA as currently it is transmitting via splitting message..
another party is asking us to rollback to AIXv5.3 as the migration is from our side and we are affecting them..
while 20+ other parties are working fine after the version upgrade but this party is asking for a rollback since it is not working for them..
*faint* |
I said this is a hard job.
Maybe the first step could be to get the people on your side together, and hopefully you will find the issue. But the best way is indeed to get all the people on board, and that at least would be the next step.
Rolling back to MQ v5.3 cannot be the solution, because this version has been out of support for more than 10 years.
I agree with Morag that you should open a PMR with IBM. But I fear they will ask the same questions. _________________ Regards
Hubert |
|
|
|
bruce2359 |
Posted: Fri Oct 18, 2019 4:26 am Post subject: Re: 1300 character Message not transmitted |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9406 Location: US: west coast, almost. Otherwise, enroute.
|
flsb wrote: |
We split the 1.3k characters message to smaller chunks and is able to send through but with altering the channel batchsz to 1. |
How exactly (what utility, where) did you split the 1.3k messages into smaller chunks? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
|
bruce2359 |
Posted: Fri Oct 18, 2019 4:29 am Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9406 Location: US: west coast, almost. Otherwise, enroute.
|
HubertKleinmanns wrote: |
Rolling back to MQ v5.3 cannot be the solution, because this version has been out of support for more than 10 years. |
I disagree. Rolling back to v5.3 is a temporary solution while you, your partner team, and IBM work to resolve this. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|