Author |
Message
|
nelson |
Posted: Wed Oct 19, 2016 6:16 am Post subject: Lack of available virtualized memory space alerts |
|
|
 Partisan
Joined: 02 Oct 2012 Posts: 313
|
Hi all,
We are running MQ 7.5.0.4.
Many FDCs like this one were generated before an MQ crash:
Code: |
+-----------------------------------------------------------------------------+
| |
| WebSphere MQ First Failure Symptom Report |
| ========================================= |
| |
| Date/Time :- Fri October 14 2016 14:24:33 AST |
| UTC Time :- 1476469473.952225 |
| UTC Time Offset :- -240 (AST) |
| Host Name :- myhost |
| Operating System :- AIX 7.1 |
| PIDS :- 5724H7221 |
| LVLS :- 7.5.0.4 |
| Product Long Name :- WebSphere MQ for AIX |
| Vendor :- IBM |
| Installation Path :- /usr/mqm |
| Installation Name :- Installation1 (1) |
| License Type :- Production |
| Probe Id :- XC267011 |
| Application Name :- MQM |
| Component :- xehAsySignalMonitor |
| SCCS Info :- /build/slot1/p750_P/src/lib/cs/unix/amqxerrx.c, |
| Line Number :- 2912 |
| Build Date :- Aug 7 2014 |
| Build Level :- p750-004-140807 |
| Build Type :- IKAP - (Production) |
| Effective UserID :- 12 (mqm) |
| Real UserID :- 203 (mqsi) |
| Program Name :- amqzlaa0 |
| Addressing mode :- 64-bit |
| LANG :- en_US |
| Process :- 2097488 |
| Thread :- 2 |
| UserApp :- FALSE |
| Last HQC :- 0.0.0-0 |
| Last HSHMEMB :- 0.0.0-0 |
| Major Errorcode :- xecE_W_UNEXPECTED_ASYNC_SIGNAL |
| Minor Errorcode :- OK |
| Probe Type :- MSGAMQ6209 |
| Probe Severity :- 3 |
| Probe Description :- AMQ6209: An unexpected asynchronous signal (33 : |
| SIGDANGER) has been received and ignored. |
| FDCSequenceNumber :- 0 |
| Arith1 :- 33 (0x21) |
| Arith2 :- 2097488 (0x200150) |
| Comment1 :- SIGDANGER |
| |
+-----------------------------------------------------------------------------+
|
And here is IBM's tech note related to the issue:
Quote: |
Problem(Abstract)
To alert you to a lack of available virtualized memory space, WebSphere MQ generates many FDC files with the Probe ID XC267011, showing the signal SIGDANGER. |
http://www-01.ibm.com/support/docview.wss?uid=swg21685969
Is there any possibility that the problem was not related to virtual memory space? Something like high CPU usage, for example, or some other failing resource?
Have any of you faced the issue?
Thanks in advance. |
|
mqjeff |
Posted: Wed Oct 19, 2016 6:20 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Have you already eliminated the possibility that it was out of memory? _________________ chmod -R ugo-wx / |
|
Vitor |
Posted: Wed Oct 19, 2016 6:21 am Post subject: Re: Lack of available virtualized memory space alerts |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
nelson wrote: |
Is there any possibility that the problem was not related to virtual memory space? Something like high CPU usage, for example, or some other failing resource? |
So why don't you think the problem is what IBM says it is?  _________________ Honesty is the best policy.
Insanity is the best defence. |
|
nelson |
Posted: Wed Oct 19, 2016 6:35 am Post subject: |
|
|
 Partisan
Joined: 02 Oct 2012 Posts: 313
|
mqjeff wrote: |
Have you already eliminated the possibility that it was out of memory? |
In fact... but unfortunately we have no history of virtual memory or other resource consumption, nor any earlier error that indicates high virtual memory usage. For now, after the crash, everything is working fine and we have no option but to keep monitoring the processes...
Other than a memory leak in an IIB flow, for example, what other "silent" problem at the MQ level could lead to high virtual memory usage?
The AIX specialists confirmed to us that there is enough VM space: the maximum recommended for the current physical memory.
Thanks in advance for your comments. |
|
mqjeff |
Posted: Wed Oct 19, 2016 6:38 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Do you see any indications of any crashes or leaks in a message flow?
Has the AIX team confirmed that the virtual machine's memory is not shared with other virtual machines? _________________ chmod -R ugo-wx / |
|
Vitor |
Posted: Wed Oct 19, 2016 7:08 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
nelson wrote: |
The AIX specialists confirmed to us that there is enough VM space: the maximum recommended for the current physical memory. |
Even at the actual time MQ was receiving SIGDANGER? _________________ Honesty is the best policy.
Insanity is the best defence. |
|
nelson |
Posted: Wed Oct 19, 2016 8:57 am Post subject: |
|
|
 Partisan
Joined: 02 Oct 2012 Posts: 313
|
mqjeff wrote: |
Do you see any indications of any crashes or leaks in a message flow?
Has the AIX team confirmed that the virtual machine's memory is not shared with other virtual machines? |
Six minutes after the alerts an abend/dump was generated:
Code: |
+-----------------------------------------------------------------------------+
| |
| |
| First Failure Symptom Report |
| ======================== |
| |
| Proc start time (GMT) :- Fri Oct 14 14:30:44 2016 |
| |
| Product Details |
| +++++++++++++++ |
| |
| Vendor :- IBM |
| Product Name :- IBM Integration Bus |
| Program ID :- 5724-J05 |
| Version :- 9002 |
| |
| OS Information |
| ++++++++++++++ |
| |
| Operating System :- AIX |
| Version :- 7 |
| Release :- 1 |
| Node Name :- aixcl001 |
| Machine ID :- 00F944014C00 |
| |
| Environment |
| +++++++++++ |
| |
| Installation Path :- /opt/IBM/mqsi/9.0.0.2 |
| Service User ID :- mqsi |
| Work Path :- /var/mqsi |
| Executable Name :- DataFlowEngine |
| Process ID :- 5505436 |
| |
| Deployment |
| ++++++++++ |
| |
| Component Name :- BROKER |
| Component UUID :- 1896f272-c833-11e4-b480-ac11699c0000 |
| Queue Manager :- QMGR |
| Execution Group :- execution.group |
| EG UUID :- a1b60d35-4c01-0000-0080-d3d407c2eed9 |
| User trace :- 0 |
| Service trace :- 0 |
| Trace size :- 0 |
| |
| Build Information |
| +++++++++++++++++ |
| |
| Backing build :- |
| Sandbox :- /build/slot1/S900_P |
| CMVC Level :- S900-FP02 |
| Build type :- Production |
| Build context :- rios_aix_4 |
| 64 Bit Build :- yes |
| |
| Failure Location |
| ++++++++++++++++ |
| |
| Time of Report (GMT) :- secs since 1/1/1970: 1476469855 |
| Message Flow :- |
| failingFlow |
| Thread ID :- 0x0000000000004143 |
| |
+-----------------------------------------------------------------------------+
abend record for pid 5505436 tid 16707 time in seconds since 01/01/1970: 1476469855
File: /build/slot1/S900_P/src/CommonServices/Unix/ImbAbend.cpp
Line: 1131
Function: signal received
---- Inserts ----
11
@(#) MQMBID sn=S900-FP02 su=_0DRGUPfNEeO0_-a3FU977g pn=CommonServices/Unix/ImbAbend.cpp]
498721728
-----------------
----------------------------- Stack dump for current thread ( 16707)
(0x1db9e440+??????] <no name available> []
(0x07459580+0x0000013c) getNextBuffer__17ImbXMLNSCServicesFPUiPPcT1 [/opt/IBM/mqsi/9.0.0.2/lib/libGenXmlParser4.a(libGenXmlParser4.a.so)]
(0x085fb380+0x0000004c) getNextBuffer__9@15@xlxpcFPiPUiPPcPv [/opt/IBM/mqsi/9.0.0.2/xlxpc/lib/libBIPNVBRT11.0.a]
(0x085ee760+0x00000098) @17@GNBDataSource_load [/opt/IBM/mqsi/9.0.0.2/xlxpc/lib/libBIPNVBRT11.0.a]
(0x08607a60+0x00000250) parseSetup__Q2_5xlxpc11XLXPCParserFb [/opt/IBM/mqsi/9.0.0.2/xlxpc/lib/libBIPNVBRT11.0.a]
(0x0744c000+0x00000188) setupXLXPParser__19ImbXMLNSCDocHandlerFv [/opt/IBM/mqsi/9.0.0.2/lib/libGenXmlParser4.a(libGenXmlParser4.a.so)]
(0x07470880+0x0000008c) parseAll__19ImbXMLNSCDocHandlerFv [/opt/IBM/mqsi/9.0.0.2/lib/libGenXmlParser4.a(libGenXmlParser4.a.so)]
(0x07474b00+0x000005e4) parseLastChild__15ImbXMLNSCParserFP16ImbSyntaxElement [/opt/IBM/mqsi/9.0.0.2/lib/libGenXmlParser4.a(libGenXmlParser4.a.so)]
(0x066d2700+0x000004dc) createLastChild__16ImbSyntaxElementFRC10ImbWstringPC20ImbDefaultPropertiesRC17ImbMessageOptionsRC21ImbBufferedStringBaseXTUcTQ2_3std11char_traitsXTUc_TUsSP24SP128_iT5N31 [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x08133780+0x00000254) Java_com_ibm_broker_plugin_MbElement__1createElementAsLastChildFromBitstream [/opt/IBM/mqsi/9.0.0.2/lib/libimbjplg.a]
(0x0af81068+??????] <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9vm26.so]
(0x0af10ba0+0x00000000) <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9vm26.so]
(0x081d9800+0x00000048) CallVoidMethod__7JNIEnv_FP8_jobjectP10_jmethodIDe [/opt/IBM/mqsi/9.0.0.2/lib/libimbjplg.a]
(0x0821a700+0x000002e8) evaluate__10ImbJniNodeFRC18ImbMessageAssemblyPC19ImbDataFlowTerminal [/opt/IBM/mqsi/9.0.0.2/lib/libimbjplg.a]
(0x06709d80+0x0000021c) evaluate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x06708e80+0x0000010c) propagateInner__19ImbDataFlowTerminalFRC18ImbMessageAssemblyP19ImbDataFlowTerminal [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x06c3cb00+0x0000016c) propagate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x0829db80+0x000000d0) Java_com_ibm_broker_plugin_MbOutputTerminal__1propagate [/opt/IBM/mqsi/9.0.0.2/lib/libimbjplg.a]
(0x0af8102c+??????] <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9vm26.so]
(0x0af0b280+0x00000000) <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9vm26.so]
(0x0af2c500+0x00000000) <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9vm26.so]
(0x0afd79a0+0x00000000) <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9prt26.so]
(0x0af2c560+0x00000000) <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9vm26.so]
(0x0af0b9c0+0x00000000) <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9vm26.so]
(0x0af10ba0+0x00000000) <no name available> [/opt/IBM/mqsi/9.0.0.2/jre17/lib/ppc64/compressedrefs/libj9vm26.so]
(0x081d9800+0x00000048) CallVoidMethod__7JNIEnv_FP8_jobjectP10_jmethodIDe [/opt/IBM/mqsi/9.0.0.2/lib/libimbjplg.a]
(0x0821a700+0x000002e8) evaluate__10ImbJniNodeFRC18ImbMessageAssemblyPC19ImbDataFlowTerminal [/opt/IBM/mqsi/9.0.0.2/lib/libimbjplg.a]
(0x06709d80+0x0000021c) evaluate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x06708e80+0x0000010c) propagateInner__19ImbDataFlowTerminalFRC18ImbMessageAssemblyP19ImbDataFlowTerminal [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x06c3cb00+0x0000016c) propagate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x11f64a80+0x00000714) evaluate__14ImbComputeNodeFRC18ImbMessageAssemblyPC19ImbDataFlowTerminal [/opt/IBM/mqsi/9.0.0.2/lil/imbdfsql.lil]
(0x06709d80+0x0000021c) evaluate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x06708e80+0x0000010c) propagateInner__19ImbDataFlowTerminalFRC18ImbMessageAssemblyP19ImbDataFlowTerminal [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x06c3cb00+0x0000016c) propagate__19ImbDataFlowTerminalFRC18ImbMessageAssembly [/opt/IBM/mqsi/9.0.0.2/lib/libMessageServices.a(libMessageServices.a.so)]
(0x0a7b4500+0x0000292c) run__18ImbCommonInputNodeFP11ImbOsThread [/opt/IBM/mqsi/9.0.0.2/lib/libMQLibrary.a(libMQLibrary.a.so)]
(0x0a7dcd80+0x00000044) run__Q2_18ImbCommonInputNode10ParametersFP11ImbOsThread [/opt/IBM/mqsi/9.0.0.2/lib/libMQLibrary.a(libMQLibrary.a.so)]
(0x05eb1500+0x0000008c) run__27ImbThreadPoolThreadFunctionFP11ImbOsThread [/opt/IBM/mqsi/9.0.0.2/lib/libCommonServices.a(libCommonServices.a.so)]
(0x05eaaa00+0x000000a8) threadRun__11ImbOsThreadFv [/opt/IBM/mqsi/9.0.0.2/lib/libCommonServices.a(libCommonServices.a.so)]
(0x05ea9e00+0x000000e0) threadBootStrap__11ImbOsThreadFPv [/opt/IBM/mqsi/9.0.0.2/lib/libCommonServices.a(libCommonServices.a.so)]
(0x00545d20+0x000000f4) _pthread_body [/usr/lib/libpthreads.a(shr_xpg5_64.o)]
(0x00000000) <invalid code address>
---------------------------------------------------------------------- |
Could this lead to a global virtual memory crash? Does this abend indicate a memory leak? Could this abend be the result of low paging space?
Thanks in advance for your comments. |
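As a cross-check on the timeline, the epoch values in the two reports can be compared directly: "UTC Time" 1476469473.952225 in the MQ FDC and "secs since 1/1/1970: 1476469855" in the IIB abend record. The following is a minimal Java sketch of that arithmetic. The class name `FdcTimeline` and its members are hypothetical, the fractional part of the FDC timestamp is dropped, and the -240 minute offset is the "UTC Time Offset" reported in the FDC.

```java
import java.time.Duration;
import java.time.Instant;
import java.time.ZoneOffset;

// Hypothetical helper, not part of MQ or IIB: it only compares the epoch
// timestamps quoted in the two failure reports.
class FdcTimeline {
    // "UTC Time" from the MQ FDC, fractional seconds dropped.
    static final long MQ_FDC_EPOCH = 1476469473L;
    // "secs since 1/1/1970" from the IIB abend record.
    static final long IIB_ABEND_EPOCH = 1476469855L;
    // "UTC Time Offset" from the MQ FDC, in minutes.
    static final int OFFSET_MINUTES = -240;

    // Elapsed time between the SIGDANGER FDC and the DataFlowEngine abend.
    static Duration gap() {
        return Duration.between(Instant.ofEpochSecond(MQ_FDC_EPOCH),
                                Instant.ofEpochSecond(IIB_ABEND_EPOCH));
    }

    // Wall-clock time at the reported offset, formatted as ISO-8601.
    static String localTime(long epochSeconds) {
        ZoneOffset offset = ZoneOffset.ofTotalSeconds(OFFSET_MINUTES * 60);
        return Instant.ofEpochSecond(epochSeconds)
                      .atOffset(offset)
                      .toLocalDateTime()
                      .toString();
    }

    public static void main(String[] args) {
        System.out.println("FDC   : " + localTime(MQ_FDC_EPOCH));
        System.out.println("Abend : " + localTime(IIB_ABEND_EPOCH));
        System.out.println("Gap   : " + gap().getSeconds() + " s");
    }
}
```

The gap works out to 382 seconds (6 min 22 s), which is consistent with "six minutes after the alerts". The FDC converts to 14:24:33 local time, matching its Date/Time field.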
|
Vitor |
Posted: Wed Oct 19, 2016 9:24 am Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
nelson wrote: |
Could this lead to a global virtual memory crash? Does this abend indicate a memory leak? Could this abend be the result of low paging space? |
So when you said:
nelson wrote: |
We are working on MQ7.5.0.4 |
you didn't think it was important to add:
"we're also using a version of IIB that's not up to date with maintenance that's running Lord knows what user code that could easily have a leak in it"
It's unclear whether the IIB dump is a problem of its own or a result of the MQ crash bringing the broker down in an untidy heap, but I would strongly suspect that MQ is reporting a shortage of memory because the user code inside the IIB EG is hogging it, and that's the first place I'd start looking.
While planning to bring IIB up to date with maintenance. _________________ Honesty is the best policy.
Insanity is the best defence. |
|
mqjeff |
Posted: Wed Oct 19, 2016 9:37 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
Okay. Three things.
- If the virtual machine is set up to share physical memory with other VMs on the same server, then you are subject to out-of-memory errors at any time.
- As my esteemed colleague said, upgrade to a supported version and fix pack.
- As my esteemed colleague also said, make sure your flows aren't running in loops and hogging memory.
_________________ chmod -R ugo-wx / |
|
gbaddeley |
Posted: Wed Oct 19, 2016 4:19 pm Post subject: |
|
|
 Jedi Knight
Joined: 25 Mar 2003 Posts: 2538 Location: Melbourne, Australia
|
|
nelson |
Posted: Thu Nov 03, 2016 5:26 am Post subject: |
|
|
 Partisan
Joined: 02 Oct 2012 Posts: 313
|
Hi all, thanks for your comments.
We got this from IBM, related to a Java class that had class-level variables accidentally shared across multiple instances (per-message Java variables must be declared within the evaluate() method). We are not sure, but this could have led to a memory crash.
Quote: |
Accidentally sharing memory allocated to one thread's processing across
multiple threads will be catastrophic to the DataFlowEngine process.
An attempt to do this for a flow with additional instances will cause
one thread to delete allocated memory from under another thread. This
leads to the type of abends that have been observed.
If both of the threads attempt to delete the same underlying memory
blocks then the second delete will randomly erase part of the heap as
the malloc blocks would not be meaningful on the same pointer the second
time around.
When dealing with MbMessageAssembly, MbMessage and MbElement objects
these are not self-contained java objects. Each has an underlying C++
pointer to the real message/tree object on the heap. So this is not just
a case of simply dereferencing a simple java object. One thread
overwriting the class level object will change all the underlying
C++ pointers the first thread was using.
With these points in mind it is a strong possibility that this same
condition has led to the "traceassert" javacore that the customer has
observed. At the very least with this erroneous java there is a large
unknown in the system.
In terms of the corruption of the heap due to double deletion this would only affect the DataFlowEngine process to which these java classes were deployed and had additional instances running against them.
In the past we have seen cases where objects have accidentally been shared across threads and this has led to orphaned pointers to allocated memory and as such memory leaks could occur. If the DataFlowEngine leaked a significant amount of memory then this would cause an exhaustion in the system. However, this would just be conjecture. |
Kind regards. |
|
fjb_saper |
Posted: Thu Nov 03, 2016 6:05 am Post subject: |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
nelson wrote: |
Hi all, thanks for your comments.
We got this from IBM, related to a Java class that had class-level variables accidentally shared across multiple instances (per-message Java variables must be declared within the evaluate() method). We are not sure, but this could have led to a memory crash.
Kind regards. |
There is nothing accidental about that. The scope of a class-level variable is clear. What you must understand is that the JCN class is not instantiated per message; it acts more like a singleton. Each message is then processed by a thread executing the evaluate() method... This has implications for scope and data integrity when you look at variable declarations. You also need to be concerned with the thread safety of your code. _________________ MQ & Broker admin |
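The singleton behaviour described here can be illustrated without the IIB API. The following is a minimal plain-Java sketch under stated assumptions: `Node`, `UnsafeNode`, `SafeNode`, and `runTwoThreads` are hypothetical stand-ins for a JavaCompute node class and its additional instances, not the real `com.ibm.broker.plugin` types. The `CyclicBarrier` and the `volatile` keyword are there only to force the unlucky interleaving deterministically. It shows data being overwritten, not the native heap corruption that IBM describes.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CyclicBarrier;

// Plain-Java illustration only: these types are hypothetical stand-ins,
// not the IIB JavaCompute API.
class SharedNodeSketch {

    interface Node {
        String evaluate(String message) throws Exception;
    }

    // Anti-pattern: per-message state held in a field of the one shared node object.
    static class UnsafeNode implements Node {
        private final CyclicBarrier bothWritten = new CyclicBarrier(2);
        // volatile only makes this demo deterministic; it does not make sharing safe.
        private volatile String current;

        public String evaluate(String message) throws Exception {
            current = message;    // every thread overwrites the same field
            bothWritten.await();  // wait until both threads have written
            return current;       // both threads read the last writer's value
        }
    }

    // Fix: per-message state lives in a local variable, private to each thread.
    static class SafeNode implements Node {
        private final CyclicBarrier bothWritten = new CyclicBarrier(2);

        public String evaluate(String message) throws Exception {
            String current = message;
            bothWritten.await();  // same interleaving as above, for a fair comparison
            return current;
        }
    }

    // Runs two threads against ONE node object and collects the distinct
    // values that the threads get back.
    static Set<String> runTwoThreads(Node node) {
        Set<String> seen = ConcurrentHashMap.newKeySet();
        Thread[] workers = new Thread[2];
        for (int i = 0; i < workers.length; i++) {
            final String message = "msg-" + i;
            workers[i] = new Thread(() -> {
                try {
                    seen.add(node.evaluate(message));
                } catch (Exception e) {
                    throw new IllegalStateException(e);
                }
            });
            workers[i].start();
        }
        try {
            for (Thread worker : workers) {
                worker.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen;
    }

    public static void main(String[] args) {
        System.out.println("unsafe: " + runTwoThreads(new UnsafeNode()).size() + " distinct result(s)");
        System.out.println("safe  : " + runTwoThreads(new SafeNode()).size() + " distinct result(s)");
    }
}
```

With the field-based node, both threads come back with the same message, so one message's data has silently been replaced by the other's. With the local variable, each thread keeps its own message.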
|