Author |
Message
|
bruce2359 |
Posted: Sun Jan 16, 2022 8:13 am Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9406 Location: US: west coast, almost. Otherwise, enroute.
|
an4ous wrote: |
Thx for your help
Code: |
Are you a developer? |
No, I am not a developer, but tomorrow I will consult with our developers. |
One of the responsibilities of sys admins is to assist in the problem determination process. It helps if sys admins understand the development environment and the design of your organization's applications.
This may be an opportunity for you to get trained in app development. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
|
Andyh |
Posted: Sun Jan 16, 2022 10:44 am Post subject: |
|
|
Master
Joined: 29 Jul 2010 Posts: 237
|
Those counters don't suggest to me a set of applications actively polling for exclusive access to a queue. If an app fails to open the queue for exclusive input, does it sleep for a little while before retrying the open?
If so, how long is the sleep?
Bruce's comment "I find it highly unusual (read: alarming) the number of failed GETs and transactions rolled back. Ideally, there should be few or no roll backs." doesn't distinguish between a real rollback and a null rollback. When a client disconnects, the server thread acting on behalf of that client will typically issue a rollback to ensure that no partially completed transaction is accidentally committed implicitly. Null rollbacks are VERY cheap operations.
If you have strict ordering requirements it's quite difficult to achieve concurrency. If that is the case you might want to look at why the IO latency is as high as 2-2.5 ms (which is very poor for a local SSD). Even with a 2.5 ms latency you might have hoped for a strongly serialized getter to achieve close to 200 persistent msg/sec.
The amqsrua output you've provided was collected serially, and as it doesn't show a very steady state it's hard to have much confidence in relating the numbers across the different time intervals. |
|
|
|
fjb_saper |
Posted: Sun Jan 16, 2022 10:51 pm Post subject: |
|
|
Grand High Poobah
Joined: 18 Nov 2003 Posts: 20698 Location: LI,NY
|
Finally, having the queue set to input shared is no guarantee that it will be used like this. The application can override that behavior by trying to open it for input exclusive... You should also look at alias queues that your Camel consumer might use...
Looks to me that the bottleneck is in the consuming application. Removing a msg from the queue should take about 4 milliseconds. That gives you an ideal consumption rate of 250 msgs per second, allowing for some processing time, say 200 msg per second. I suggest you verify the HTTP end of the deal and make sure HTTP 1.1 is being used, keeping the connection open....
Obviously a design that has message affinity is pure nonsense here as you would need 2 to 3 threads to handle the incoming volume. You may need to review the design, or review the design of your Splunk queries. You may just need to use the put date and put time when passing the content to Splunk if there is no timestamp in the message body... _________________ MQ & Broker admin |
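The rate estimate in this post is plain arithmetic and can be sketched in shell. This is only an illustration, not a measurement: `msgs_per_sec` and `threads_needed` are hypothetical helper names, and the 500 msg/sec input is the upper put rate reported elsewhere in this thread.

```shell
#!/bin/sh
# Hypothetical helpers that illustrate the arithmetic only; not MQ tooling.

# Messages per second for one strictly serialized getter,
# given the time spent per message in milliseconds.
msgs_per_sec() {
    awk -v ms="$1" 'BEGIN { printf "%d\n", 1000 / ms }'
}

# Consumer threads needed to keep up with an incoming rate,
# given the rate one thread can sustain (ceiling division).
threads_needed() {
    echo $(( ($1 + $2 - 1) / $2 ))
}

msgs_per_sec 4          # ideal rate at 4 ms per message
threads_needed 500 200  # 500 msg/sec incoming, 200 msg/sec per thread
```

With the figures quoted above this gives 250 msg/sec for a single getter and 3 threads for the peak load.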
|
|
|
an4ous |
Posted: Mon Jan 17, 2022 12:25 am Post subject: |
|
|
Apprentice
Joined: 14 Jan 2022 Posts: 38
|
We also discovered that since 13 January the get rate from the IBM MQ queue SPLUNK.Q has dropped from ~150 msg/sec to ~75 msg/sec. At the same time the swap partition (virtual memory) started to be used actively, while system RAM was mostly occupied by the disk cache.
We temporarily disabled swap and added RAM, and the get rate returned to 150 msg/sec (log write latency also decreased from 2-2.5 ms to 1.5-2 ms). Overall this has not solved our problem, because about 250 to 500 msg/sec are put to the queue.
However, IBM recommends having swap: https://www.ibm.com/docs/en/ibm-mq/9.1?topic=linux-configuring-tuning-operating-system
and does not specify any swap-related kernel tuning parameters (vfs_cache_pressure, swappiness, etc.).
Can you suggest anything about swap usage with IBM MQ? |
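For anyone following along, a minimal sketch of how swap usage and the current swappiness setting could be checked on a Linux host. The helper name `swap_used_kb` is hypothetical, and its optional file argument exists only so the function can be tried against a saved copy of /proc/meminfo.

```shell
#!/bin/sh
# Swap currently in use, in kB, computed from a meminfo-style file.
# Defaults to /proc/meminfo when no path is given.
swap_used_kb() {
    awk '/^SwapTotal:/ { total = $2 }
         /^SwapFree:/  { free = $2 }
         END { print total - free }' "${1:-/proc/meminfo}"
}

echo "swap used (kB): $(swap_used_kb)"

# Current swappiness, if the kernel exposes it.
if [ -r /proc/sys/vm/swappiness ]; then
    echo "vm.swappiness:  $(cat /proc/sys/vm/swappiness)"
fi
```

The si/so columns of `vmstat 1` show whether pages are actively being swapped in and out, which is usually more telling than the amount of swap allocated.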
|
|
|
bruce2359 |
Posted: Mon Jan 17, 2022 5:42 am Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9406 Location: US: west coast, almost. Otherwise, enroute.
|
Run amqsrua again. Select the option to view qmgr RAM/virtual storage utilization. Post results here. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
|
an4ous |
Posted: Mon Jan 17, 2022 5:52 am Post subject: |
|
|
Apprentice
Joined: 14 Jan 2022 Posts: 38
|
In my installation the amqsrua sample program has no RAM/virtual storage utilization option, only
Code: |
CPU : Platform central processing units
DISK : Platform persistent data stores
STATMQI : API usage statistics
STATQ : API per-queue usage statistics
Enter Class selection
|
available |
|
|
|
bruce2359 |
Posted: Mon Jan 17, 2022 7:23 am Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9406 Location: US: west coast, almost. Otherwise, enroute.
|
|
|
|
Andyh |
Posted: Mon Jan 17, 2022 8:32 am Post subject: |
|
|
Master
Joined: 29 Jul 2010 Posts: 237
|
The amqsrua output for CPU includes some very basic memory stats.
In general, looking at OS level stats through amqsrua isn't recommended (except on the appliance). Running native OS tools like ps and vmstat will give you better insight into memory usage on the box (on the appliance you don't have access to such tools). |
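As a rough sketch of the native-tools approach, the resident memory of the MQ processes can be totalled with ps and awk. This assumes the queue manager processes run as the user mqm (as the systemd unit posted later in this thread suggests); `sum_rss_mib` is a hypothetical helper name.

```shell
#!/bin/sh
# Sum a column of RSS values (kB) read from stdin and print the total in MiB.
sum_rss_mib() {
    awk '{ kb += $1 } END { printf "%.1f\n", kb / 1024 }'
}

# Assumption: MQ processes run as user "mqm".
# "rss=" prints the resident set size in kB without a header line.
ps -u mqm -o rss= | sum_rss_mib
```

RSS counts shared memory once per attached process, so the total overstates real usage. Treat it as an upper bound and compare it with `vmstat` and `free` before drawing conclusions.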
|
|
|
bruce2359 |
Posted: Mon Jan 17, 2022 9:31 am Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9406 Location: US: west coast, almost. Otherwise, enroute.
|
Appears to be insufficient RAM as root cause (problem source) - so far. A quick and inexpensive resolution, and once again not MQ's fault. _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
|
hughson |
Posted: Mon Jan 17, 2022 8:07 pm Post subject: |
|
|
Padawan
Joined: 09 May 2013 Posts: 1916 Location: Bay of Plenty, New Zealand
|
an4ous wrote: |
In my installation the amqsrua sample program has no RAM/virtual storage utilization option, only
Code: |
CPU : Platform central processing units
DISK : Platform persistent data stores
STATMQI : API usage statistics
STATQ : API per-queue usage statistics
Enter Class selection
|
available |
It's part of the CPU section. Read "Metrics published on the system topics"
e.g.
Code: |
CPU : Platform central processing units
DISK : Platform persistent data stores
STATMQI : API usage statistics
STATQ : API per-queue usage statistics
STATAPP : Per-application usage statistics
Enter Class selection
==> CPU
SystemSummary : CPU performance - platform wide
QMgrSummary : CPU performance - running queue manager
Enter Type selection
==> QMgrSummary
Publication received PutDate:20220118 PutTime:04024428 Interval:4.455 seconds
User CPU time - percentage estimate for queue manager 0.00%
System CPU time - percentage estimate for queue manager 0.05%
RAM total bytes - estimate for queue manager 154MB |
_________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
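To pick out just the memory figure from output like the sample above, a small filter is enough. This is a sketch: `qmgr_ram` is a hypothetical helper, and the commented invocation assumes the -m, -c, -t and -n options of the amqsrua sample and the queue manager name MGR_PROD used elsewhere in this thread.

```shell
#!/bin/sh
# Print the value of the "RAM total bytes" metric from amqsrua output on stdin.
# The value is the last whitespace-separated field on that line.
qmgr_ram() {
    awk '/RAM total bytes/ { print $NF }'
}

# Possible usage on a host with MQ installed (not executed here):
#   amqsrua -m MGR_PROD -c CPU -t QMgrSummary -n 1 | qmgr_ram
```

Run over several publications, this gives a quick view of whether the queue manager's memory estimate grows with uptime.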
|
|
|
an4ous |
Posted: Mon Feb 14, 2022 1:36 am Post subject: |
|
|
Apprentice
Joined: 14 Jan 2022 Posts: 38
|
Hello, sorry for my absence.
I carefully read this document: https://ibm-messaging.github.io/mqperf/mqio_v1.pdf
It makes the following recommendations (my comment follows each one):
• Only use persistent messaging when necessary for your application. - I don't decide which applications use persistent messages and which do not.
• Size your transaction log correctly. - I have reduced the number of transaction log files following this article: https://www.ibm.com/docs/en/ibm-mq/9.1?topic=csl-how-large-should-i-make-my-active-log
• Concurrency optimizes the MQ logger throughput. - Messages are read from the problem queue in parallel.
• Use syncpoint with persistent messages (even if there is only one MQPUT in your transaction). - We use syncpoint; in addition, the ImplSyncOpenOutput option should add implicit syncpoints.
• Set LogBufferPages=4096 in qm.ini - I have set LogBufferPages to 4096.
• Host a queue manager's error logs on a different location to the transaction log, to avoid being unable to write errors. - We run on a VM whose hypervisor host uses SSDs. We have not created a separate partition for the MQ transaction logs, but this should not affect performance.
• Avoid Long Running Transactions - Earlier we got "long running transaction was detected" errors, but there are no such errors now.
• Establish infrastructure capabilities outside of MQ to better understand possible bottlenecks - We have checked the surrounding infrastructure: the network and the client applications' resources are not bottlenecks.
• Spread message load across multiple queues where possible, to alleviate queue locking. - Queue locking for the problem queue is around 15%, which is within the normal range.
• Understand your application and requirements. - Our developers want to replace IBM MQ with Apache Kafka as a solution to the problem, but that takes time.
I tried temporarily increasing the number of CPU cores on the VM; it did not help.
I also set SHARECNV to 1 (default 10), but this had no effect (only the number of TCP connections increased).
After restarting the queue manager (to change the CPU core count and qm.ini), the get rate from the problem queue rose to ~275 messages per second, but after a few days of uptime it dropped below 150 messages per second. Why can't IBM MQ keep the get rate at a steady level?
Also, in the /var/mqm/errors logs I see xecL_W_LONG_LOCK_WAIT / AMQ6150W: "IBM MQ semaphore is busy" errors. What do these errors mean? (I have read https://www.ibm.com/support/pages/mq-amq6150w-semaphore-busy-xeclwlonglockwait-it-could-be-slow-performance-or-hang but maybe you can add something more.) |
|
|
|
hughson |
Posted: Mon Feb 14, 2022 1:43 am Post subject: |
|
|
Padawan
Joined: 09 May 2013 Posts: 1916 Location: Bay of Plenty, New Zealand
|
an4ous wrote: |
Avoid Long Running Transactions - Earlier we got "long running transaction was detected" errors, but there are no such errors now. |
What did you change to remove your long running transactions?
Are you certain you do not have any long running transactions anymore? Have you looked?
This blog post: IBM MQ Little Gem #19: DISPLAY CONN dates & times shows the commands to use to see the start times of your transactions. See how long they have been running for. What are the oldest times you see?
Code: |
DISPLAY CONN(*) ALL WHERE(UOWLOGTI NE ' ') |
Cheers,
Morag _________________ Morag Hughson @MoragHughson
IBM MQ Technical Education Specialist
Get your IBM MQ training here!
MQGem Software |
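A sketch of how the output of that command could be reduced to just the unit-of-work log start dates and times. `uow_log_times` is a hypothetical helper; the attribute names UOWLOGDA and UOWLOGTI are those of DISPLAY CONN, while the exact layout of runmqsc output is assumed here.

```shell
#!/bin/sh
# Extract UOWLOGDA(...) and UOWLOGTI(...) attributes from runmqsc output on stdin,
# printing one attribute per line.
uow_log_times() {
    grep -oE 'UOWLOG(DA|TI)\([^)]*\)'
}

# Possible usage on a host with MQ installed (not executed here),
# with MGR_PROD being the queue manager name used in this thread:
#   echo "DISPLAY CONN(*) ALL WHERE(UOWLOGTI NE ' ')" | runmqsc MGR_PROD | uow_log_times
```

Sorting the result makes the oldest unit of work easy to spot.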
|
|
|
an4ous |
Posted: Mon Feb 14, 2022 2:27 am Post subject: |
|
|
Apprentice
Joined: 14 Jan 2022 Posts: 38
|
The last such records in the MQ queue manager logs are
"AMQ7469I: Transactions rolled back to release log space
AMQ7486I: Transaction was preventing log space from being released.
long running transaction was detected / The log space for the queue manager is becoming full", dated 12/14/2021 (two months ago).
There are no new errors.
I also watch dspmqtrn: current transactions last around 2 minutes (our developers set the maximum transaction time to 180 s).
DISPLAY CONN(*) ALL WHERE(UOWLOGTI NE ' ') does not show any long transactions either.
So long transactions are not currently the reason for the low performance. |
|
|
|
bruce2359 |
Posted: Mon Feb 14, 2022 4:11 am Post subject: |
|
|
Poobah
Joined: 05 Jan 2008 Posts: 9406 Location: US: west coast, almost. Otherwise, enroute.
|
Use any file editor to view the qm.ini file for this qmgr. Post all of its contents here.
Linear logging? Circular logging?
If circular, how many primary? Secondary? _________________ I like deadlines. I like to wave as they pass by.
ב''ה
Lex Orandi, Lex Credendi, Lex Vivendi. As we Worship, So we Believe, So we Live. |
|
|
|
an4ous |
Posted: Mon Feb 14, 2022 5:04 am Post subject: |
|
|
Apprentice
Joined: 14 Jan 2022 Posts: 38
|
My current qm.ini config
Code: |
cat qm.ini
#*******************************************************************#
#* Module Name: qm.ini *#
#* Type : IBM MQ queue manager configuration file *#
# Function : Define the configuration of a single queue manager *#
#* *#
#*******************************************************************#
#* Notes : *#
#* 1) This file defines the configuration of the queue manager *#
#* *#
#*******************************************************************#
ExitPath:
ExitsDefaultPath=/var/mqm/exits
ExitsDefaultPath64=/var/mqm/exits64
#* *#
#* *#
Log:
LogPrimaryFiles=33
LogSecondaryFiles=3
LogFilePages=65535
LogType=CIRCULAR
LogBufferPages=4096
LogPath=/WMQ/mgr_prod/log/MGR_PROD/
LogWriteIntegrity=TripleWrite
Service:
Name=AuthorizationService
EntryPoints=14
ServiceComponent:
Service=AuthorizationService
Name=MQSeries.UNIX.auth.service
Module=amqzfu
ComponentDataSize=0
Channels:
ChlauthEarlyAdopt=Y
MaxChannels=9999
TCP:
SndBuffSize=0
RcvBuffSize=0
RcvSndBuffSize=0
RcvRcvBuffSize=0
ClntSndBuffSize=0
ClntRcvBuffSize=0
SvrSndBuffSize=0
SvrRcvBuffSize=0
KeepAlive=yes
|
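For reference, the Log stanza above implies a fixed amount of log space, which can be worked out in shell. This is a sketch using the values from the posted qm.ini and the 4 KB page size in which LogFilePages is expressed; `log_capacity_mib` is a hypothetical helper name.

```shell
#!/bin/sh
# Total log capacity in MiB: number of files x pages per file x 4 KiB per page.
log_capacity_mib() {
    echo $(( $1 * $2 * 4 / 1024 ))
}

log_capacity_mib 33 65535             # primary files only
log_capacity_mib $(( 33 + 3 )) 65535  # primary plus secondary files
```

That works out to about 8.2 GiB of primary log and about 9 GiB including the secondary files.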
I have also checked my kernel parameters
Code: |
cat /etc/sysctl.conf
# sysctl settings are defined through files in
# /usr/lib/sysctl.d/, /run/sysctl.d/, and /etc/sysctl.d/.
#
# Vendors settings live in /usr/lib/sysctl.d/.
# To override a whole file, create a new file with the same in
# /etc/sysctl.d/ and put new settings there. To override
# only specific settings, add a file with a lexically later
# name in /etc/sysctl.d/ and put new settings there.
#
# For more information, see sysctl.conf(5) and sysctl.d(5).
kernel.shmmni=4096
kernel.shmall=2097152
kernel.shmmax=268435456
kernel.sem=32 4096 32 128
kernel.threads-max=32768
kernel.pid_max=32768
fs.file-max=524288
net.ipv4.tcp_keepalive_time=600
net.ipv4.tcp_keepalive_intvl=10
net.ipv4.tcp_keepalive_probes=6
vm.swappiness=1
|
and limit values
Code: |
cat /etc/systemd/system/mqm@.service
[Unit]
Description=IBM MQ V9.1 queue manager %I
After=network.target
[Service]
ExecStart=/opt/mqm/bin/strmqm %I
ExecStop=/opt/mqm/bin/endmqm -i %I
Type=forking
User=mqm
Group=mqm
KillMode=none
LimitNOFILE=20480
LimitNPROC=8192
TimeoutSec=1200
[Install]
WantedBy=multi-user.target
|
Code: |
tail -5 /etc/security/limits.conf
mqm hard nofile 40960
mqm soft nofile 40960
mqm hard nproc 16384
mqm soft nproc 16384
|
So my current kernel parameters are very conservative, as in https://www.ibm.com/docs/en/ibm-mq/9.1?topic=linux-configuring-tuning-operating-system
I read the article https://www.ibm.com/docs/en/ibm-mq/9.1?topic=windows-resource-problems and checked the current usage of system resources
Code: |
ps -eLf|egrep "amq|run"|wc -l
4258
|
Code: |
echo "dis conn(*) all" | runmqsc MGR_PROD |grep EXTCONN|wc -l
1988
|
So I have around 4000 MQ threads (ps -eLf prints one line per thread, not per process) and 2000 MQ connections on my system
Code: |
ipcs -ma
------ Message Queues --------
key msqid owner perms used-bytes messages
------ Shared Memory Segments --------
key shmid owner perms bytes nattch status
0x00000000 32768 mqm 600 16384 99
0x00000000 32769 mqm 600 20480 99
0x00000000 32770 mqm 600 16207872 93
0x00000000 32771 mqm 600 5984256 102
0x00000000 32772 mqm 600 2555904 81
0x00000000 32773 mqm 600 5099520 81
0x00000000 32774 mqm 600 10186752 81
0x00000000 32775 mqm 600 20361216 81
0x00000000 32776 mqm 600 40710144 81
0x00000000 32777 mqm 600 81408000 81
0x00000000 32778 mqm 600 162803712 81
0x00000000 32779 mqm 600 162799616 81
0x00000000 32780 mqm 600 162795520 81
0x00000000 32781 mqm 600 162791424 81
0x00000000 32782 mqm 666 5984256 183
0x00000000 32783 mqm 666 81219584 180
0x00000000 32784 mqm 666 162426880 180
0x00000000 42 mqm 600 14733312 99
0x00000000 43 mqm 600 1622016 99
0x00000000 44 mqm 666 5443584 201
0x00000000 45 mqm 600 5443584 111
0x00000000 46 mqm 666 9244672 201
0x00000000 47 mqm 666 190058496 201
0x00000000 48 mqm 600 565248 99
0x00000000 49 mqm 644 5443584 102
0x00000000 50 mqm 600 102551552 99
0x00000000 51 mqm 600 1171456 99
0x00000000 52 mqm 600 92090368 99
0x00000000 53 mqm 600 90112 99
0x00000000 54 mqm 600 8474624 99
0x00000000 55 mqm 600 268435456 99
0x00000000 56 mqm 600 10485760 99
0x00000000 57 mqm 600 1171456 99
0x00000000 58 mqm 600 4628480 99
|
I see that for shmid 55 the segment size is 268435456 bytes, which equals the kernel limit shmmax. Does that mean that the system resources (shmmax) for our IBM MQ installation have reached the limit and need to be increased?
In the IBM MQ performance report https://ibm-messaging.github.io/mqperf/MQ_for_xLinux_V910_Performance.pdf the values
kernel.shmmni = 8192
kernel.shmall = 4294967296
kernel.shmmax = 137438953472
are 2 to 2048 times larger than those in https://www.ibm.com/docs/en/ibm-mq/9.1?topic=linux-configuring-tuning-operating-system and on my current system.
I want to set the kernel parameters to
kernel.shmmni=8192
kernel.shmall=20971520
kernel.shmmax=2684354560
kernel.sem="320 40960 320 1280"
kernel.threads-max=655360
kernel.pid_max=655360
fs.file-max=5242880
Is this justified? |
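The comparison described in this post (largest shared memory segment against kernel.shmmax) can be scripted. This is a sketch, not a sizing recommendation: `largest_shm_bytes` is a hypothetical helper, and it assumes the `ipcs -m` column layout shown above, with the size in bytes in the fifth column.

```shell
#!/bin/sh
# Largest shared memory segment size in bytes, from `ipcs -m` output on stdin.
# Only lines whose fifth field is purely numeric are considered,
# which skips the header and separator lines.
largest_shm_bytes() {
    awk '$5 ~ /^[0-9]+$/ && $5 + 0 > max { max = $5 + 0 }
         END { printf "%d\n", max }'
}

largest=$(ipcs -m | largest_shm_bytes)
limit=$(cat /proc/sys/kernel/shmmax 2>/dev/null)
echo "largest segment: ${largest} bytes, kernel.shmmax: ${limit} bytes"
```

kernel.shmmax caps the size of a single segment, so a segment exactly at the limit is worth a closer look. On its own it does not prove that the queue manager is short of shared memory.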
|
|
|
|