MQSeries.net Forum Index » IBM MQ Installation/Configuration Support » MQ on Sun Cluster 2.2: need urgent help

vvd
Posted: Thu Feb 14, 2002 11:17 am

Newbie

Joined: 13 Feb 2002
Posts: 4

hi all,
I am a Unix SA and do not know MQ, but we have a problem here.

Sun Cluster 2.2 on Solaris 8, with MQ 5.2 update 2 and WebLogic.

Two-node active-active Sun Cluster configuration:
MQ is installed on the local drive.
The logs and queue manager data are on the logical (shared) drive.

We use the MC69 cluster scripts to stop/start MQ and WebLogic.

The problem is that when we do a haswitch from node 1 to node 2, the server panics:

app02 console login: Feb 14 09:30:07 app02 ID[mqm_probe_MQandWL_EQLPR1Q1]: EQLPR1Q1 is online
root
panic: ptl1 trap reason 0x2
TL=0x1 TT=0x68 TICK=0x2032d397cc8c6
TPC=0x10117ba4 TnPC=0x10117ba8 TSTATE=0x4480001604
TL=0x2 TT=0x68 TICK=0x2032d397cc804
TPC=0x10007098 TnPC=0x1000709c TSTATE=0x9180001505

panic[cpu8]/thread=3000ca34820: Kernel panic at trap level 2

000000001040c1e0 unix:sys_tl1_panic+8 (1044a360, 4, 1, 2000, 0, 200)
%l0-3: 0000000000000004 0000000000001400 0000004480001604 000000001000723c
%l4-7: 000003000c3c5900 0000030000045500 000000000000000f 000000001040c290
000000001040c330 genunix:vmem_xalloc+12c (1044a360, 1044a768, ffffffffffffffff, 0, 0, 0)
%l0-3: 00000000104fade8 ffffffffffffe000 000000001044a360 0000000000002000
%l4-7: 0000000000000000 0000000000000000 000000001041be98 00000310095bbba0
000002a1011ce230 genunix:vmem_alloc+34 (400, 300000000, 10423748, 3ff00030010, 1041bd18, 2000900000008)
%l0-3: 0000000000000000 0000000000000000 0000000000000000 0000000000000000
%l4-7: 0000000000000000 0000000000000000 0000000000000000 000000001041bd18

..................................................................

Here is the script we use to start/stop MQ and WebLogic.
Please go through this script, or direct me to a known-good one.
-----------------------------------
root@app01>/opt/SUNWcluster/ha/mqm/# cat mqm_svc_stop_net_MQandWL
#!/bin/ksh
# @(#) public/solaris/sc2/mqm_svc_stop_net, supportpacks, MQHA, MQHA-010322a 1.4 01/03/16 11:44:49
#
# (C) Copyright IBM Corporation 2000
#
# MC69: Configuring MQSeries with Sun Cluster 2.x
#
# PLEASE NOTE - This script is supplied "AS IS" with no
# warranty or liability. It is not part of
# any product. Please ensure that you read
# and understand it before you run it. Make
# sure that by running it you will not
# overwrite or delete any important data.
#
# Module:
# mqm_svc_stop_net
#
# Parameters:
# none
#
# Description:
# STOP_NET method for mqm data service. This method
# stops any QMs which are hosted by logical hosts which
# are not mastered locally but whose admin filesystems
# are mounted - we are in the process of giving up these
# logical hosts, or stopping down the data service.
#
#
#----------------------------------------------------------------
#
# 24-jan-2002 (search for "MQandWL")
# stop weblogic first, then stop MQ
#----------------------------------------------------------------




Stop_QM()
{
QM=$1
USERID=$2
OFFLINE_TIMEOUT=$3


#Method_Trace ${METHODNAME}
# "Stop_QM QM=$QM USERID=$USERID OFFLINE_TIMEOUT=$OFFLINE_TIMEOUT"



# Stop the QM, if and only if it's running.
# The tag qm_${QM} relates to the strmqm (which
# should have exited a while ago) and all subprocesses, which
# include the real QM processes. We will stop them
# using an endmqm command (or more severe termination).
# We first have to tell pmf to forget about the tag - but it
# doesn't need to take any action (e.g. sending a signal).
#

${SCBIN}/pmfadm -q qm_${QM}

case $? in

0)
# QM exists here, invoke pre-offline then stop QM

# Run the rc.local script, if it exists.
# Always done in background. I know this doesn't give
# the script much time to do anything but the intention
# is not to allow it delay us - we have been told to take
# the QM offline and who are we to hang about?
# Pre-offline scripts may send mail, page someone, etc.,
# but should not rely on the QM being up for any
# length of time. It's going away, real soon now. Any clean
# up of applications that rely on the QM should be
# via a dependent data service.

if [ -x ${DSDIR}/rc.local ]
then
Method_Trace ${METHODNAME} "run rc.local script for ${QM} pre_offline"
COMMAND="${DSDIR}/rc.local ${QM} pre_offline &"
su ${USERID} -c "${COMMAND}"
# Exit code from pre_offline script is deliberately ignored
fi

# Tell PMF to drop the tag
if ${SCBIN}/pmfadm -s qm_${QM}
then
#Method_Trace ${METHODNAME} "qm_${QM} tag dropped"
continue
else
# On any error from pmfadm here, log an error and exit.
Method_Error ${METHODNAME} "Exit code $? from pmfadm"
exit 1
fi

# Stop the QM, allowing the OFFLINE_TIMEOUT specified

### for severity in immediate preemptive terminate
for severity in preemptive terminate
do
# Issue the stop method in the background - we don't
# want to risk having it hang us up, indefinitely. We
# want to be able to run our OFFLINE_TIMEOUT timer
# concurrently to be able to give up on the attempt and
# try a more forceful version. If the kill version fails
# then there is nothing more we can do here anyway.
# (This is different to the start methods, which
# run under pmfadm so are asynchronous anyway).

Method_Trace ${METHODNAME} "Attempting ${severity} stop of ${QM}"

case $severity in

immediate)
# Minimum severity of stop is an Immediate stop
# i.e. we sever connections - cluster should not be delayed
# by clients
COMMAND="endmqm -i ${QM} &"
su ${USERID} -c "${COMMAND}"
;;

preemptive)
# This is a preemptive stop. We have already tried -i.
COMMAND="endmqm -p ${QM} &"
su ${USERID} -c "${COMMAND}"
;;

terminate)
# This is a brutal means of mopping up QM processes.
# We stop the processes in accordance with the order
# specified in Appendix E of the System Management Guide,
# except that surely the repository controller
# should go before the EC, so I have reversed those entries.

# The regular expression in the next line contains a tab character
# Edit only with tab-friendly editors.
srchstr="( |-m)$QM[ ]*.*$"
for process in amqpcsea amqhasmx amqharmx amqzllp0 \
amqzlaa0 runmqchi amqrrmfa amqzxma0
do
# Redirect output of kill to /dev/null in case there are no processes
ps -ef | grep $process | grep -v grep |
egrep "$srchstr" | awk '{print $2}'|
xargs kill -9 > /dev/null 2>&1
done

esac

Method_Trace ${METHODNAME} "Waiting for ${severity} stop of ${QM} to complete"

TIMED_OUT=yes
SECONDS=0
while (( $SECONDS < ${OFFLINE_TIMEOUT} ))
do
ONLINE=`su ${USERID} -c "${DSDIR}/hamqm_running ${QM}"`
case $ONLINE in

1)
# It's still running...wait for timeout
#Method_Trace ${METHODNAME} "${QM} is still running..."
sleep 1 # loop granularity
;;

0)
# EC has updated status, but wait for
# EC to cleanup and terminate. If it
# fails to terminate inside 20 secs
# then escalate to next level of
# stop processing.
Method_Trace ${METHODNAME} "${QM} is stopping"
TIMED_OUT=yes
i=0
while (( $i < 20 ))
do
# Check for EC termination
# The regular expression in the next line contains a tab character
# Edit only with tab-friendly editors.
srchstr="( |-m)$QM[ ]*.*$"
cnt=`ps -ef | grep amqzxma0 | grep -v grep |
egrep "$srchstr" | awk '{print $2}' | wc -l `
i=`expr $i + 1`
sleep 1
if (( $cnt == 0 ))
then
# It's stopped, as desired
Method_Trace ${METHODNAME} "${QM} has stopped"
TIMED_OUT=no
break # out of the 20-iteration inner loop
fi
done
break # out of while ..offline timeout loop
;;


*)
# Bad result from hamqm_running method
Method_Error ${METHODNAME} "Invalid result (${ONLINE}) from hamqm_running method"
exit 1
break
;;

esac

done # mini timeout loop

if [ $TIMED_OUT = "yes" ]
then
continue # to next level of urgency
else
break # QM is stopped, job is done
fi

done # severity escalation loop
;;

1)
# We tolerate being asked to stop QMs which
# we don't know about, so this message only has
# trace status.
#Method_Trace ${METHODNAME} "No tag found for ${QM}"
continue
;;

*)
# On any other error from pmfadm here, we log an
# error message. That's all we can do.
Method_Error ${METHODNAME} "Exit code $? from pmfadm"
exit 1
;;

esac

#Method_Trace ${METHODNAME} "Stop_QM done for ${QM}"
}



# BEGIN

METHOD_STATUS=OK

METHODNAME=`basename $0`
DSDIR=`dirname $0`

# 15-Jan-2002 Ken Gottry (MQandWL)
# use special utilities file that defines the WL_TABFILE variable
if [ -r ${DSDIR}/hamqm_utilities_MQandWL ]
then
. ${DSDIR}/hamqm_utilities_MQandWL
else
SYSLOG=`haget -f syslog_facility`
logger -p ${SYSLOG}.err -t "ID[${METHODNAME}]" "ERROR! Cannot find file hamqm_utilities_MQandWL"
exit 1
fi

Method_Trace ${METHODNAME} "mastered=<$1> notmastered=<$2> timeout=<$3>"

###----------------------------------------------------
###
### 15-Jan-2002 (MQandWL)
### need to stop WL first
###
###----------------------------------------------------

# We need to go through the logical hosts that are not mastered and
# look for ones where the admin file system is still present. These
# are the logical hosts that we are in the process of giving away. For
# these logical hosts, we must stop the WLs that they host.
for LH in ${NOT_MASTERED_LOGICAL_HOSTS}
do
Method_Trace ${METHODNAME} "Checking for WLs on logical host ${LH}"

# Find admin file system
ADMFS=`haget -f pathprefix -h ${LH}`
ADMFS_MOUNTED=`mount | grep "${ADMFS} on " | wc -l`
if [ $ADMFS_MOUNTED -eq 1 ]
then
if [ -r ${ADMFS}/${WL_TABFILE} ]
then
# The next line contains a tab character - use tab friendly editors
cat ${ADMFS}/${WL_TABFILE} | grep -v "^#" | grep -v "^[ ]*$" |
while read ENTRY
do
WL=`echo $ENTRY | awk '{ print $1 }'`
USERID=`echo $ENTRY | awk '{ print $2 }'`
OFFLINE_TIMEOUT=`echo $ENTRY | awk '{ print $4 }'`
# It's one of ours...stop it, if it's running...
Method_Trace ${METHODNAME} "$LH hosts ${WL}"
###----------------------------------------------------------------
###----------------------------------------------------------------
###----------------------------------------------------------------
###
###
WL_STOP=${WL}/bin/weblogic
Method_Trace ${METHODNAME} "MQandWL ... running ${WL_STOP} as user ${USERID} with a timeout of ${OFFLINE_TIMEOUT}"
${WL_STOP} stop
###
###----------------------------------------------------------------
###----------------------------------------------------------------
###----------------------------------------------------------------
# This process waits for the background tasks before it exits
done
else
# We couldn't find a tab file - this logical host may not have a WL
Method_Trace ${METHODNAME} "Could not find/read ${ADMFS}/${WL_TABFILE}"
continue # Try and service other logical hosts
fi
else
# Admin FS is not mounted - that's fine, just do the next LH...
Method_Trace ${METHODNAME} "Ignoring logical host ${LH}"
continue
fi
done


if [ $METHOD_STATUS = "OK" ]
then
# Wait for all subprocesses
wait
Method_Trace ${METHODNAME} "Successful stop of WL"
### exit 0
else
Method_Error ${METHODNAME} "Method completed with errors"
exit 1
fi



# We need to go through the logical hosts that are not mastered and
# look for ones where the admin file system is still present. These
# are the logical hosts that we are in the process of giving away. For
# these logical hosts, we must stop the QMs that they host.
for LH in ${NOT_MASTERED_LOGICAL_HOSTS}
do
#Method_Trace ${METHODNAME} "Checking for QMs on logical host ${LH}"

# Find admin file system
ADMFS=`haget -f pathprefix -h ${LH}`
ADMFS_MOUNTED=`mount | grep "${ADMFS} on " | wc -l`
if [ $ADMFS_MOUNTED -eq 1 ]
then
if [ -r ${ADMFS}/${TABFILE} ]
then
# The next line contains a tab character - use tab friendly editors
cat ${ADMFS}/${TABFILE} | grep -v "^#" | grep -v "^[ ]*$" |
while read ENTRY
do
QM=`echo $ENTRY | awk '{ print $1 }'`
USERID=`echo $ENTRY | awk '{ print $2 }'`
OFFLINE_TIMEOUT=`echo $ENTRY | awk '{ print $4 }'`
# It's one of ours...stop it, if it's running...
Method_Trace ${METHODNAME} "$LH hosts ${QM}"
Stop_QM ${QM} ${USERID} ${OFFLINE_TIMEOUT} &
# This process waits for the background tasks before it exits
done
else
# We couldn't find a tab file - this logical host may not have a QM
#Method_Trace ${METHODNAME} "Could not find/read ${ADMFS}/${TABFILE}"
continue # Try and service other logical hosts
fi
else
# Admin FS is not mounted - that's fine, just do the next LH...
#Method_Trace ${METHODNAME} "Ignoring logical host ${LH}"
continue
fi
done


if [ $METHOD_STATUS = "OK" ]
then
# Wait for all subprocesses, because the Stop_QM calls
# were launched in the background
wait
Method_Trace ${METHODNAME} "Successful exit"
exit 0
else
Method_Error ${METHODNAME} "Method completed with errors"
exit 1
fi
---------------------------------------------------------------------
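The stop logic in the script escalates through severities (endmqm -i, then endmqm -p, then kill -9), polling inside a timeout loop after each attempt before moving to the next level. Here is a minimal, self-contained sketch of that pattern; stop_cmd and is_running are stubs standing in for endmqm and the hamqm_running method, and the timeout is shortened for illustration:

```shell
# Sketch of the MC69 escalation pattern: try a gentle stop, poll until a
# timeout, then escalate. stop_cmd/is_running are stubs, not real MQ calls.
QM=DEMO.QM
OFFLINE_TIMEOUT=2
FLAG=/tmp/qm_demo.$$.up

is_running() { [ -f "$FLAG" ]; }           # real script: hamqm_running ${QM}
stop_cmd() {
    case $1 in
        immediate)  : ;;                   # stub QM ignores endmqm -i
        preemptive) rm -f "$FLAG" ;;       # stub QM obeys endmqm -p
    esac
}

touch "$FLAG"
STOPPED=no
for severity in immediate preemptive; do
    stop_cmd $severity                     # real script backgrounds this
    SECONDS=0
    while [ $SECONDS -lt $OFFLINE_TIMEOUT ]; do
        if ! is_running; then
            STOPPED=yes
            break                          # stopped at this severity
        fi
        sleep 1                            # loop granularity, as in MC69
    done
    [ $STOPPED = yes ] && break            # else escalate to next severity
done
echo "stopped=$STOPPED at severity=$severity"
```

Runs under ksh or bash. With the stub above it stops at the "preemptive" level, which mirrors how the real script only reaches the kill -9 mop-up when both endmqm forms time out.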

Here are the errors from the MQ error log:

-------------------------------------------------------------------------------
02/14/02 09:31:06
AMQ6184: An internal MQSeries error has occurred on queue manager EQLPR1Q1.

EXPLANATION:
An error has been detected, and the MQSeries error recording routine has been
called. The failing process is process 19288.
ACTION:
Use the standard facilities supplied with your system to record the problem
identifier, and to save the generated output files. Contact your IBM support
center. Do not discard these files until the problem has been resolved.
-------------------------------------------------------------------------------
02/14/02 09:31:06
AMQ6162: An error has occurred reading an INI file.

EXPLANATION:
An error has occurred when reading the MQS.INI file or a queue manager QM.INI
file.
ACTION:
If you have been changing the INI file content check and correct the change. If
you have not changed the INI file, use the standard facilities supplied with
your system to record the problem identifier, and to save the generated output
files. Contact your IBM support center. Do not discard these files until the
problem has been resolved.
-------------------------------------------------------------------------------
02/14/02 09:31:06
AMQ6184: An internal MQSeries error has occurred on queue manager EQLPR1Q1.

EXPLANATION:
An error has been detected, and the MQSeries error recording routine has been
called. The failing process is process 19288.
ACTION:
Use the standard facilities supplied with your system to record the problem
identifier, and to save the generated output files. Contact your IBM support
center. Do not discard these files until the problem has been resolved.
-------------------------------------------------------------------------------
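AMQ6162 means the queue manager could not read mqs.ini or its qm.ini, which in a failover usually points at the shared filesystem holding the queue manager data not being mounted (or readable by mqm) on the takeover node when the start was attempted. A hedged sketch of a pre-start check you could run on the takeover node; the /tmp layout below is a stand-in for the real /var/mqm paths:

```shell
# Pre-start sanity check: are the INI files the queue manager needs
# actually present and readable on this node? (AMQ6162 = INI read failed.)
check_ini() {
    if [ -r "$1" ]; then
        echo "OK: $1"
        return 0
    else
        echo "UNREADABLE: $1"              # the AMQ6162 symptom
        return 1
    fi
}

# Demo against a throwaway layout instead of the real /var/mqm:
DEMO=/tmp/mqdemo.$$
mkdir -p $DEMO/qmgrs/EQLPR1Q1
echo "QueueManager:" > $DEMO/mqs.ini       # present and readable
# qm.ini deliberately NOT created: simulates an unmounted shared disk

check_ini $DEMO/mqs.ini;               MQS_OK=$?
check_ini $DEMO/qmgrs/EQLPR1Q1/qm.ini; QM_OK=$?

rm -rf $DEMO
```

On the real nodes you would point check_ini at /var/mqm/mqs.ini and /var/mqm/qmgrs/EQLPR1Q1/qm.ini (running as the mqm user), and also verify with mount that the logical host's filesystem is attached before strmqm runs.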







