MQSeries.net Forum Index » Clustering » Solved: Rebuilding corrupt cluster repository (full)

Solved: Rebuilding corrupt cluster repository (full)
chrisc
Posted: Thu Jul 10, 2008 7:44 pm    Post subject: Solved: Rebuilding corrupt cluster repository (full)

Voyager

Joined: 19 Mar 2006
Posts: 77

Hi everyone,

We have an MQ 6.0.2 cluster with two full repositories and a number of partial repositories. When I came onto the project, I discovered that one of the full repository QMs has apparently become corrupted, and I have been told that there is no way of restoring it from backups.

Obviously, it is not possible to connect to the queue manager or do anything else with it.

What we want to do is rebuild it (i.e. delete and recreate it), but I'm concerned about whether it's safe to do this.

My idea was:
- On the working FR, stop and delete the cluster sender channels that point to the corrupt FR.
- Check all the other QMs in the cluster, and if they have a cluster sender channel that points to the corrupt FR, stop and delete it (replacing with one that points to the working one if necessary)
- Delete the corrupt QM using dltmqm
- Recreate the QM
- Re-add the necessary sender and receiver cluster channels to the full repositories.
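
A minimal sketch of the MQSC behind the plan above, written as shell functions that only print commands so they can be reviewed before being piped into runmqsc. All object names used with it (DEMO, TO.BROKENFR, the host names) are hypothetical, and it assumes the cluster-sender channels were defined manually; automatically defined ones are not removed with DELETE CHANNEL.

```shell
#!/bin/sh
# Sketch only: these functions print MQSC text and never touch a queue manager.

# Print the MQSC that stops and deletes one manually defined
# cluster-sender channel (run on each QM that points at the broken FR).
mqsc_drop_clussdr() {
    chl="$1"
    printf 'STOP CHANNEL(%s)\n' "$chl"
    printf 'DELETE CHANNEL(%s)\n' "$chl"
}

# Print the MQSC that turns a freshly created queue manager into a full
# repository and defines its cluster-receiver and cluster-sender channels.
# Arguments: cluster, receiver channel, own conname, sender channel,
# conname of the working full repository.
mqsc_join_as_full_repo() {
    cluster="$1"
    rcvr="$2"
    rcvr_conn="$3"
    sdr="$4"
    sdr_conn="$5"
    printf 'ALTER QMGR REPOS(%s)\n' "$cluster"
    printf "DEFINE CHANNEL(%s) CHLTYPE(CLUSRCVR) TRPTYPE(TCP) CONNAME('%s') CLUSTER(%s)\n" \
        "$rcvr" "$rcvr_conn" "$cluster"
    printf "DEFINE CHANNEL(%s) CHLTYPE(CLUSSDR) TRPTYPE(TCP) CONNAME('%s') CLUSTER(%s)\n" \
        "$sdr" "$sdr_conn" "$cluster"
}
```

For example, `mqsc_drop_clussdr TO.BROKENFR | runmqsc WORKINGFR` would apply the first step on the working full repository; the dltmqm and crtmqm steps sit between the two functions.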

Alternatively, would the "delqm" utility in the MS0G SupportPac do the same thing, presumably safely?

Would this work? Are there any gotchas I need to be aware of here? Anyone had experience with this sort of thing before?


Thanks,
Chris


Last edited by chrisc on Sun Jul 20, 2008 9:12 pm; edited 1 time in total
sami.stormrage
Posted: Sat Jul 12, 2008 11:00 am

Disciple

Joined: 25 Jun 2008
Posts: 186
Location: Bangalore/Singapore

Chapter 9 of the Queue Manager Clusters manual should help you with this. But before jumping to any conclusions, please verify that the queue manager has not been accidentally suspended and that there is no connection name error towards the other FRs. By the way, what did you try in order to confirm that the queue manager is corrupted?
_________________
*forgetting everything *
exerk
Posted: Sat Jul 12, 2008 12:08 pm

Jedi Council

Joined: 02 Nov 2006
Posts: 6339

Consideration should also be given to removing references to the 'old' Full Repository, i.e. issuing a RESET CLUSTER command including the QMID.
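
As a rough illustration of that suggestion, the function below prints a RESET CLUSTER command keyed on the QMID. The cluster name and QMID passed to it are placeholders; the real QMID would have to be read from DISPLAY CLUSQMGR(*) QMID on the working full repository, which is where the command is meant to be issued. QUEUES(YES) is an assumption here: it also drops the cluster queues owned by the removed queue manager.

```shell
#!/bin/sh
# Sketch only: prints the command for review, does not run it.

# Print a RESET CLUSTER command that force-removes one queue manager,
# identified by its QMID, from the named cluster.
mqsc_reset_cluster() {
    cluster="$1"
    qmid="$2"
    printf "RESET CLUSTER(%s) QMID('%s') ACTION(FORCEREMOVE) QUEUES(YES)\n" \
        "$cluster" "$qmid"
}
```

Using the QMID rather than the queue manager name matters when the queue manager is recreated under the same name, because the old and new instances then differ only by QMID.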
_________________
It's puzzling, I don't think I've ever seen anything quite like this before...and it's hard to soar like an eagle when you're surrounded by turkeys.
senMQ
Posted: Sat Jul 12, 2008 1:50 pm

Acolyte

Joined: 14 Aug 2006
Posts: 66
Location: Palo Alto, CA

After you delete and recreate the queue manager, make sure that the old repository queue manager is out of the cluster. If it is still in the cluster, you should run the "reset cluster" command.
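
One way to check for a leftover entry, sketched under the assumption that runmqsc prints attributes in its usual KEYWORD(value) layout: count the distinct QMID values returned for the queue manager name. The function name is made up for this example.

```shell
#!/bin/sh
# Sketch only: reads runmqsc output on stdin and counts distinct QMID(...)
# values. More than one QMID for the same queue manager name suggests the
# old instance is still known to the cluster.
count_qmids() {
    tr -s ' \t' '\n\n' | grep '^QMID(' | sort -u | wc -l | tr -d ' '
}
```

Typical use would be `echo "DISPLAY CLUSQMGR(OLDFR) QMID" | runmqsc WORKINGFR | count_qmids`, where OLDFR and WORKINGFR are placeholder names.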
chrisc
Posted: Tue Jul 15, 2008 3:46 pm

Voyager

Joined: 19 Mar 2006
Posts: 77

Thanks for the feedback.

We know the queue manager is dead because it seems to have lost one of its log files. It was apparently lost or corrupted by the backup job they run here (which obviously needs further investigation). When we use
Code:
strmqm qmgr

it says log not found.

I haven't had a chance to deal with this yet, but hopefully I will get to it in the next day or so... The MQ environment here is - ahem - "fragile" at the moment, and needs some attention.

I'll let you know how I go with it.
veech23
Posted: Tue Jul 15, 2008 4:01 pm

Novice

Joined: 25 Apr 2007
Posts: 23
Location: canberra

Maybe you can avoid a lot of work:

http://mqseries.net/phpBB2/viewtopic.php?t=41780&highlight=&sid=2fd7fc6e66aa20d0a9485352849966b9
chrisc
Posted: Tue Jul 15, 2008 8:34 pm

Voyager

Joined: 19 Mar 2006
Posts: 77

Aha, thanks for that link, I'll give it a shot. Certainly sounds a lot easier than some of the other stuff, even if it is "hacking MQ".
chrisc
Posted: Sun Jul 20, 2008 3:53 pm

Voyager

Joined: 19 Mar 2006
Posts: 77

Hmmm, doesn't seem to have worked.

It looked like it did initially, because our startup script didn't show any errors:
Code:

# Start the queue manager and give it time to come up.
strmqm myQM
sleep 10

# Start the channel initiator only if it is not already running.
ps -ef | grep "/opt/mqm/bin/runmqchi -m myQM" | grep -v grep
if (( $? == 1 ))
then
  echo "runmqchi was not running..."
  echo "Now starting runmqchi -m myQM..."
  nohup runmqchi -m myQM > /dev/null &
fi

sleep 10
# Start the command server.
echo "Now starting strmqcsv myQM..."
nohup strmqcsv myQM > /dev/null &
sleep 10
echo "QM start completed."


However, if I do a ps -ef looking for my queue manager, it only comes up with amqrmppa.
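
The startup script above never looks at the exit status of strmqm, which is why it can report success when the queue manager did not start. A hedged sketch of a stricter start, using dspmq to confirm the state; the function names are invented for this example, and treating every non-zero strmqm exit as a failure is a simplification.

```shell
#!/bin/sh
# Sketch only: function names are hypothetical.

# Extract the STATUS(...) value from dspmq output read on stdin.
dspmq_status() {
    sed -n 's/.*STATUS(\([^)]*\)).*/\1/p'
}

# Start a queue manager and report failure instead of assuming success.
start_qm() {
    qm="$1"
    if ! strmqm "$qm"; then
        echo "strmqm $qm failed" >&2
        return 1
    fi
    status=$(dspmq -m "$qm" | dspmq_status)
    if [ "$status" != "Running" ]; then
        echo "$qm status is '$status', expected Running" >&2
        return 1
    fi
    echo "$qm is running"
}
```

A wrapper script could then call `start_qm myQM || exit 1` before starting runmqchi and strmqcsv.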

Also, on subsequent attempts to start up, I am getting the "Log not available." error again, which is back to where we started. I'm wondering if something is corrupted in the QM and that is causing the log files to disappear / become corrupted?

There are two FDCs generated:

    AMQ6125 internal MQ error (component mqlpgrlg, major errorcode 'hrcE_MQLO_UNEXPECTED_OS_ERROR', minor errorcode 'OK')
    AMQ6120 internal MQ error (component xcsRequestMutexSem, major errorcode 'xecF_E_UNEXPECTED_SWITCH', minor errorcode 'OK')


Does anyone understand any of that?

Am I getting to the point of just deleting the whole QM and starting again?
sami.stormrage
Posted: Sun Jul 20, 2008 8:01 pm

Disciple

Joined: 25 Jun 2008
Posts: 186
Location: Bangalore/Singapore

Where's the channel initiation queue when you do a runmqchi?
_________________
*forgetting everything *
sami.stormrage
Posted: Sun Jul 20, 2008 8:06 pm

Disciple

Joined: 25 Jun 2008
Posts: 186
Location: Bangalore/Singapore

OK, it will default to SYSTEM.CHANNEL.INITQ, but best practice would be to create your own initiation queue, just to be on the safer side.
_________________
*forgetting everything *
chrisc
Posted: Sun Jul 20, 2008 8:11 pm

Voyager

Joined: 19 Mar 2006
Posts: 77

sami.stormrage wrote:
Where's the channel initiation queue when you do a runmqchi?


It's not specified, i.e. it uses the default SYSTEM.CHANNEL.INITQ.

Edit: sorry, looks like you were typing at the same time as me! Anyway, I can't do much about using the default right now because I can't start the queue manager to create an alternative!

Just out of interest, why do you think it's safer to create a different one?
chrisc
Posted: Sun Jul 20, 2008 9:20 pm    Post subject: SOLVED!

Voyager

Joined: 19 Mar 2006
Posts: 77

Woohoo! It's often the simplest little things...

OK, all sorted. The process as described in the linked posting DOES work, with one caveat: The new log files must be owned by mqm!

The login we have is in the mqm group, but is not actually the mqm user itself. When I created the temporary queue manager under this user ID, its files were owned by that login ID (as you'd expect), but for some reason the queue manager wouldn't start unless they were owned by mqm.

So, moral of the story - either do the steps outlined in the post while logged in as mqm, or:
Code:

chown -R mqm:mqm /var/mqm/log/TEMP
chown mqm:mqm /var/mqm/qmgrs/BROKENQM/amqalchk.fil


Thanks to everybody for your help on this, and I hope my little discovery helps other people if they come across this. I just wish IBM came up with "permission denied" errors or something rather than "unexpected OS errors" or "internal errors"!
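
Since the error messages give no hint about ownership, a quick check before starting the queue manager can save time. A small sketch, with a made-up function name, that lists anything under the given paths not owned by the expected user:

```shell
#!/bin/sh
# Sketch only: the function name is hypothetical.

# Print every file or directory under the given paths that is NOT owned
# by the expected user. No output means ownership looks right.
not_owned_by() {
    owner="$1"
    shift
    find "$@" ! -user "$owner" -print
}
```

With the paths from this thread, that would be `not_owned_by mqm /var/mqm/log/TEMP /var/mqm/qmgrs/BROKENQM/amqalchk.fil`; any path it prints is a candidate for the chown commands above.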