MQSeries.net :: Soliciting Architectural Feedback


Forum: General Discussion


klamerus

Posted: Sat Jul 02, 2005 9:52 am

Hi,

I'm doing investment planning for next year and I'm considering a change to the architecture of a system we currently have in place.

I'd like to solicit some architectural guidance here to help shape that.

We have a system that does document processing. It turns SAP data into documents, manipulates documents, prints them, faxes them, emails them, encrypts them, etc.

Requests are made to this system from about 200 other systems here, each of which is free to ask for whichever particular steps it needs. The average request has four steps.

The work is managed through MQ queues. Custom programs built on vendor libraries perform the work. Each one pulls a message (containing the document and the instructions) off a queue, does its work, and then puts the message on the queue of the next requested step. Once all the steps are done, we send a status message back to the requesting application.
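
The step-by-step hand-off above can be pictured as a "routing slip" carried with each message. A minimal sketch in C, assuming a hypothetical slip structure: `routing_slip`, `next_destination`, and the queue names are illustrative only, not the poster's actual message format and not part of the MQ API.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_STEPS 8

/* Hypothetical routing record carried inside each message (not an MQ
   structure): the requester lists the steps it wants, in order. */
typedef struct {
    const char *steps[MAX_STEPS]; /* input queue of each requested step */
    size_t      step_count;
    size_t      current;          /* index of the step now being performed */
    const char *status_queue;     /* requester's queue for the final status */
} routing_slip;

/* Called by a worker after it finishes step `current`. Returns the queue the
   message should be put on next: the next step's input queue, or the
   requester's status queue once every step is done. */
const char *next_destination(routing_slip *slip)
{
    assert(slip != NULL && slip->step_count <= MAX_STEPS);
    if (slip->current + 1 < slip->step_count) {
        slip->current++;
        return slip->steps[slip->current];
    }
    slip->current = slip->step_count; /* all steps complete */
    return slip->status_queue;
}
```

A worker would call `next_destination` after finishing its own step and put the message on the returned queue.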

Most of the documents are in the 20-50 KB range, but some are as large as 1.5 MB.

The system handles about 20,000 requests on an average day, but on some days as many as 50,000 (each with ~4 steps).

At this time, we allocate this work to 2 servers, each of which has MQ installed and a program for each of the steps. The problem is that if one of these programs dies, any work flowing through that step stalls (including the follow-on steps).

While we can scale with 2 servers (and could go to 3 or 4), these servers each already have lots of spare capacity. Because the processing is queue driven, each program can only work so fast, especially the steps that send data to printers or email servers.

A "best practice" in the world of databases is to isolate the database from the code that uses it. It is better to have 2 databases on one box and the code from the 2 applications on a second box than to have the 2 databases on two boxes, each coexisting with the code that uses it.

In other words, keep the commercial and critical stuff isolated, and put the custom stuff elsewhere.

We could modify our system so that all of the MQ queues are on the box that allocates the work, and the other two boxes pull messages from it. We would then have two of our programs working against a single shared queue. If one got stuck, the other would keep going, and work would continue until we fixed the first or the second got stuck as well (we do have monitoring, so we hope to catch these quickly). In theory, if we rewrite our code to work with queues on a remote queue manager, we could even run several copies of each program.

The downside seems to be that messages would cross the network between the servers more often, so each program would be slightly slower and would add network load.
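
To put a rough number on that network cost, a back-of-the-envelope estimate multiplies requests × steps × transfers per step × message size. A sketch, assuming every step's get and put both cross the network (2 transfers per step) and ignoring MQ protocol overhead; `estimate_traffic` and its parameters are hypothetical.

```c
#include <assert.h>

/* Rough daily network volume if the queues live on one box and the
   worker programs on others. All inputs are planning assumptions. */
typedef struct {
    double bytes_per_day;
    double avg_mbit_per_s; /* averaged over 24 hours; peaks will be higher */
} net_estimate;

net_estimate estimate_traffic(double requests_per_day, double steps_per_request,
                              double avg_msg_bytes, double transfers_per_step)
{
    assert(requests_per_day >= 0 && steps_per_request >= 0);
    assert(avg_msg_bytes >= 0 && transfers_per_step >= 0);
    net_estimate e;
    e.bytes_per_day = requests_per_day * steps_per_request
                    * transfers_per_step * avg_msg_bytes;
    e.avg_mbit_per_s = e.bytes_per_day * 8.0 / 86400.0 / 1e6;
    return e;
}
```

For example, `estimate_traffic(50000, 4, 50000, 2)` (the busiest day, 4 steps, documents taken as 50,000 bytes) gives 2 × 10^10 bytes, about 20 GB per day or roughly 1.85 Mbit/s averaged over 24 hours. Real traffic is burstier and the largest (1.5 MB) documents push the figure up, so a test with real documents is still the better guide.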

Is there a best practice in MQ, similar to the one for databases, of keeping MQ and application code on separate systems? Or is the thinking that this will always perform poorly because of the network? Is there a rule of thumb (bytes transferred or something similar) that people have used?

I'm currently leaning towards the separation, and I was beginning to throw together a test program to assess what the performance and network impact would really be for our documents.

I'd appreciate hearing how others are architecting their systems.

jefflowrey

Posted: Sat Jul 02, 2005 12:24 pm

In general, with applications that require database interaction, most people prefer to have XA to ensure that the queue processing and the database processing are atomic.

This requires either the Extended Transactional Client (ETC) or a queue manager on the same machine as the program.

In addition, depending on the nature of your network, client connections can be unreliable, and they require a lot more logic to handle properly (especially in a load-balancing kind of environment).

Instead of sharing a queue between copies of your application, you should really use MQ clustering and workload-balance multiple copies of your applications across multiple servers. You can still do this on the machines you have: just create multiple queue managers on each box. Then if one program goes down, the rest still process work. With careful use of monitoring tools, you can automatically stop the downed application's queue from receiving new work (for example, a trigger that uses MQSC or PCF to alter the queue so it is no longer in the cluster).

If, as you say, your servers have lots of free capacity, then adding more queue managers is not going to overwork them, as long as you properly plan the resources they will use.

You may have to modify your programs somewhat to ensure that the destination for the status message is properly identified (for example, ensure that sending apps set ReplyToQMgr). You may also have to do some work to ensure that sending to the next step goes through proper load balancing, if this is a real concern. This can be complicated if you aren't running v6 (and I'm sure most people aren't, at least not in production). Having an additional queue manager on each box as the master "put" qmgr, with no local queues, is a good solution to this problem, as is a centralized router program/hub that redirects work.
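
Spreading work across several clustered copies of a queue, and taking a failed copy out of rotation, can be modelled as a round-robin that skips unavailable instances. This is a deliberately simplified sketch: the real MQ cluster workload algorithm weighs more factors (such as channel status), and `cluster_model` / `choose_instance` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model of distributing messages across clustered instances of
   one queue. available[i] becomes false once monitoring has taken
   instance i out of the cluster (e.g. its application stopped consuming). */
typedef struct {
    bool  *available;
    size_t count;
    size_t next; /* round-robin cursor */
} cluster_model;

/* Returns the index of the instance that receives the next message,
   or -1 if no instance is available. */
int choose_instance(cluster_model *c)
{
    assert(c != NULL && c->count > 0);
    for (size_t tried = 0; tried < c->count; tried++) {
        size_t i = c->next;
        c->next = (c->next + 1) % c->count;
        if (c->available[i])
            return (int)i;
    }
    return -1;
}
```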
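
The ReplyToQMgr point can be illustrated like this: the final status is addressed from the reply fields of the original request, not from whichever queue manager happened to run the last step. A minimal sketch, assuming a simplified stand-in for the message descriptor; `reply_info` and `status_destination` are hypothetical, and real MQMD fields are blank-padded 48-character arrays rather than NUL-terminated strings.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NAME_LEN 48 /* MQMD ReplyToQ / ReplyToQMgr are 48 characters */

/* Simplified stand-in for the reply fields of a message descriptor. */
typedef struct {
    char reply_to_q[NAME_LEN + 1];
    char reply_to_qmgr[NAME_LEN + 1];
} reply_info;

/* Destination for the final status message, copied from the original
   request. Returns false if the requester did not say where replies go. */
bool status_destination(const reply_info *request, reply_info *status_dest)
{
    assert(request != NULL && status_dest != NULL);
    if (request->reply_to_q[0] == '\0' || request->reply_to_qmgr[0] == '\0')
        return false;
    memcpy(status_dest, request, sizeof *status_dest);
    return true;
}
```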

1.5Kb is not large. 10M is starting to get there.
_________________
I am *not* the model of the modern major general.

fjb_saper

Posted: Sat Jul 02, 2005 3:53 pm

Jeff,

He was talking about 1.5 MB, not kB. However, the comment is still valid.

We use SAP JCo to receive messages (IDocs) of up to 500 KB from SAP, transform them into XML, and put them on a queue. Then a small program distributes them onto a number of queues, local and remote (pub/sub without using a broker).

On the outbound side from SAP, we have no problems (speed, etc.).
On the inbound side, we use a triggered queue with a Xerces XML parser to create the IDoc to send with JCo (using additional IDoc classes).
Our volume is far below what you indicated, but we have processed up to 15,000 inbound IDocs/hour, running up to 15 inbound processes in parallel on the inbound queue.

Now for the load balancing advice...
Inbound to SAP is going to be RFC (MQLink for R/3 or JCo).
Inbound to SAP can be scaled by the number of processes accepted by the SAP RFC target server/server group (see SAP inbound RFC load balancing).

You could write a "triggered every" type of application (make sure TRIGINT on the qmgr is low enough; we use 300,000 ms, i.e. 5 min) that checks on a server how many instances are already running before deciding to start another process to SAP (limiting concurrent R/3 inbound connections on the sender side), or that just shuts down if the SAP connection cannot be obtained (R/3 inbound RFC saturated).

Make sure that the trigger monitor starts the application in the background, so that it does not wait for the triggered application to finish before handling the next trigger. The triggered application processes the queue until it is empty.
If SAP is down (communication problem), roll back and shut down the inbound process. Move errors to a dedicated error queue as 2 messages, one with the SAP error info and one with the original message, using the message ID in the correlation ID so they can be matched. That queue needs to be monitored (alert on qdepth), and manual intervention will be required (change the inbound message, move it back to the processing queue, etc.).
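
The launcher logic described above reduces to a small decision each time the trigger fires. A sketch under assumptions: `decide_launch` and `trigint_ms_from_minutes` are hypothetical, and the caller is assumed to know how many workers are running and whether an SAP connection can be obtained. The second helper only shows that the TRIGINT value quoted above is in milliseconds (300,000 ms = 5 min).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical outcome of one launcher run. */
typedef enum { START_WORKER, SKIP_AT_LIMIT, SKIP_SAP_UNAVAILABLE } launch_decision;

/* Decide whether to start another worker that sends to SAP. How the
   running count and SAP availability are obtained is site-specific. */
launch_decision decide_launch(int running_workers, int max_workers,
                              bool sap_connection_available)
{
    assert(max_workers > 0 && running_workers >= 0);
    if (running_workers >= max_workers)
        return SKIP_AT_LIMIT;        /* cap concurrent inbound RFC sessions */
    if (!sap_connection_available)
        return SKIP_SAP_UNAVAILABLE; /* SAP inbound RFC saturated or down */
    return START_WORKER;
}

/* TRIGINT is expressed in milliseconds. */
long trigint_ms_from_minutes(long minutes)
{
    return minutes * 60L * 1000L;
}
```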

Now for monitoring and MQ scalability:
You may have multiple inbound queues (clustering) on multiple qmgrs.
So if any qmgr fails, the cluster will load balance. Route the messages through your cluster gateway, which is a single point of failure (so monitoring is important), using a cluster "dummy" qmgr alias that gets resolved at the gateway for load balancing.

Monitoring on the inbound queues: if qdepth > 0 and the dequeue rate == 0 for more than 1 min, page somebody. This indicates either a problem with triggering, a problem with communications to SAP, or SAP being down.

As you do not have a two-phase commit with SAP, you can use any qmgr on a standalone basis. I would, however, run the program that sends to SAP on the qmgr that hosts the SAP inbound queue (a one-phase commit with the qmgr, in bindings connection, simulating a two-phase commit with SAP depending on JCo's transactional response).

For any other resources (outside DBs, etc.), have the message processed on a qmgr that allows a two-phase commit with that XAResource.
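
The "one-phase commit simulating a two-phase commit" can be summarised as: read the message under syncpoint, call SAP, then commit (MQCMIT) or back out (MQBACK) the MQ unit of work based on SAP's response. A minimal sketch of that decision only, assuming three hypothetical outcome categories; the actual MQGET/MQCMIT/MQBACK and JCo calls are left out.

```c
#include <assert.h>

/* Hypothetical outcome of handing one message to SAP. */
typedef enum { SAP_OK, SAP_COMM_FAILURE, SAP_APP_ERROR } sap_result;

/* What the worker does with its MQ unit of work afterwards. The message
   was read under syncpoint, so a backout returns it to the inbound queue. */
typedef enum {
    COMMIT,                 /* SAP accepted it: remove the message for good */
    BACKOUT_AND_SHUTDOWN,   /* SAP unreachable: keep the message, stop */
    ERROR_QUEUE_AND_COMMIT  /* SAP rejected it: park message + error info */
} uow_action;

uow_action after_sap_call(sap_result r)
{
    switch (r) {
    case SAP_OK:           return COMMIT;
    case SAP_COMM_FAILURE: return BACKOUT_AND_SHUTDOWN;
    case SAP_APP_ERROR:    return ERROR_QUEUE_AND_COMMIT;
    }
    assert(!"unknown sap_result");
    return BACKOUT_AND_SHUTDOWN; /* safest default if asserts are disabled */
}
```

On the error-queue path, the puts to the error queue and the commit would belong to the same unit of work, so the inbound message is removed only once its error copy is queued.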

Enjoy

klamerus

Posted: Sat Jul 02, 2005 5:18 pm

Well, let me explain a bit further (I appreciate both pieces of advice).

We have a machine we call our gateway. It has a couple of local queues to which several mainframe images write. The input isn't exactly in the form we want for the rest of the system, so we modify the format (placing an XML header in front of the rest of the message) and then put it on what's called the input queue. These queues are actually on 2 other servers (the ones that do the work) but are shared into the cluster with the same name, so the work is allocated to them round-robin.
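
The gateway's reformatting step (an XML header placed in front of the mainframe payload) might look like the sketch below. The header element names and `add_xml_header` are invented for illustration, since the real header format is not described, and the values are assumed to need no XML escaping.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical header layout; values are assumed to need no escaping. */
#define HDR_FMT "<hdr><id>%s</id><steps>%s</steps></hdr>"

/* Returns a malloc'd buffer holding header + payload (caller frees), and
   its length in *out_len. The payload is copied byte-for-byte, so it may
   be binary; the trailing NUL is only a convenience. */
char *add_xml_header(const char *request_id, const char *steps,
                     const char *payload, size_t payload_len, size_t *out_len)
{
    assert(request_id && steps && payload && out_len);
    int hdr_len = snprintf(NULL, 0, HDR_FMT, request_id, steps);
    if (hdr_len < 0)
        return NULL;
    char *buf = malloc((size_t)hdr_len + payload_len + 1);
    if (buf == NULL)
        return NULL;
    snprintf(buf, (size_t)hdr_len + 1, HDR_FMT, request_id, steps);
    memcpy(buf + hdr_len, payload, payload_len);
    buf[hdr_len + payload_len] = '\0';
    *out_len = (size_t)hdr_len + payload_len;
    return buf;
}
```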

With that, the majority of the queues are on each of the other two servers, which do the work.

There are a couple of issues with this. As I said, the code is a bit fragile, so we really should have two (or more) copies of each program running for each step. This would also give us more scale.

I started thinking down this path and asked myself: why route the messages to either of the other two machines at all? If we need to modify the code so that multiple copies don't step on each other, why not just leave the queues on the gateway box and put only code on the other two (reading from and writing to the queues machine-to-machine)? That's when I started to think about the network.

The original messages from SAP are pretty small (invoice data), but we turn them into PDF along the way and when we print or FAX, we need to turn that PDF into PCL or PS. Those can get up into the 1.5 MB range.

Our application is entirely written in C at this time, but we'll be moving it to .Net in the future (this is all on Windows 2000 at the moment).

So, we are getting load balancing, but we would need a third or a fourth machine with input queues shared into the cluster to scale and we would still have the issue with the reliability of the code.

My problem has been that our people's skills are just not that strong. They just spent the last week playing around with the cluster because they had set it up wrong (and then wrong again, etc.). I was thinking this plan could simplify things and thereby drive down support costs and issues.

It sounds like you're saying that working with MQ from one system to another (client to server) is not reliable, which concerns me. So, perhaps I had better not pursue this. I would at least like to create several copies of the same program reading messages from the same queue on a server. Are there any issues with that?

jefflowrey

Posted: Sun Jul 03, 2005 2:35 am

klamerus wrote:

So, we are getting load balancing, but we would need a third or a fourth machine with input queues shared into the cluster to scale and we would still have the issue with the reliability of the code.

Again, no, you don't need a third or fourth machine - not if your current two have enough capacity to run another qmgr each.

klamerus wrote:

My problem has been that the skills of our people is just not that good. They just spent the last week playing around with the cluster because they had set it up wrong (and then wrong again, etc. ,etc.). I was thinking I could simplify things and thereby drive down support costs and issues with this plan.

I think you've gotten things as simple as they can be, from an architectural point of view.

klamerus wrote:

It sounds like you're saying that working with MQ from one system to another (client to server) is not reliable, which concerns me. So, perhaps I had better not pursue this.

It is reliable. It just takes more code to make it that way - detecting that the connection is dropped, deciding what to do when the connection is dropped, etc.
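
The extra client-side logic mentioned here usually amounts to recognising a dropped connection and retrying with a growing delay. A minimal sketch: 2009 (MQRC_CONNECTION_BROKEN) and 2059 (MQRC_Q_MGR_NOT_AVAILABLE) are real MQ reason codes, but the list is deliberately incomplete, and the backoff policy and function names are assumptions.

```c
#include <assert.h>
#include <stdbool.h>

#define RC_CONNECTION_BROKEN   2009 /* MQRC_CONNECTION_BROKEN */
#define RC_Q_MGR_NOT_AVAILABLE 2059 /* MQRC_Q_MGR_NOT_AVAILABLE */

/* Reason codes after which a client would typically reconnect rather than
   give up (a minimal, non-exhaustive list). */
bool is_reconnectable(long reason)
{
    return reason == RC_CONNECTION_BROKEN || reason == RC_Q_MGR_NOT_AVAILABLE;
}

/* Delay before reconnect attempt `attempt` (0-based): doubles each time,
   capped so a long outage does not push retries out indefinitely. */
long reconnect_delay_ms(int attempt, long base_ms, long cap_ms)
{
    assert(attempt >= 0 && base_ms > 0 && cap_ms >= base_ms);
    long delay = base_ms;
    for (int i = 0; i < attempt && delay < cap_ms; i++)
        delay *= 2;
    return delay < cap_ms ? delay : cap_ms;
}
```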

klamerus wrote:

I would at least like to create several copies of the same program reading messages from the same queue on a server. Are there any issues with that?

There aren't any issues with that as long as your code does not open the queue for exclusive input. If the app opens it for shared input, you'll be fine. This is the other way of scaling vertically (I always get these confused - so I might have it backwards) with MQ - more instances sharing a queue instead of more qmgrs on the machine. But it gets tricky to do this if you're using triggering.
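
Several copies sharing one queue (opened with MQOO_INPUT_SHARED) can be simulated in a single process to show why no message is handled twice: a get is destructive, so whichever copy gets a message removes it. This is a toy model (`sim_queue`, `sim_get`, and `run_consumers` are hypothetical), not the MQI; it also shows the surviving copies draining the queue when one copy is dead.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QCAP 64

/* Toy queue of message ids (each id must be < QCAP). */
typedef struct { int msgs[QCAP]; size_t head, tail; } sim_queue;

/* Destructive get: the message is removed, so no other copy can get it. */
bool sim_get(sim_queue *q, int *msg)
{
    if (q->head == q->tail)
        return false;              /* the real MQI reports 2033 here */
    *msg = q->msgs[q->head++];
    return true;
}

/* Copies take turns getting; a copy marked dead stops, the others keep
   draining. Records which copy handled each message id. */
void run_consumers(sim_queue *q, const bool *alive, size_t consumers,
                   int handled_by[QCAP])
{
    assert(consumers > 0);
    int msg;
    bool progress = true;
    while (progress) {
        progress = false;
        for (size_t c = 0; c < consumers; c++) {
            if (alive[c] && sim_get(q, &msg)) {
                assert(msg >= 0 && msg < QCAP);
                handled_by[msg] = (int)c;
                progress = true;
            }
        }
    }
}
```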

klamerus

Posted: Sun Jul 03, 2005 1:40 pm

Thanks a lot.

When I said extra machines, the issue is that the code for our current programs isn't yet designed for running several copies at a time, not that the MQ side of things couldn't handle it.

klamerus

Posted: Sun Jul 03, 2005 3:46 pm

Actually another point to throw out and get comments about (it isn't pretty).

We aren't actually using triggering. Our code loops, and on each loop it checks for a message. It processes messages (doing gets) until there are no more, then sleeps for x seconds (a value stored in the registry) and loops again.

This was designed by people who didn't understand triggers or even doing a synchronous get. Sadly, they check for messages by querying the queue depth (which we have been told can return a value > 0 even when there are no messages ready for processing).

I think triggers would work, but would require the single-threadedness described above. The get-based approach should support numerous clients, so long as they stay out of each other's way.

That being said, if we have several programs getting messages from the same queue with gets, what do we have to watch out for on the mq side? We don't want to process the same message twice or have any sort of deadlock going on.

One thing we've been told is that if we do synchronous gets, MQ will ensure that each gets its own message.

jefflowrey

Posted: Sun Jul 03, 2005 4:37 pm

Pretty much, if you are doing what you should be doing (open with shared input, get until reason code 2033, MQRC_NO_MSG_AVAILABLE), you don't have to worry about anything on the MQ side.

Usually, with multiple copies of the same program reading the same queue - people run up against contention with OTHER things, like databases or files, and not MQ.

It sounds like you could use a good consultant to come in for a couple of days, give some talks, go over some sample code, and review the architecture: someone who can tell everyone "Good job so far, but here's how you take it to the next level" with something resembling clout.

Too bad I'm not an independent consultant...


fjb_saper

Posted: Sun Jul 03, 2005 5:33 pm

Too bad I'm not a consultant any longer either...

klamerus

Posted: Mon Jul 04, 2005 6:07 am

Actually, I am a consultant (non-independent), and I am trying to get these guys to the next level. My area isn't MQ though.

I am looking for pointers to some good sources of education, though. I've got the few books that are out, the samples, and of course this system.

Are there any good CBTs or actual hands-on classes that aren't entirely wrapped around Java and/or the mainframe?

This client is a Windows shop, and the current code (in C++) is going to move to .Net (we don't know yet whether VB.Net or C#). That's what has been especially impossible to find.

klamerus

Posted: Mon Jul 04, 2005 6:21 am

Well, that last comment may have made it sound like I'm pulling advice from here and turning it into big money-making work. At least some people might read it that way.

That's not the case. This is one very small system that has been around for quite some time and has been relatively well behaved. We spend about 40 hours / week (target is 32) on support for it, and do get some project dollars each year (usually < $150,000).

The customer wants to increase the use of the system, so I'm playing around with various things they might do to get there.

That being said, I do think a thorough look at how it is currently set up by an external group/person with expertise in this area might be a good thing. What would be some sources of such people that would not be ridden with carpetbaggers?
