MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Large File Handling

Large File Handling

kirank

Posted: Sun Feb 24, 2013 4:54 pm Post subject: Large File Handling

Centurion

Joined: 10 Oct 2002
Posts: 136
Location: California

Hi,

We have a message flow that reads a file, adds some lookup values to the file data, and then writes a large file of more than 5 MB. To help performance, we read record by record, append to the file, and finish the file after the last record. But we are seeing write performance degrade as the file grows. It takes 4 hours to write 5 MB, which is far too slow for the business requirements. Is this a bug, or are we not setting something properly? We are on Message Broker V7.0.0.0.

Regards

Kiran

fjb_saper

Posted: Sun Feb 24, 2013 6:10 pm Post subject: Re: Large File Handling

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20763
Location: LI,NY

v7.0.0.0 is a little bit dated now. You will want to upgrade to 7.0.0.5 first.

_________________
MQ & Broker admin

kimbert

Posted: Mon Feb 25, 2013 6:04 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

To help performance, we read record by record, append to the file, and finish the file after the last record.

Good.

Quote:

But we are seeing write performance degrade as the file grows. It takes 4 hours to write 5 MB, which is far too slow for the business requirements. Is this a bug, or are we not setting something properly?

How do you know that it is the FileOutput node that is taking the time? Could it be some other part of the flow?

kirank

Posted: Mon Feb 25, 2013 9:14 am Post subject:

Centurion

Joined: 10 Oct 2002
Posts: 136
Location: California

We write a trace file that captures a timestamp for each record. It shows that the first 10,000 records are written in 5 minutes, but the next 10,000 take 18 minutes, the following 10,000 take 34 minutes, and so forth. In total, about 50,000 records take 3 hours and 40 minutes.

We will apply patches or upgrade to V8 at some point, but it's not an option right now. Is there any other way to improve the performance?

Regards

Kiran

kirank

Posted: Mon Feb 25, 2013 4:34 pm Post subject:

Centurion

Joined: 10 Oct 2002
Posts: 136
Location: California

After some additional digging, I found that it is not really the file node, as Kimbert suggested. It is a Compute node that is taking the time: I enabled Accounting and Statistics and found that the Compute node accounted for most of it.
The Compute node has a loop that does lookups against a database table. There are 4 different SELECT statements for 4 different lookup values, so for 50,000 records these SELECT statements were executed 200,000 times. I thought we could move these SELECTs outside the loop, run each one just once, and store the results in the Environment. Then, inside the loop, we could read from the Environment by doing a SELECT against the Environment tree. However, that approach takes even longer.

Is there any other better approach?

Regards

Kiran

Vitor

Posted: Mon Feb 25, 2013 5:36 pm Post subject:

Grand High Poobah

Joined: 11 Nov 2005
Posts: 26093
Location: Texas, USA

kirank wrote:

I thought we could move these SELECTs outside the loop, run each one just once, and store the results in the Environment. Then, inside the loop, we could read from the Environment by doing a SELECT against the Environment tree. However, that approach takes even longer.

Is there any other better approach?

Are you sure that the Environment tree stays populated across all 50,000 records? In other words, are you sure you're not still making the 4 SELECT calls for each of the 50,000 records and now also adding the results to the Environment tree each time, where previously you were just making the 4 SELECT calls?

You might want to consider a shared variable.
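
A minimal sketch of the shared-variable approach, assuming a single lookup table and code placed inside the Compute module. The names LookupCache, CacheLoaded, LOOKUP_TABLE, LOOKUP_CODE and LOOKUP_TEXT are hypothetical and not taken from kirank's flow. The cache is filled by the first message only, so the database is not queried per record:

Code:

-- Shared variables are declared at module (or schema) level and persist between messages.
DECLARE LookupCache SHARED ROW;
DECLARE CacheLoaded SHARED BOOLEAN FALSE;

CREATE FUNCTION Main() RETURNS BOOLEAN
BEGIN
  -- ATOMIC serialises this block if the flow runs with additional instances.
  LoadCache : BEGIN ATOMIC
    IF NOT CacheLoaded THEN
      SET LookupCache.Entry[] = (SELECT T.LOOKUP_CODE, T.LOOKUP_TEXT FROM Database.LOOKUP_TABLE AS T);
      SET CacheLoaded = TRUE;
    END IF;
  END LoadCache;
  -- Per-record processing that reads LookupCache.Entry[] goes here.
  RETURN TRUE;
END;

The cached rows stay in memory until the flow is redeployed or the execution group is restarted, so this only suits lookup data that changes rarely.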
_________________
Honesty is the best policy.
Insanity is the best defence.

mqjeff

Posted: Tue Feb 26, 2013 4:53 am Post subject:

Grand Master

Joined: 25 Jun 2008
Posts: 17447

You can also look at doing one select for four variables instead of four selects for one variable.

You should also engage your DBA to look at optimizing the selects on the database side.
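
A sketch of the "one select for four variables" idea, assuming (hypothetically) that all four lookup values can be returned as columns of one row. LOOKUP_TABLE, KEY_COL, COL_A to COL_D and recRef are made-up names:

Code:

-- One database round trip per record instead of four.
SET Environment.Variables.Lookup[] =
  (SELECT T.COL_A, T.COL_B, T.COL_C, T.COL_D
   FROM Database.LOOKUP_TABLE AS T
   WHERE T.KEY_COL = recRef.Key);

If the four values live in different tables, a view or join maintained on the database side could expose them as one row; that is a question for the DBA.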

kimbert

Posted: Tue Feb 26, 2013 5:33 am Post subject:

Jedi Council

Joined: 29 Jul 2003
Posts: 5543
Location: Southampton

Quote:

I thought we could move these SELECTs outside the loop, run each one just once, and store the results in the Environment. Then, inside the loop, we could read from the Environment by doing a SELECT against the Environment tree. However, that approach takes even longer.

Why would you use an ESQL SELECT to extract the data from the Environment tree? The SELECT logic should have been done by the database.
If you have designed the system optimally, the Environment tree will contain a set of data values that are organised perfectly for the message flow. The message flow can then just iterate over the results (using a REFERENCE variable) and generate the output very fast.
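
One way to read this suggestion, sketched with hypothetical names (Environment.Variables.Lookup.Entry, rowRef, COL_A, COL_B): the rows are loaded once and then walked with a reference variable, with no ESQL SELECT in the per-record path:

Code:

DECLARE rowRef REFERENCE TO Environment.Variables.Lookup.Entry[1];
WHILE LASTMOVE(rowRef) DO
  -- Read rowRef.COL_A, rowRef.COL_B and so on directly from the current row.
  MOVE rowRef NEXTSIBLING REPEAT TYPE NAME;
END WHILE;

LASTMOVE returns FALSE once there is no further sibling with the same type and name, which ends the loop.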

souciance

Posted: Tue Feb 26, 2013 7:01 am Post subject:

Disciple

Joined: 29 Jun 2010
Posts: 169

Wouldn't it be better to write a stored procedure or view and call that, instead of including the SQL code inside your Compute node? I have noticed a performance gain when doing this.
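
For illustration, ESQL can declare a database stored procedure and call it from the Compute node. This is a sketch only: GET_LOOKUPS, its parameters, the out* variables and recRef are hypothetical, and the procedure itself would have to be written and tuned on the database side:

Code:

-- Maps an ESQL procedure name to a stored procedure in the database.
CREATE PROCEDURE getLookups(IN recKey CHARACTER,
                            OUT valA CHARACTER, OUT valB CHARACTER,
                            OUT valC CHARACTER, OUT valD CHARACTER)
LANGUAGE DATABASE
EXTERNAL NAME "GET_LOOKUPS";

-- Inside the per-record logic: one call returns all four values.
DECLARE outA, outB, outC, outD CHARACTER;
CALL getLookups(recRef.Key, outA, outB, outC, outD);

Whether this beats inline SELECTs depends on the database, so the timings are worth comparing before committing to it.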

mayheminMQ

Posted: Tue Feb 26, 2013 9:00 am Post subject:

Voyager

Joined: 04 Sep 2012
Posts: 77
Location: UK beyond the meadows of RocknRoll

In my personal experience handling large amounts of cached data, shared ROW variables are better than the Environment tree.

Do the selects over the ROW variables; optimising the select query itself will probably also save you precious time.

Did you try reading the whole file into memory and running through it? Try a POC and check the timings against your current flow (assuming you have enough memory set in your EG to handle this).
_________________
A Colorblind man may appear disadvantaged but he always sees more than just colors...

longng

Posted: Tue Feb 26, 2013 10:10 am Post subject:

Apprentice

Joined: 22 Feb 2013
Posts: 42

A very common cause of these symptoms is excessive message tree traversal inside the loop(s). It would be a good idea to replace any repeated full paths inside loops with reference variables.

Consider the following scenario:

Code:

DECLARE i INTEGER 1;
WHILE i <= 5000 DO
  SET InputRoot.MRM.layer1.layer2.layer3.layer4.Name = 'Jon';
  SET InputRoot.MRM.layer1.layer2.layer3.layer4.Address = 'Oak street';
  SET InputRoot.MRM.layer1.layer2.layer3.layer4.Phone = '123-456-7890';
  ...
  SET i = i + 1;
END WHILE;

The above ESQL fragment forces a full tree traversal (InputRoot, MRM, layer1, layer2, layer3 and layer4) to reach each field, for every statement in every loop iteration.

Code:

DECLARE layer4Ref REFERENCE TO InputRoot.MRM.layer1.layer2.layer3.layer4;
DECLARE i INTEGER 1;

WHILE i <= 5000 DO
  SET layer4Ref.Name = 'Jon';
  SET layer4Ref.Address = 'Oak street';
  SET layer4Ref.Phone = '123-456-7890';
  SET i = i + 1;
END WHILE;

With a reference variable, the starting point is always layer4, so it is much more efficient: there is no need to traverse from the root of the tree every time.

As a matter of fact, I used this technique and was able to reduce the runtime of a flow from over two days down to 45 minutes!
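
The same idea applies to walking repeating records. A sketch, assuming a repeating element called Record under InputRoot.MRM (a hypothetical name, as is recRef): indexed access such as InputRoot.MRM.Record[i] is resolved again from the parent on each use, so its cost tends to grow with i, whereas a reference simply steps to the next sibling:

Code:

DECLARE recRef REFERENCE TO InputRoot.MRM.Record[1];
WHILE LASTMOVE(recRef) DO
  -- Build the output for this record from recRef.* here.
  MOVE recRef NEXTSIBLING REPEAT TYPE NAME;
END WHILE;

Per-record cost that rises with position would look much like the slowdown described earlier in this thread, although only a trace of the actual flow can confirm whether that is the cause.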

ah.khalafallah

Posted: Sun Mar 03, 2013 3:58 am Post subject:

Newbie

Joined: 03 Mar 2013
Posts: 5

I think you need to describe how the flow is implemented, and whether you are splitting the logic across 2 flows over MQ or it is a single-flow scenario.

Also consider Longng's reference solution, but note that the field must be created first, before you can reference it.

P.S Longng, you can't "SET InputRoot"

but anyway the example could be changed to the following

Quote:

DECLARE i INTEGER 1;
WHILE i <= 5000 DO
  SET OutputRoot.MRM.layer1.layer2.layer3.layer4.Name = 'Jon';
  SET OutputRoot.MRM.layer1.layer2.layer3.layer4.Address = 'Oak street';
  SET OutputRoot.MRM.layer1.layer2.layer3.layer4.Phone = '123-456-7890';
  ...
  SET i = i + 1;
END WHILE;

The above ESQL fragment forces a full tree traversal (OutputRoot, MRM, layer1, layer2, layer3 and layer4) to reach each field, for every statement in every loop iteration.

Quote:

CREATE FIELD OutputRoot.MRM.layer1.layer2.layer3.layer4;
DECLARE layer4Ref REFERENCE TO OutputRoot.MRM.layer1.layer2.layer3.layer4;
DECLARE i INTEGER 1;

WHILE i <= 5000 DO
  SET layer4Ref.Name = 'Jon';
  SET layer4Ref.Address = 'Oak street';
  SET layer4Ref.Phone = '123-456-7890';
  SET i = i + 1;
END WHILE;

_________________
Middleware Developer
Certified Websphere Message Broker v.7.0
Certified Websphere Transformation Extender v.8.2

longng

Posted: Sun Mar 03, 2013 3:21 pm Post subject:

Apprentice

Joined: 22 Feb 2013
Posts: 42

ah.khalafallah wrote:

P.S Longng, you can't "SET InputRoot"

You're spot on. While focusing on the technique, I used the wrong type of structure for the example!

ah.khalafallah

Posted: Sun Mar 03, 2013 9:40 pm Post subject:

Newbie

Joined: 03 Mar 2013
Posts: 5

No problem, Longng. You made a good point, and I have seen for myself how much of a difference it makes.
So let's hope it makes a difference here too.
