Author |
Message
|
kirank |
Posted: Sun Feb 24, 2013 4:54 pm Post subject: Large File Handling |
|
|
 Centurion
Joined: 10 Oct 2002 Posts: 136 Location: California
|
Hi,
We have a message flow that reads a file, adds some lookup values to the file data, and then writes a large file of more than 5 MB. To help performance, we are reading record by record, then writing to the file and finishing the file after the last record. But we are seeing that write performance starts to degrade as the file size grows. It is taking 4 hours for a 5 MB file. That is very slow for the business requirements. Is this a bug, or are we not setting some things properly? We are on Message Broker V7.0.0.0.
Regards
Kiran |
|
fjb_saper |
Posted: Sun Feb 24, 2013 6:10 pm Post subject: Re: Large File Handling |
|
|
 Grand High Poobah
Joined: 18 Nov 2003 Posts: 20756 Location: LI,NY
|
kirank wrote: |
Hi,
We have a message flow that reads a file, adds some lookup values to the file data, and then writes a large file of more than 5 MB. To help performance, we are reading record by record, then writing to the file and finishing the file after the last record. But we are seeing that write performance starts to degrade as the file size grows. It is taking 4 hours for a 5 MB file. That is very slow for the business requirements. Is this a bug, or are we not setting some things properly? We are on Message Broker V7.0.0.0.
Regards
Kiran |
v7.0.0.0 is a little bit dated now. You will want to first upgrade to 7.0.0.5.
_________________ MQ & Broker admin |
|
kimbert |
Posted: Mon Feb 25, 2013 6:04 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
To help performance, we are reading record by record, then writing to the file and finishing the file after the last record. |
Good.
Quote: |
But we are seeing that write performance starts to degrade as the file size grows. It is taking 4 hours for a 5 MB file. That is very slow for the business requirements. Is this a bug, or are we not setting some things properly? |
How do you know that it is the FileOutput node that is taking the time? Could it be some other part of the flow? |
|
kirank |
Posted: Mon Feb 25, 2013 9:14 am Post subject: |
|
|
 Centurion
Joined: 10 Oct 2002 Posts: 136 Location: California
|
We are writing a trace file in which we capture the timestamp for each record. We can see that it writes the first 10,000 records in 5 minutes; however, the next 10,000 records take 18 minutes, the subsequent 10,000 records take 34 minutes, and so forth. In total, about 50,000 records take 3 hours and 40 minutes.
We will apply patches or upgrade to V8 at some point, but it's not an option right now. Is there any other way to improve the performance?
Regards
Kiran |
|
kirank |
Posted: Mon Feb 25, 2013 4:34 pm Post subject: |
|
|
 Centurion
Joined: 10 Oct 2002 Posts: 136 Location: California
|
After doing some additional digging I found that it's not really the file node, as Kimbert pointed out. It is a Compute node that is taking the time. I enabled Accounting and Statistics and found that the Compute node was taking most of the time.
The Compute node has a loop which does lookups against a database table. There are 4 different SELECT statements for 4 different lookup values. So for 50,000 records these SELECT statements were getting executed 200,000 times. I thought we could move these SELECTs outside the loop, do the select just once, and store the result in the Environment. Then inside the loop we can read from the Environment by doing a SELECT against the Environment tree. However, that approach is taking even longer.
Is there any other better approach?
Regards
Kiran |
|
Vitor |
Posted: Mon Feb 25, 2013 5:36 pm Post subject: |
|
|
 Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
|
kirank wrote: |
I thought we could move these SELECTs outside the loop, do the select just once, and store the result in the Environment. Then inside the loop we can read from the Environment by doing a SELECT against the Environment tree. However, that approach is taking even longer.
Is there any other better approach? |
You're sure that for the 50,000 records the Environment tree remains set and you're not making the 4 SELECT calls for each of the 50,000 records and adding them to the Environment tree where previously you were just making 4 SELECT calls for each of the 50,000 records?
You might want to consider a shared variable.
_________________ Honesty is the best policy.
Insanity is the best defence. |
|
mqjeff |
Posted: Tue Feb 26, 2013 4:53 am Post subject: |
|
|
Grand Master
Joined: 25 Jun 2008 Posts: 17447
|
You can also look at doing one select for four variables instead of four selects for one variable.
You should also engage your DBA to look at optimizing the selects on the database side. |
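As a rough illustration of the first suggestion, the four lookups might collapse into a single database SELECT that returns four columns. This is only a sketch: it assumes all four values can be fetched from one table (or from a view the DBA provides), and the module name, `LOOKUP_TABLE`, its column names, and the `Record.Key` input field are hypothetical placeholders rather than details of the flow discussed here.

```esql
CREATE COMPUTE MODULE SingleSelect_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Hypothetical input field holding the lookup key for the current record.
    DECLARE lookupKey CHARACTER InputRoot.MRM.Record.Key;

    -- One round trip returning four columns, instead of four separate SELECTs.
    -- THE() keeps only the first matching row.
    SET Environment.Variables.Lookup = THE (
      SELECT T.VALUE_A, T.VALUE_B, T.VALUE_C, T.VALUE_D
      FROM Database.LOOKUP_TABLE AS T
      WHERE T.LOOKUP_KEY = lookupKey);

    -- The values are now available as Environment.Variables.Lookup.VALUE_A, etc.
    RETURN TRUE;
  END;
END MODULE;
```

Whether this helps depends on the schema; if the four values live in unrelated tables, a database view or join would be needed first.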
|
kimbert |
Posted: Tue Feb 26, 2013 5:33 am Post subject: |
|
|
 Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
|
Quote: |
I thought we could move these SELECTs outside the loop, do the select just once, and store the result in the Environment. Then inside the loop we can read from the Environment by doing a SELECT against the Environment tree. However, that approach is taking even longer. |
Why would you use an ESQL SELECT to extract the data from the Environment tree? The SELECT logic should have been done by the database.
If you have designed the system optimally then the Environment tree will contain a set of data values that are organised perfectly for the message flow. Then the message flow can just iterate over the results ( using a REFERENCE variable ) and generate the output very fast. |
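One possible reading of this advice, sketched under assumptions: query the database once, store the rows in the Environment tree, and walk them with a reference variable. The module name, `LOOKUP_TABLE`, its columns, and the `Output`/`Record` element names are hypothetical, and the XMLNSC output domain is chosen only to keep the example short.

```esql
CREATE COMPUTE MODULE PreloadLookup_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Run the database query once; the database does the selection work.
    SET Environment.Variables.Lookup.Row[] =
      (SELECT T.LOOKUP_KEY, T.VALUE_A
       FROM Database.LOOKUP_TABLE AS T);

    -- Parent element for the generated output records.
    CREATE FIELD OutputRoot.XMLNSC.Output;
    DECLARE outRef REFERENCE TO OutputRoot.XMLNSC.Output;
    DECLARE recRef REFERENCE TO outRef;

    -- Walk the rows with a reference variable rather than indexing Row[i],
    -- which would be resolved from the start of the list on every iteration.
    DECLARE rowRef REFERENCE TO Environment.Variables.Lookup.Row[1];
    WHILE LASTMOVE(rowRef) DO
      CREATE LASTCHILD OF outRef AS recRef NAME 'Record';
      SET recRef.Key   = rowRef.LOOKUP_KEY;
      SET recRef.Value = rowRef.VALUE_A;
      MOVE rowRef NEXTSIBLING REPEAT TYPE NAME;
    END WHILE;

    RETURN TRUE;
  END;
END MODULE;
```

If the query returns no rows, the reference is not valid after the DECLARE, `LASTMOVE` returns false, and the loop body is skipped.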
|
souciance |
Posted: Tue Feb 26, 2013 7:01 am Post subject: |
|
|
Disciple
Joined: 29 Jun 2010 Posts: 169
|
Wouldn't it be better to write a stored procedure or view and call that, instead of including the SQL code inside your Compute node? I have noticed a performance gain when doing this. |
|
mayheminMQ |
Posted: Tue Feb 26, 2013 9:00 am Post subject: |
|
|
 Voyager
Joined: 04 Sep 2012 Posts: 77 Location: UK beyond the meadows of RocknRoll
|
Shared ROW variables are better than the Environment, in my personal experience of handling large amounts of cached data.
Do the selects over the ROW variables; optimising the select query itself would probably also save you precious time.
Did you try the approach of reading the whole file into memory and running through it? Try a POC and check the timings against your current flow. (Assuming you have enough memory set in your EG to handle this.)
_________________ A Colorblind man may appear disadvantaged but he always sees more than just colors... |
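A minimal sketch of the shared ROW cache idea, assuming a single lookup table small enough to hold in memory. The names `LookupCache`, `LOOKUP_TABLE`, its columns, and the `Record.Key` and `Record.ValueA` fields are hypothetical; the `valid` flag is just one way of marking the cache as loaded.

```esql
-- Shared variables are declared at schema or module level and keep their
-- value across the messages processed by the flow.
DECLARE LookupCache SHARED ROW;

CREATE COMPUTE MODULE CachedLookup_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    DECLARE lookupKey CHARACTER InputRoot.MRM.Record.Key;
    DECLARE valueA CHARACTER;

    -- Atomic block: only one thread at a time may load or read the cache.
    CacheAccess : BEGIN ATOMIC
      IF LookupCache.valid IS NULL THEN
        -- First message only: load the whole table with one database SELECT.
        SET LookupCache.Row[] =
          (SELECT T.LOOKUP_KEY, T.VALUE_A FROM Database.LOOKUP_TABLE AS T);
        SET LookupCache.valid = TRUE;
      END IF;

      -- Later lookups run against the in-memory ROW, not the database.
      SET valueA = THE (
        SELECT ITEM C.VALUE_A FROM LookupCache.Row[] AS C
        WHERE C.LOOKUP_KEY = lookupKey);
    END CacheAccess;

    SET OutputRoot = InputRoot;
    SET OutputRoot.MRM.Record.ValueA = valueA;
    RETURN TRUE;
  END;
END MODULE;
```

The cache is never refreshed in this sketch, so changes to the table would only be picked up after the flow is restarted.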
|
longng |
Posted: Tue Feb 26, 2013 10:10 am Post subject: |
|
|
Apprentice
Joined: 22 Feb 2013 Posts: 42
|
kirank wrote: |
After doing some additional digging I found that it's not really the file node, as Kimbert pointed out. It is a Compute node that is taking the time. I enabled Accounting and Statistics and found that the Compute node was taking most of the time.
The Compute node has a loop which does lookups against a database table. There are 4 different SELECT statements for 4 different lookup values. So for 50,000 records these SELECT statements were getting executed 200,000 times. I thought we could move these SELECTs outside the loop, do the select just once, and store the result in the Environment. Then inside the loop we can read from the Environment by doing a SELECT against the Environment tree. However, that approach is taking even longer.
Is there any other better approach?
Regards
Kiran |
A common cause of these symptoms is excessive message tree traversal inside the loop(s). It would be a good idea to replace full field paths with reference variables inside loops.
Consider the following scenario:
Code: |
LOOP for 5000 times
SET InputRoot.MRM.layer1.layer2.layer3.layer4.Name = 'Jon'
SET InputRoot.MRM.layer1.layer2.layer3.layer4.Address = 'Oak street'
SET InputRoot.MRM.layer1.layer2.layer3.layer4.Phone = '123-456-7890'
...
|
The above ESQL fragment forces a full tree traversal (InputRoot, MRM, layer1, layer2, layer3 and layer4) before arriving at each field, for every statement in every loop iteration.
Code: |
DECLARE layer4Ref REFERENCE TO InputRoot.MRM.layer1.layer2.layer3.layer4;
LOOP for 5000 times
SET layer4Ref.Name = 'Jon'
SET layer4Ref.Address = 'Oak street'
SET layer4Ref.Phone = '123-456-7890'
|
With a reference variable, the starting point is always layer4, so it performs much better: there is no need to traverse from the root of the tree every time.
As a matter of fact, I used this technique and was able to reduce the runtime of a flow from over two days down to 45 minutes! |
|
ah.khalafallah |
Posted: Sun Mar 03, 2013 3:58 am Post subject: |
|
|
 Newbie
Joined: 03 Mar 2013 Posts: 5
|
I think you need to describe how the flow is implemented, and whether you are splitting the logic across two flows over MQ or it is a single-flow scenario.
Also consider Longng's reference solution, but you need to create the field first, before referencing it.
P.S. Longng, you can't "SET InputRoot",
but anyway the example could be changed to the following:
Quote: |
LOOP for 5000 times
SET OutputRoot.MRM.layer1.layer2.layer3.layer4.Name = 'Jon'
SET OutputRoot.MRM.layer1.layer2.layer3.layer4.Address = 'Oak street'
SET OutputRoot.MRM.layer1.layer2.layer3.layer4.Phone = '123-456-7890'
...
|
The above ESQL fragment forces a full tree traversal (OutputRoot, MRM, layer1, layer2, layer3 and layer4) before arriving at each field, for every statement in every loop iteration.
Quote: |
CREATE FIELD OutputRoot.MRM.layer1.layer2.layer3.layer4;
DECLARE layer4Ref REFERENCE TO OutputRoot.MRM.layer1.layer2.layer3.layer4;
LOOP for 5000 times
SET layer4Ref.Name = 'Jon'
SET layer4Ref.Address = 'Oak street'
SET layer4Ref.Phone = '123-456-7890'
|
_________________ Middleware Developer
Certified Websphere Message Broker v.7.0
Certified Websphere Transformation Extender v.8.2 |
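For completeness, the corrected pseudocode above could be written as compilable ESQL roughly as follows. This is a sketch only: the module name is hypothetical, and the loop simply repeats the thread's example of setting the same three fields 5000 times.

```esql
CREATE COMPUTE MODULE ReferenceLoop_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Create the target field first so the reference has something to point at.
    CREATE FIELD OutputRoot.MRM.layer1.layer2.layer3.layer4;
    DECLARE layer4Ref REFERENCE TO OutputRoot.MRM.layer1.layer2.layer3.layer4;

    DECLARE i INTEGER 1;
    WHILE i <= 5000 DO
      -- Each SET starts at layer4 instead of walking down from OutputRoot.
      SET layer4Ref.Name    = 'Jon';
      SET layer4Ref.Address = 'Oak street';
      SET layer4Ref.Phone   = '123-456-7890';
      SET i = i + 1;
    END WHILE;

    RETURN TRUE;
  END;
END MODULE;
```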
|
longng |
Posted: Sun Mar 03, 2013 3:21 pm Post subject: |
|
|
Apprentice
Joined: 22 Feb 2013 Posts: 42
|
ah.khalafallah wrote: |
P.S Longng, you can't "SET InputRoot"
|
You're spot on, while focusing upon the technique I used the wrong type of structure for the example! |
|
ah.khalafallah |
Posted: Sun Mar 03, 2013 9:40 pm Post subject: |
|
|
 Newbie
Joined: 03 Mar 2013 Posts: 5
|
No problem, Longng. You raised a good point; I have seen for myself how much of a difference it makes.
So let's hope it makes a difference here.
 |
|