vallu |
Posted: Thu Mar 11, 2004 6:45 pm Post subject: Transform 10MB messages from CSV to Fixed Width. |
|
|
Apprentice
Joined: 29 Jun 2002 Posts: 31
|
We have a requirement to transform 10MB CSV files into fixed-width format. However, a 1MB message takes 50 minutes to transform, while a 50K message takes 1 second. What options do I have for transforming a 10MB file?
We are seriously considering using 50K messages (splitting larger messages into logical groups of 50K each).
We are at CSD5 of MQSI 2.1 on a Win 2K machine (2GB RAM).
Please suggest. |
|
 |
jefflowrey |
Posted: Thu Mar 11, 2004 7:08 pm Post subject: |
|
|
Grand Poobah
Joined: 16 Oct 2002 Posts: 19981
|
There's a support pack, IP04, about how to design your message flows for performance.
Start by reviewing that to see if there are ways you can improve your code or your model. _________________ I am *not* the model of the modern major general. |
|
 |
kirani |
Posted: Thu Mar 11, 2004 11:14 pm Post subject: |
|
|
Jedi Knight
Joined: 05 Sep 2001 Posts: 3779 Location: Torrance, CA, USA
|
Can you tell us more about your input message layout? Are you trying to send a complete file as one message? I believe your file consists of multiple records. What is your output message format?
You can significantly improve performance by using the REFERENCE data type within your ESQL. Please take a look at the ESQL reference manual along with the material suggested by Jeff. _________________ Kiran
IBM Cert. Solution Designer & System Administrator - WBIMB V5
IBM Cert. Solutions Expert - WMQI
IBM Cert. Specialist - WMQI, MQSeries
IBM Cert. Developer - MQSeries
|
|
 |
vallu |
Posted: Fri Mar 12, 2004 3:30 am Post subject: |
|
|
Apprentice
Joined: 29 Jun 2002 Posts: 31
|
kirani wrote: |
Can you tell us more about your input message layout? Are you trying to send a complete file as one message? I believe your file consists of multiple records. What is your output message format?
You can significantly improve performance by using the REFERENCE data type within your ESQL. Please take a look at the ESQL reference manual along with the material suggested by Jeff. |
Thanks for the reply.
We have an input file which is CSV. We want an output file in FIXED WIDTH. We have multiple records.
40,000 repeating records making up to 10MB of data. I would think WMQI cannot transform any message bigger than 50KB efficiently (in a short time). I had a look at the material suggested by Jeff some time back. There was not much in it for me. The performance figures published by IBM do not show anything above 60KB (for obvious reasons..)
We are planning to approach the problem using logical grouping of messages. Message sequence is not important for us, but all messages must reach the destination. We can handle this programmatically, and WMQI supports grouping.
Has somebody done this before?
Sample CSV Data:
,0280,021860,0,,0,,,, , , , ,0,0,0,0,0, ,1,
Sample fixed width
002 B024895020 0 1 0 2004 1 310A7DB 205/55 R16 90Q DRICE TL PC/LT 0 0 0 0 0YES 1 9.74 |
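The splitting into logical groups described above could be sketched roughly as follows. This is only an illustration in plain Java, not broker code: the class name `RecordChunker`, the newline record delimiter, and the UTF-8 encoding are all assumptions, and the 50K limit is passed in as `maxBytes`. The one thing it tries to show is that the split should fall on record boundaries so that no CSV record is cut in half.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: groups whole CSV records into chunks of bounded size.
final class RecordChunker {

    /** Returns chunks of complete lines, each at most maxBytes when encoded as UTF-8. */
    static List<String> chunk(List<String> records, int maxBytes) {
        List<String> chunks = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int currentBytes = 0;
        for (String record : records) {
            String line = record + "\n";
            int size = line.getBytes(StandardCharsets.UTF_8).length;
            // Close the current chunk if adding this record would exceed the limit.
            // A single record larger than maxBytes still gets a chunk of its own.
            if (currentBytes > 0 && currentBytes + size > maxBytes) {
                chunks.add(current.toString());
                current.setLength(0);
                currentBytes = 0;
            }
            current.append(line);
            currentBytes += size;
        }
        if (currentBytes > 0) {
            chunks.add(current.toString());
        }
        return chunks;
    }
}
```

Each returned chunk would then be put as one message, for example with `RecordChunker.chunk(lines, 50 * 1024)`.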
|
 |
kirani |
Posted: Fri Mar 12, 2004 2:19 pm Post subject: |
|
|
Jedi Knight
Joined: 05 Sep 2001 Posts: 3779 Location: Torrance, CA, USA
|
You can still work with a single input message. Model your input message using TDS and set the record element to Repeating. Within your message flow you need to loop through the input records and transform them to the output CWF format.
Please note that CWF can take only a fixed number of occurrences, or a number of occurrences that depends on some numeric variable, so model your message accordingly.
Here is the sample code (not tested).
Code: |
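-- Walk the repeating input records with a reference instead of an index.
DECLARE inref REFERENCE TO InputRoot.MRM.MyRecord[1];
SET OutputRoot.MRM.TotalRecs = 0;
CREATE FIELD OutputRoot.MRM.OUTREC;
DECLARE outref REFERENCE TO OutputRoot.MRM.OUTREC;
DECLARE CNT INT 1;
WHILE LASTMOVE(inref) = TRUE DO
    SET outref.Fld1 = inref.Fld1;
    SET outref.Fld2 = inref.Fld2;
    ...
    MOVE inref NEXTSIBLING;
    -- Append the next output record and point outref at it.
    CREATE NEXTSIBLING OF outref AS outref NAME 'OUTREC';
    SET CNT = CNT + 1;
END WHILE;
-- CNT starts at 1, so it is one ahead of the number of records copied.
SET OutputRoot.MRM.TotalRecs = CNT - 1;
-- The loop leaves one empty trailing OUTREC; remove it.
DETACH outref;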
|
If your machine does not have enough memory then it's a good idea to split the input message into smaller parts and then do the transformation. The point I am trying to make here is to use references to loop through the input/output tree to get better performance.
Hope this helps. _________________ Kiran
IBM Cert. Solution Designer & System Administrator - WBIMB V5
IBM Cert. Solutions Expert - WMQI
IBM Cert. Specialist - WMQI, MQSeries
IBM Cert. Developer - MQSeries
|
|
 |
fitzcaraldo |
Posted: Sat Mar 13, 2004 3:58 am Post subject: |
|
|
Voyager
Joined: 05 May 2003 Posts: 98
|
This really gets down to whether the message must be processed as a single unit of work (i.e. all 40000 records or none).
If not, you may be able to split the message into 40000 separate ones and process them individually. Does the target application require one large message or can it handle 40000 small ones?
If it requires one large message in a single unit of work, you can do things like splitting it and adding an RFH2 to each message with a sequence number, and then have a flow that reassembles them into one big message and checks for omissions. Or use the MQ grouping you mention.
To handle a single 10MB message you would want to keep the parsing and number of compute nodes to a minimum. |
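The reassembly with a check for omissions might look something like the sketch below. It is plain Java with no MQ or broker API involved: the class `SequencedReassembler` and its methods are hypothetical, the sequence number is passed as an `int` rather than read from an RFH2 header, and the total number of parts is assumed to be known up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper: collects numbered parts and rebuilds the original message.
final class SequencedReassembler {
    private final int expectedCount;
    // TreeMap keeps the parts sorted by sequence number, whatever the arrival order.
    private final Map<Integer, String> parts = new TreeMap<>();

    SequencedReassembler(int expectedCount) {
        this.expectedCount = expectedCount;
    }

    /** Stores one part under its 1-based sequence number. */
    void accept(int sequenceNumber, String payload) {
        if (sequenceNumber < 1 || sequenceNumber > expectedCount) {
            throw new IllegalArgumentException("Unexpected sequence number: " + sequenceNumber);
        }
        parts.put(sequenceNumber, payload);
    }

    /** Returns the sequence numbers in 1..expectedCount that have not arrived yet. */
    List<Integer> missing() {
        List<Integer> gaps = new ArrayList<>();
        for (int seq = 1; seq <= expectedCount; seq++) {
            if (!parts.containsKey(seq)) {
                gaps.add(seq);
            }
        }
        return gaps;
    }

    /** Concatenates the parts in sequence order; refuses to build an incomplete message. */
    String reassemble() {
        List<Integer> gaps = missing();
        if (!gaps.isEmpty()) {
            throw new IllegalStateException("Missing parts: " + gaps);
        }
        StringBuilder whole = new StringBuilder();
        for (String payload : parts.values()) {
            whole.append(payload);
        }
        return whole.toString();
    }
}
```

Because the parts are keyed by sequence number, they can arrive in any order, which matches the case where message sequence does not matter but every message must reach the destination.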
|
 |
vallu |
Posted: Sun Mar 14, 2004 4:41 pm Post subject: |
|
|
Apprentice
Joined: 29 Jun 2002 Posts: 31
|
Thanks all of you.
I shall try using REFERENCE. But I do not think our machine has enough memory to handle a 10MB message. In that case, I shall try splitting.
Thanks again |
|
 |
surenat |
Posted: Mon Mar 15, 2004 2:31 pm Post subject: |
|
|
Apprentice
Joined: 01 Jan 2002 Posts: 32
|
Hi Vallu:
I experienced the same problem. I had a situation where I needed to convert a 1MB CSV message to XML. Using plain ESQL coding, it took 10 mins to process one message. After I altered the code to use the REFERENCE concept and removed CARDINALITY from the WHILE loop condition check, it processed in 2 mins. But still, 2 mins is too much time. I opened a PMR with IBM...nothing helpful. I ended up writing a Java plugin, and the processing time dropped to 10 secs per message! _________________ IBM Certified Specialist MQSeries
IBM Certified Specialist - Websphere MQ Integrator |
|
 |
vallu |
Posted: Mon Mar 15, 2004 7:32 pm Post subject: |
|
|
Apprentice
Joined: 29 Jun 2002 Posts: 31
|
I have noticed that using REFERENCE is much faster. So, you wrote your own plugin. Does it transform CSV to XML? How does it do this? |
|
 |
surenat |
Posted: Tue Mar 16, 2004 7:34 am Post subject: |
|
|
Apprentice
Joined: 01 Jan 2002 Posts: 32
|
In summary, I wrote some Java classes to load the CSV into a data structure (object), and then mapped the data to XML (DOM parser for Java). I integrated these parsing classes with the main Java plug-in class.
Let me know if you need the detailed framework; first make sure your client agrees to use Java plug-ins inside MQSI. _________________ IBM Certified Specialist MQSeries
IBM Certified Specialist - Websphere MQ Integrator |
|
 |
vallu |
Posted: Tue Mar 16, 2004 11:33 pm Post subject: |
|
|
Apprentice
Joined: 29 Jun 2002 Posts: 31
|
Hi Surenat,
Please let us know the framework. I am curious as to how DOM parsing could be faster than MQ compute nodes. |
|
 |
surenat |
Posted: Wed Mar 17, 2004 7:35 am Post subject: |
|
|
Apprentice
Joined: 01 Jan 2002 Posts: 32
|
Hi Vallu:
I do not think the DOM parser is faster than the WMQI XML parser. The only difference I made in the plugin was that I did not load the incoming CSV message as an MRM tree; instead I used Java string tokenization to parse the CSV and then mapped the tokenized string data to a DOM tree. _________________ IBM Certified Specialist MQSeries
IBM Certified Specialist - Websphere MQ Integrator |
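For the CSV to fixed-width case that started this thread, the same tokenize-then-map idea might be sketched as below. This is not the plug-in described above, just a minimal standalone illustration: the class `CsvToFixedWidth`, the column widths, and the left-justified space padding are assumptions, and it does not handle quoted fields that contain commas. It uses `String.split` with a limit of -1 because `StringTokenizer` skips empty fields, and the sample data has plenty of those.

```java
// Hypothetical helper: maps one CSV record onto a fixed-width layout.
final class CsvToFixedWidth {
    // Assumed column widths; the real ones come from the target record definition.
    static final int[] WIDTHS = {3, 10, 6};

    /** Converts one CSV record to fixed width, left-justified and space-padded. */
    static String convert(String csvLine, int[] widths) {
        // Limit -1 keeps empty trailing fields, which split(",") alone would drop.
        String[] fields = csvLine.split(",", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < widths.length; i++) {
            // Columns missing from the input are emitted as blanks.
            String value = i < fields.length ? fields[i] : "";
            if (value.length() > widths[i]) {
                value = value.substring(0, widths[i]); // truncate overlong values
            }
            out.append(value);
            for (int pad = value.length(); pad < widths[i]; pad++) {
                out.append(' ');
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Brackets make the leading and trailing padding visible.
        System.out.println("[" + convert(",0280,021860", WIDTHS) + "]");
    }
}
```

Working on the raw string this way avoids building a message tree for the input at all, which is the difference described in the post above.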
|
 |
JLRowe |
Posted: Fri Mar 19, 2004 2:47 am Post subject: |
|
|
 Yatiri
Joined: 25 May 2002 Posts: 664 Location: South East London
|
I would warrant that the DOM parser is much faster than the WMQI one, especially for large messages.
There have been lots of posts in the past about WMQI performance problems with large messages. Part of the problem must be that the message tree is copied over for every node in the flow. The WMQI parser probably only has an advantage when you partially parse towards the head of the message and you do not update the message. |
|
 |
|