sumit
Posted: Fri Apr 04, 2014 6:36 am
Partisan
Joined: 19 Jan 2006  Posts: 398
Thanks Kimbert.
I followed your suggestion and the flow was able to parse the 500MB file. As you and Esa expected, I didn't need to increase the JVM size.
The flow processed the 500MB file in nearly 75 secs. Admittedly I am mapping just one field from the input XML to the output flat file, but since the code uses a SELECT statement, I am confident that mapping a few more fields will not increase the overall time drastically.
Esa, I'll check with the application team how they are creating the XML file, and I'll pass on your suggestion if their process is the same as, or similar to, the one you mentioned.
I do still have a few queries about the previous approach w.r.t. memory consumption when I was taking the input message in BLOB format; I'll post those on Monday, when I'll have the related data to hand.
Thanks again folks for the help.  _________________ Regards
Sumit
mattynorm
Posted: Fri Apr 04, 2014 7:03 am
Acolyte
Joined: 06 Jun 2003  Posts: 52
Rather than start a new thread, I was hoping I could get some advice on how to proceed with a slightly different issue w.r.t. large file processing.
The file I have is large in terms of records (over 5 million) but not so bad in terms of size (about 100MB - CSV, only 3 fields). I have been trying to process it in what I would consider (you may not) a standard way: I declare a reference to the first input record and, WHILE LASTMOVE(inRef) is valid, copy the input record[inRef] fields to the output record[outRef], then RETURN TRUE when the LASTMOVE fails. The guts of the code are:
Code: |
-- inRef and outRef declarations added for completeness; the original snippet used them undeclared
DECLARE inRef REFERENCE TO InputRoot;
DECLARE outRef REFERENCE TO OutputRoot;
CREATE LASTCHILD OF Environment.Variables DOMAIN 'DFDL' NAME 'Input';
SET Environment.Variables.Input.Inventory = InputRoot.DFDL.StockDB_Webstock_Stock ;
MOVE inRef TO Environment.Variables.Input.Inventory.record[>] ;
--Set up the headers
CREATE FIRSTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'header_line_1';
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field1 = 'Inventory';
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field2 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field3 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field4 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field5 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field6 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field7 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field8 = '' ;
CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'header_line_2';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_currentStoreIdentifier = 'CurrentStoreIdentifier';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_partNumber = 'PartNumber';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_catEntryStoreIdentifier = 'CatEntryStoreIdentifier';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_fulfillmentCenterId = 'FulfillmentCenterId';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_fulfillmentCenterName = 'FulfillmentCenterName';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_quantity = 'Quantity';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_quantityUnit = 'QuantityUnit';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_delete = 'Delete';
-- Now iterate through the input file records to create the output
WHILE LASTMOVE(inRef) DO
CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'record';
MOVE outRef TO OutputRoot.DFDL.Inventory.record[<] ;
SET outRef.currentStoreIdentifier = '' ;
SET outRef.partNumber = inRef.ArticleID ;
SET outRef.catEntryStoreIdentifier = '' ;
SET outRef.fulfillmentCenterId = inRef.SAPStoreID ;
SET outRef.fulfillmentCenterName = '' ;
SET outRef.quantity = inRef.AvailableStock ;
SET outRef.quantityUnit = 'C62';
SET outRef.delete = '' ;
MOVE inRef NEXTSIBLING REPEAT TYPE NAME;
DELETE PREVIOUSSIBLING OF inRef;
END WHILE;
|
Seems to work fine when testing it with 25K input lines, but with the full 5 million it returns nothing after a couple of hours.
I have tried setting the FileInput node to 'Parsed Record Sequence'. However, with a small file (header + 2 input lines), if I set 'Skip first record', in Debug it looks like the first output from the FileInput node is the End of File; if I leave 'Skip first record' unchecked, it appears to send all 3 records together. I'm parsing it on the way in against a DFDL schema, which seems to parse the (small) message fine when testing it in the DFDL Test harness.
Broker version is IB9.0.0.0.1, running on a Windows 7 VM (upped the RAM from 4GB to 8GB; didn't seem to make any difference).
Any clues as to what I'm doing wrong? Is there any real difference, in this instance, between declaring a ROW and moving the input file into that, rather than moving it into Environment.Variables?
Also, I don't really understand why a flow with an input message of roughly 100MB, generating an output message of roughly 500MB, would make the EG's memory requirements go up to over 7GB.
fjb_saper
Posted: Fri Apr 04, 2014 7:14 am
Grand High Poobah
Joined: 18 Nov 2003  Posts: 20756  Location: LI,NY
You're pushing the problem from the input node to the output node.
What are you really trying to do?
How about writing the output message with PROPAGATE before the next iteration of the loop (and then clearing it)?  _________________ MQ & Broker admin
mattynorm
Posted: Fri Apr 04, 2014 7:49 am
Acolyte
Joined: 06 Jun 2003  Posts: 52
I need to be able to create a single output file (with potentially 5 million records in it) to put on the file system. I did consider splitting it out into individual messages, but that's just kicking the can down the road, as at some point I'd have to put them back together again.
fjb_saper
Posted: Fri Apr 04, 2014 8:04 pm
Grand High Poobah
Joined: 18 Nov 2003  Posts: 20756  Location: LI,NY
mattynorm wrote: |
I need to be able to create a single output file (with potentially 5m records in it ) to stick on the file system. I did consider splitting it out into individual messages, but that's just kicking the can down the road, as at some point I have to put them back together again |
You put them together again by appending them one by one to the output file.
That way you only ever have one record in flight, not the whole file! Your problem is that you are trying to keep all the records in memory until the file is complete. Do not do that. Write the file one record at a time.  _________________ MQ & Broker admin
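The record-at-a-time pattern described above can be sketched in ESQL roughly as follows. This is a hedged sketch, not the poster's actual flow: the element names, the 'out' terminal, and a FileOutput node configured to append one record per propagated message are all assumptions, and Properties/header handling is omitted for brevity.

```esql
-- Walk the parsed input one record at a time
DECLARE inRef REFERENCE TO InputRoot.DFDL.Inventory.record[1];
WHILE LASTMOVE(inRef) DO
	-- Build a tiny, single-record output tree
	SET OutputRoot.DFDL.record = inRef;
	-- PROPAGATE with the default DELETE DEFAULT clears OutputRoot after each
	-- send, so the output tree never accumulates records
	PROPAGATE TO TERMINAL 'out';
	-- Advance, then discard the record that has just been written
	MOVE inRef NEXTSIBLING;
	DELETE PREVIOUSSIBLING OF inRef;
END WHILE;
RETURN FALSE; -- nothing left for the node itself to propagate
```

With this shape only one record is ever held in OutputRoot, which is what keeps the memory footprint flat regardless of file size.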
Esa
Posted: Sat Apr 05, 2014 1:25 am
Grand Master
Joined: 22 May 2008  Posts: 1387  Location: Finland
As you have experienced, if you are planning to use large-message processing techniques as in the sample, you shouldn't use 'Parsed Record Sequence'. If you do use it, your code should expect the input message to contain only one record. That is, of course, an alternative in your case, too.
But let's analyze your current approach now.
mattynorm wrote: |
Also don't really understand why a flow with an input message of roughly 100mb, generating an output message of roughly 500mb would make the EG memory requirements go up to over 7gb? |
A message of 500 MB will occupy several times that amount when it's in memory as a parsed message tree. But it shouldn't be as much as you are seeing.
Something is going wrong; evidently the input message allocates more memory than expected.
mattynorm wrote: |
Code: |
CREATE LASTCHILD OF Environment.Variables DOMAIN 'DFDL' NAME 'Input';
SET Environment.Variables.Input.Inventory = InputRoot.DFDL.StockDB_Webstock_Stock ;
MOVE inRef TO Environment.Variables.Input.Inventory.record[>] ;
|
|
I strongly suspect that referring to Environment.Variables.Input.Inventory.record[>] forces the parser to parse the entire array of records -- in other words the whole input message.
To be absolutely sure that your code parses the message on demand, you should write it like this:
Code: |
CREATE LASTCHILD OF Environment.Variables DOMAIN 'DFDL' NAME 'Input';
SET Environment.Variables.Input.Inventory = InputRoot.DFDL.StockDB_Webstock_Stock ;
MOVE inRef TO Environment.Variables.Input.Inventory ;
IF NOT LASTMOVE(inRef) THEN
THROW USER EXCEPTION VALUES('some message');
END IF;
MOVE inRef FIRSTCHILD NAME 'record';
|
As for the output message, follow fjb_saper's orders and propagate every record to a FileOutput node that is configured to append.
mattynorm
Posted: Sat Apr 05, 2014 2:31 am
Acolyte
Joined: 06 Jun 2003  Posts: 52
Thank you to both of you, I will apply those changes on Monday and see how it goes.
mattynorm
Posted: Tue Apr 08, 2014 2:02 am
Acolyte
Joined: 06 Jun 2003  Posts: 52
Once again, thanks for the suggestions. The flow now works, but:
a) it takes roughly 42 minutes to complete, and
b) it is still grabbing about 6.5GB of memory (according to TASKLIST).
I have tried setting debug to none for the flow (it should have been off anyway), mqsireloading the EG and bouncing the broker, but it still spikes to that level. I have also looked at the large message sample, and now place the input message into a ROW. The current code looks like this:
Code: |
DECLARE inRef REFERENCE TO Environment.Variables;
DECLARE outRef REFERENCE TO Environment.Variables;
DECLARE firstHyphenPos INTEGER 0;
DECLARE fileName CHAR ;
DECLARE recordElementName CONSTANT CHAR 'record';
--set up the outputfilename
SET firstHyphenPos = POSITION('-' IN InputLocalEnvironment.File.Name) ;
IF firstHyphenPos > 0 THEN
SET fileName = 'Inventory' || SUBSTRING(InputLocalEnvironment.File.Name FROM firstHyphenPos);
ELSE
SET fileName = 'Inventory.csv';
END IF;
SET OutputLocalEnvironment.Destination.File.Name = fileName;
--Set up the headers
CREATE FIRSTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'header_line_1';
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field1 = 'Inventory';
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field2 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field3 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field4 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field5 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field6 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field7 = '' ;
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field8 = '' ;
CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'header_line_2';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_currentStoreIdentifier = 'CurrentStoreIdentifier';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_partNumber = 'PartNumber';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_catEntryStoreIdentifier = 'CatEntryStoreIdentifier';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_fulfillmentCenterId = 'FulfillmentCenterId';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_fulfillmentCenterName = 'FulfillmentCenterName';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_quantity = 'Quantity';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_quantityUnit = 'QuantityUnit';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_delete = 'Delete';
DECLARE rowCachedInputMsg ROW;
CREATE FIRSTCHILD OF rowCachedInputMsg DOMAIN ('DFDL') NAME 'Input';
SET rowCachedInputMsg.Input.Inventory = InputRoot.DFDL.StockDB_Webstock_Stock ;
MOVE inRef TO rowCachedInputMsg.Input.Inventory ;
IF NOT LASTMOVE(inRef) THEN
THROW USER EXCEPTION VALUES('File Not Valid');
END IF;
MOVE inRef FIRSTCHILD NAME recordElementName ;
-- Now iterate through the input file records to create the output
WHILE LASTMOVE(inRef) DO
SET OutputLocalEnvironment.Destination.File.Name = fileName;
CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'record';
MOVE outRef TO OutputRoot.DFDL.Inventory.record[<] ;
SET outRef.currentStoreIdentifier = '' ;
SET outRef.partNumber = inRef.ArticleID ;
SET outRef.catEntryStoreIdentifier = '' ;
SET outRef.fulfillmentCenterId = inRef.SAPStoreID ;
SET outRef.fulfillmentCenterName = '' ;
SET outRef.quantity = inRef.AvailableStock ;
SET outRef.quantityUnit = 'C62';
SET outRef.delete = '' ;
MOVE inRef NEXTSIBLING REPEAT TYPE NAME;
DELETE PREVIOUSSIBLING OF inRef;
PROPAGATE TO TERMINAL 'out' ;
END WHILE;
--set up the outputfilename
SET firstHyphenPos = POSITION('-' IN InputLocalEnvironment.File.Name) ;
IF firstHyphenPos > 0 THEN
SET OutputLocalEnvironment.Destination.File.Name = 'Inventory' ||
SUBSTRING(InputLocalEnvironment.File.Name FROM firstHyphenPos);
ELSE
SET OutputLocalEnvironment.Destination.File.Name = 'Inventory.csv';
END IF;
PROPAGATE TO TERMINAL 'out1'; --end of file
RETURN FALSE;
|
Any ideas on how to reduce the memory footprint? I have contemplated setting the FileInput to 'Parsed Record Sequence', but my understanding is that this will increase the processing time (which is already too high).
Esa
Posted: Tue Apr 08, 2014 2:59 am
Grand Master
Joined: 22 May 2008  Posts: 1387  Location: Finland
mattynorm wrote: |
Code: |
CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'record';
MOVE outRef TO OutputRoot.DFDL.Inventory.record[<] ; |
|
Creating another DFDL parser under an existing one may cause memory problems when you are doing it within a loop. I think PROPAGATE DELETE DEFAULT only releases the topmost parser.
Code: |
CREATE LASTCHILD OF OutputRoot.DFDL AS outRef NAME 'Inventory';
SET outRef.record.currentStoreIdentifier = '' ;
MOVE outRef FIRSTCHILD NAME 'record';
SET outRef.partNumber = inRef.ArticleID ;
SET outRef.catEntryStoreIdentifier = '' ;
SET outRef.fulfillmentCenterId = inRef.SAPStoreID ;
SET outRef.fulfillmentCenterName = '' ;
SET outRef.quantity = inRef.AvailableStock ;
SET outRef.quantityUnit = 'C62';
SET outRef.delete = '' ;
|
mattynorm wrote: |
Code: |
MOVE inRef NEXTSIBLING REPEAT TYPE NAME;
DELETE PREVIOUSSIBLING OF inRef;
|
|
You are moving the reference to the next instance with the same name and type?
If there are a lot of siblings with other names, you should make sure that you get rid of them, too. Otherwise the parsed input tree may still grow quite big.
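One way to act on this advice is to delete every earlier sibling after advancing the reference, whatever its name. A sketch of one possible idiom (the reference names are illustrative, and this assumes inRef is the loop reference used in the posts above):

```esql
MOVE inRef NEXTSIBLING REPEAT TYPE NAME;
-- Drop every node before the new position, not just the same-named ones,
-- so skipped elements do not accumulate in the parsed input tree
DECLARE pRef REFERENCE TO inRef;
MOVE pRef PREVIOUSSIBLING;
WHILE LASTMOVE(pRef) DO
	DELETE FIELD pRef;        -- remove the consumed or skipped node
	MOVE pRef TO inRef;       -- re-anchor, since pRef was invalidated by the DELETE
	MOVE pRef PREVIOUSSIBLING;
END WHILE;
```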
kimbert
Posted: Tue Apr 08, 2014 3:41 am
Jedi Council
Joined: 29 Jul 2003  Posts: 5542  Location: Southampton
Hi Matt,
From a brief inspection of the code, it looks as if you are creating a fully-populated OutputRoot.DFDL and then writing the entire message in one operation. That would explain the (very) high memory usage.
You should
- cut down the input message to 5 records
- change the message flow so that each input record creates an OutputRoot.DFDL that contains exactly *one* output record.
- propagate this tiny, single-record message tree to an output terminal. It should be connected to a FileOutput node that operates in 'append' mode.
The first step is optional, but it will make it easier to debug the flow and confirm that it is working as designed. You could use a flow debugger to confirm that OutputRoot.DFDL is not growing with each iteration of the loop. _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
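Those steps could be sketched like this, with the headers propagated once and then exactly one record per PROPAGATE. Field names are borrowed from earlier posts (and the full set of header fields is elided); the 'out' terminal feeding an append-mode FileOutput node is an assumption:

```esql
-- One-off: emit the header lines as their own small message
SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field1 = 'Inventory';
SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_partNumber = 'PartNumber';
PROPAGATE TO TERMINAL 'out'; -- OutputRoot is cleared again afterwards

-- Then exactly one record per iteration
WHILE LASTMOVE(inRef) DO
	SET OutputRoot.DFDL.Inventory.record.partNumber = inRef.ArticleID;
	SET OutputRoot.DFDL.Inventory.record.fulfillmentCenterId = inRef.SAPStoreID;
	SET OutputRoot.DFDL.Inventory.record.quantity = inRef.AvailableStock;
	PROPAGATE TO TERMINAL 'out';
	MOVE inRef NEXTSIBLING REPEAT TYPE NAME;
	DELETE PREVIOUSSIBLING OF inRef;
END WHILE;
RETURN FALSE;
```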
mattynorm
Posted: Tue Apr 08, 2014 4:34 am
Acolyte
Joined: 06 Jun 2003  Posts: 52
Thanks again
Quote: |
You are moving the reference to the next instance with the same name and type?
If there are a lot of siblings with other names, you should make sure that you get rid of them, too. Otherwise the parsed input tree may still grow quite big.
|
I am doing that out of force of habit rather than because I need to, so I will remove the REPEAT TYPE NAME bit.
Input File is of the format
Code: |
Header
Record
Record
Record etc etc
|
Output File looks like
Code: |
Header1
Header2
Record
Record
Record etc etc
|
Quote: |
From a brief inspection of the code, it looks as if you are creating a fully-populated OutputRoot.DFDL, and then writing the entire message in one operation. That would explain the ( very ) high memory usage.
|
There is a PROPAGATE within the WHILE loop, which spits out the output file line by line (apart from the very first time, when it spits out the two headers plus the first record).
I had a look at it in Debug, and this does seem to be what it's doing.
Very confused.
As a further question, will the Message Broker (or Integration Node) be able to easily reclaim this memory?
I suspect this will run a bit better when it's no longer on a VM and is on a server, but I'd still like to get the memory usage (and processing time) down if possible.
Esa
Posted: Tue Apr 08, 2014 5:58 am
Grand Master
Joined: 22 May 2008  Posts: 1387  Location: Finland
mattynorm wrote: |
Quote: |
You are moving the reference to the next instance with the same name and type?
If there are a lot of siblings with other names, you should make sure that you get rid of them, too. Otherwise the parsed input tree may still grow quite big.
|
I am doing that, but out of a force of habit rather than because I need to. So I will remove the REPEAT TYPE NAME bit. |
That's a good habit. You don't need to remove anything.
I was asking just to rule out a possible source of memory consumption.
mattynorm wrote: |
Very confused. |
Don't be. I guess kimbert briefly inspected your first version of the code, not the one you had modified to propagate each record separately.
mattynorm wrote: |
As a further question, will the Message Broker (or Integration Node) be able to easily reclaim this memory. |
No, it won't be able to reclaim any memory without administrative intervention.
mattynorm wrote: |
I suspect this will run a bit better when it's no longer running on a VM and is on a server, but I'd still like to get the mem usage (and processing time) down if possible. |
When you use the FileInput and FileOutput nodes and large-message processing techniques correctly, as you now seem to be doing, you should get the memory consumption down to hundreds or tens of megabytes, or even less.
And there are no other nodes between the FileInput node and the Compute node?
Have you corrected the way you create unnecessary DFDL parsers in the middle of the message tree?
mqjeff
Posted: Tue Apr 08, 2014 6:31 am
Grand Master
Joined: 25 Jun 2008  Posts: 17447
Esa wrote: |
mattynorm wrote: |
As a further question, will the Message Broker (or Integration Node) be able to easily reclaim this memory. |
No, it won't be able to reclaim any memory without administrative intervention. |
This is misleading.
Broker won't release memory back to the operating system without administrative action.
It will happily reclaim that memory and use it for other things.
kimbert
Posted: Tue Apr 08, 2014 6:43 am
Jedi Council
Joined: 29 Jul 2003  Posts: 5542  Location: Southampton
Esa said:
Quote: |
When using File Input and File Output nodes and large message processing techniques correctly, as you seem to be doing now, you should get the memory consumption down to hundreds or tens of megabytes or even less. |
This is the key point. I just want to make clear that this *is* achievable, although it can be difficult in practice. _________________ Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
mattynorm
Posted: Tue Apr 08, 2014 6:57 am
Acolyte
Joined: 06 Jun 2003  Posts: 52
Thanks for the replies
Yes, it is just a FileInput -> Compute (ESQL) -> FileOutput flow, nothing complicated about it at all (there is a subflow hanging off the Catch terminal and a queue off the Failure terminals, but they are not being invoked).
I set Flow statistics in the Explorer, and the Total Elapsed Time was as follows:
Quote: |
FileInput - 2244
Compute - 1248156
FileOutput - 8622221
|
which I guess means the FileOutput is doing the lion's share of the work. The only properties I have changed from the defaults are: setting the file mode to stage the file in the mqsitransit directory, with timestamp-archive and replace-an-existing-file; and, under Records and Elements, setting 'Record is Delimited Data' (I thought I would have to do this if propagating it out line by line), which I think sets the delimiter to 'Broker System Line End' by default.
Are any of those likely to have a significant performance impact?
Quote: |
Have you corrected the way you create unnecessary DFDL parsers in the middle of the message tree?
|
I have; it didn't seem to make a significant difference to the memory/processing time.
Quote: |
It will happily reclaim that memory and use it for other things.
|
So if a single EG has 5GB of memory assigned, other EGs will be able to grab this if required (assuming it's not being used by the EG in question)?