MQSeries.net Forum Index » WebSphere Message Broker (ACE) Support » Handling large message

Handling large message
sumit
PostPosted: Fri Apr 04, 2014 6:36 am    Post subject: Reply with quote

Partisan

Joined: 19 Jan 2006
Posts: 398

Thanks Kimbert.
I followed your suggestion and the flow was able to parse a 500MB file. As you and Esa expected, I didn't need to increase the JVM size.
The flow processed the 500MB file in roughly 75 seconds. Admittedly I am mapping just one field from the input XML to the output flat file, but with a SELECT statement in the code I am confident that mapping a few more fields will not increase the overall time drastically.

Esa, I'll check with the application team how they are creating the XML file, and will pass on your suggestion if their process is the same as, or similar to, the one you described.

I do have a few queries about the previous approach with respect to memory consumption when I was taking the input message in BLOB format; I'll post those on Monday, when I'll have the related data to hand.

Thanks again for the help, folks.
_________________
Regards
Sumit
mattynorm
PostPosted: Fri Apr 04, 2014 7:03 am    Post subject: Reply with quote

Acolyte

Joined: 06 Jun 2003
Posts: 52

Rather than start a new thread, I was hoping I could get some advice on how to proceed with a slightly different issue with respect to large file processing.

The file I have is large in terms of records (over 5 million) but not so bad in terms of size (about 100MB; CSV with only three fields). I have been trying to process it in what I would consider (you may not) a standard way: declare a reference to the first input record, and WHILE LASTMOVE(inRef) is valid, copy the input record fields to the output record via outRef, then RETURN TRUE when LASTMOVE fails. The guts of the code are:

Code:


         CREATE LASTCHILD OF Environment.Variables  DOMAIN 'DFDL' NAME 'Input';
         SET    Environment.Variables.Input.Inventory = InputRoot.DFDL.StockDB_Webstock_Stock ;
         MOVE inRef TO Environment.Variables.Input.Inventory.record[>] ;
         
      --Set up the headers
      CREATE FIRSTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'header_line_1';
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field1 = 'Inventory';
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field2 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field3 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field4 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field5 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field6 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field7 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field8 = '' ;
      
      CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'header_line_2';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_currentStoreIdentifier = 'CurrentStoreIdentifier';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_partNumber = 'PartNumber';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_catEntryStoreIdentifier = 'CatEntryStoreIdentifier';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_fulfillmentCenterId = 'FulfillmentCenterId';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_fulfillmentCenterName = 'FulfillmentCenterName';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_quantity = 'Quantity';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_quantityUnit = 'QuantityUnit';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_delete = 'Delete';

      -- Now iterate through the input file records to create the output
      WHILE LASTMOVE(inRef) DO
         
         CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'record';
         MOVE outRef TO OutputRoot.DFDL.Inventory.record[<] ;
         
         SET outRef.currentStoreIdentifier = ''  ;
         SET outRef.partNumber = inRef.ArticleID  ;
         SET outRef.catEntryStoreIdentifier = ''  ;
         SET outRef.fulfillmentCenterId = inRef.SAPStoreID ;
         SET outRef.fulfillmentCenterName = ''  ;
         SET outRef.quantity = inRef.AvailableStock ;
         SET outRef.quantityUnit = 'C62';
         SET outRef.delete = ''  ;
         
         MOVE inRef NEXTSIBLING REPEAT TYPE NAME;
         DELETE PREVIOUSSIBLING OF inRef;
         
         
      END WHILE;


Seems to work fine when testing it with 25K input lines, but with the full 5 million it returns nothing after a couple of hours.

I have tried setting the FileInput node to 'Parsed Record Sequence'. However, with a small file (header + 2 input lines), if I set 'Skip first record' then in the debugger the first output from the FileInput node appears to be the End of File; if I leave 'Skip first record' unchecked, it appears to send all three records together. I'm parsing the input against a DFDL schema, which parses the (small) message fine in the DFDL test harness.

Broker version is IB9.0.0.0.1, running on a Windows 7 VM (I upped the RAM from 4GB to 8GB, which didn't seem to make any difference).

Any clues as to what I'm doing wrong? In this instance, is there any real difference between declaring a ROW and moving the input file into that, rather than moving it into Environment.Variables?

Also, I don't really understand why a flow with an input message of roughly 100MB, generating an output message of roughly 500MB, would push the EG memory requirements over 7GB.
fjb_saper
PostPosted: Fri Apr 04, 2014 7:14 am    Post subject: Reply with quote

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

You're pushing the problem from the input node to the output node.
What are you really trying to do?

How about writing the output message with PROPAGATE before the next iteration of the loop (and clearing it)?
_________________
MQ & Broker admin
mattynorm
PostPosted: Fri Apr 04, 2014 7:49 am    Post subject: Reply with quote

Acolyte

Joined: 06 Jun 2003
Posts: 52

I need to be able to create a single output file (with potentially 5 million records in it) to put on the file system. I did consider splitting it out into individual messages, but that just kicks the can down the road, because at some point I have to put them back together again.
fjb_saper
PostPosted: Fri Apr 04, 2014 8:04 pm    Post subject: Reply with quote

Grand High Poobah

Joined: 18 Nov 2003
Posts: 20756
Location: LI,NY

mattynorm wrote:
I need to be able to create a single output file (with potentially 5m records in it ) to stick on the file system. I did consider splitting it out into individual messages, but that's just kicking the can down the road, as at some point I have to put them back together again

You put them back together by appending them one by one to the output file.
In the meantime you only ever have one record in flight, not the whole file! Your problem is that you are trying to keep all the records in memory until the file is complete. Do not do that. Write the file one record at a time.
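To make the pattern concrete, here is a minimal ESQL sketch of what I mean (the element names 'Input', 'Out', 'record' and 'field1' are placeholders, not your schema). Each iteration builds a tree containing exactly one record and propagates it to a FileOutput node set to append records:

Code:


      CREATE FUNCTION Main() RETURNS BOOLEAN
      BEGIN
         DECLARE inRef REFERENCE TO InputRoot.DFDL.Input.record[1];
         WHILE LASTMOVE(inRef) DO
            -- Build a message tree containing exactly one output record
            CREATE LASTCHILD OF OutputRoot DOMAIN('DFDL') NAME 'DFDL';
            SET OutputRoot.DFDL.Out.record.field1 = inRef.field1;
            -- PROPAGATE writes the single record; with the default
            -- DELETE DEFAULT behaviour OutputRoot is cleared afterwards,
            -- so the output tree never accumulates records
            PROPAGATE TO TERMINAL 'out';
            -- Advance, then discard the record just processed so the
            -- parsed input tree does not grow either
            MOVE inRef NEXTSIBLING;
            DELETE PREVIOUSSIBLING OF inRef;
         END WHILE;
         RETURN FALSE; -- nothing more to send on the default terminal
      END;


When all the records have been written you still have to close the file, e.g. by propagating a final message to the terminal wired to the FileOutput node's Finish File input.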
_________________
MQ & Broker admin
Esa
PostPosted: Sat Apr 05, 2014 1:25 am    Post subject: Reply with quote

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

As you experienced, if you are planning to use large-message techniques as in the sample, you shouldn't use 'Parsed Record Sequence'. If you use that, your code should expect the input message to contain only one record. That is, of course, an alternative in your case too.

But let's analyze your current approach now.

mattynorm wrote:

Also don't really understand why a flow with an input message of roughly 100mb, generating an output message of roughly 500mb would make the EG memory requirements go up to over 7gb?


A 500MB message will occupy several times that amount of memory when it is held as a parsed message tree. But it still shouldn't be that much.

Something is going wrong; evidently the input message is allocating more memory than expected.

mattynorm wrote:

Code:


         CREATE LASTCHILD OF Environment.Variables  DOMAIN 'DFDL' NAME 'Input';
         SET    Environment.Variables.Input.Inventory = InputRoot.DFDL.StockDB_Webstock_Stock ;
         MOVE inRef TO Environment.Variables.Input.Inventory.record[>] ;


I strongly suspect that referring to Environment.Variables.Input.Inventory.record[>] forces the parser to parse the entire array of records, in other words the whole input message.

To be absolutely sure that your code parses the message on demand, you should write it like this:

Code:


         CREATE LASTCHILD OF Environment.Variables  DOMAIN 'DFDL' NAME 'Input';
         SET    Environment.Variables.Input.Inventory = InputRoot.DFDL.StockDB_Webstock_Stock ;
         MOVE inRef TO Environment.Variables.Input.Inventory ;
        IF NOT LASTMOVE(inRef) THEN
            THROW USER EXCEPTION VALUES('some message');
        END IF;
        MOVE inRef FIRSTCHILD NAME 'record';


As for the output message, follow fjb_saper's orders and propagate every record to a FileOutput node that is configured to append.
mattynorm
PostPosted: Sat Apr 05, 2014 2:31 am    Post subject: Reply with quote

Acolyte

Joined: 06 Jun 2003
Posts: 52

Thank you to both of you. I will apply those changes on Monday and see how it goes.
mattynorm
PostPosted: Tue Apr 08, 2014 2:02 am    Post subject: Reply with quote

Acolyte

Joined: 06 Jun 2003
Posts: 52

Once again, thanks for the suggestions. The flow now works, but

a) it takes roughly 42 minutes to complete

and

b) it is still grabbing about 6.5GB of memory (according to TASKLIST).

I have tried setting debug to none for the flow (it should have been off anyway), mqsireloading the EG and bouncing the broker, but it still spikes to that level. I also looked at the large-message sample, and now place the input message into a ROW. The current code looks like this:

Code:


      DECLARE inRef REFERENCE TO Environment.Variables;
      DECLARE outRef REFERENCE TO Environment.Variables;
      DECLARE firstHyphenPos INTEGER 0;
      DECLARE fileName CHAR ;
      DECLARE recordElementName CONSTANT CHAR 'record';
      
      

      --set up the outputfilename
      SET firstHyphenPos = POSITION('-' IN InputLocalEnvironment.File.Name) ;
      IF firstHyphenPos > 0 THEN
         SET fileName = 'Inventory' || SUBSTRING(InputLocalEnvironment.File.Name FROM firstHyphenPos);
      ELSE    
         SET fileName = 'Inventory.csv';
      END IF;      
      
      SET OutputLocalEnvironment.Destination.File.Name = fileName; 
 
     
         
      --Set up the headers
      CREATE FIRSTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'header_line_1';
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field1 = 'Inventory';
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field2 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field3 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field4 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field5 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field6 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field7 = '' ;
         SET OutputRoot.DFDL.Inventory.header_line_1.hdr1_field8 = '' ;
      
      CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'header_line_2';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_currentStoreIdentifier = 'CurrentStoreIdentifier';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_partNumber = 'PartNumber';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_catEntryStoreIdentifier = 'CatEntryStoreIdentifier';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_fulfillmentCenterId = 'FulfillmentCenterId';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_fulfillmentCenterName = 'FulfillmentCenterName';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_quantity = 'Quantity';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_quantityUnit = 'QuantityUnit';
         SET OutputRoot.DFDL.Inventory.header_line_2.hdr2_delete = 'Delete';
         
      DECLARE rowCachedInputMsg ROW;
      
      CREATE FIRSTCHILD OF rowCachedInputMsg DOMAIN ('DFDL') NAME 'Input';
      SET rowCachedInputMsg.Input.Inventory = InputRoot.DFDL.StockDB_Webstock_Stock ;
      MOVE inRef TO rowCachedInputMsg.Input.Inventory ;
        IF NOT LASTMOVE(inRef) THEN
            THROW USER EXCEPTION VALUES('File Not Valid');
        END IF;
        MOVE inRef FIRSTCHILD NAME recordElementName ;                   

      -- Now iterate through the input file records to create the output
      WHILE LASTMOVE(inRef) DO

         SET OutputLocalEnvironment.Destination.File.Name = fileName;
         
         CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'record';
         MOVE outRef TO OutputRoot.DFDL.Inventory.record[<] ;
         
         SET outRef.currentStoreIdentifier = ''  ;
         SET outRef.partNumber = inRef.ArticleID  ;
         SET outRef.catEntryStoreIdentifier = ''  ;
         SET outRef.fulfillmentCenterId = inRef.SAPStoreID ;
         SET outRef.fulfillmentCenterName = ''  ;
         SET outRef.quantity = inRef.AvailableStock ;
         SET outRef.quantityUnit = 'C62';
         SET outRef.delete = ''  ;
         
         MOVE inRef NEXTSIBLING REPEAT TYPE NAME;
         DELETE PREVIOUSSIBLING OF inRef;
         

         
         PROPAGATE TO TERMINAL 'out' ;
         
         
      END WHILE;

      --set up the outputfilename
      SET firstHyphenPos = POSITION('-' IN InputLocalEnvironment.File.Name) ;
      IF firstHyphenPos > 0 THEN
         SET OutputLocalEnvironment.Destination.File.Name = 'Inventory' ||
                                                SUBSTRING(InputLocalEnvironment.File.Name FROM firstHyphenPos);
      ELSE    
         SET OutputLocalEnvironment.Destination.File.Name = 'Inventory.csv';
      END IF;      

      PROPAGATE TO TERMINAL 'out1'; --end of file


      RETURN FALSE;



Any ideas on how to reduce the memory footprint? I have considered setting the FileInput node to 'Parsed Record Sequence', but my understanding is that this would increase the processing time (which is already too high).
Esa
PostPosted: Tue Apr 08, 2014 2:59 am    Post subject: Reply with quote

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

mattynorm wrote:

Code:

         CREATE LASTCHILD OF OutputRoot.DFDL.Inventory DOMAIN 'DFDL' NAME 'record';
         MOVE outRef TO OutputRoot.DFDL.Inventory.record[<] ;


Creating another DFDL parser underneath an existing one may cause memory problems when you do it inside a loop. I think PROPAGATE with DELETE DEFAULT releases only the topmost parser.

Code:

        CREATE LASTCHILD OF OutputRoot.DFDL AS outRef NAME 'Inventory';
         SET outRef.record.currentStoreIdentifier = ''  ;
         MOVE outRef FIRSTCHILD NAME 'record';
         SET outRef.partNumber = inRef.ArticleID  ;
         SET outRef.catEntryStoreIdentifier = ''  ;
         SET outRef.fulfillmentCenterId = inRef.SAPStoreID ;
         SET outRef.fulfillmentCenterName = ''  ;
         SET outRef.quantity = inRef.AvailableStock ;
         SET outRef.quantityUnit = 'C62';
         SET outRef.delete = ''  ;


mattynorm wrote:

Code:
         
         MOVE inRef NEXTSIBLING REPEAT TYPE NAME;
         DELETE PREVIOUSSIBLING OF inRef;



You are moving the reference to the next instance with the same name and type?

If there are a lot of siblings with other names, you should make sure that you get rid of them too. Otherwise the parsed input tree may still grow quite big.
kimbert
PostPosted: Tue Apr 08, 2014 3:41 am    Post subject: Reply with quote

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Hi Matt,

From a brief inspection of the code, it looks as if you are creating a fully populated OutputRoot.DFDL and then writing the entire message in one operation. That would explain the (very) high memory usage.

You should:
- cut down the input message to 5 records
- change the message flow so that each input record creates an OutputRoot.DFDL that contains exactly *one* output record
- propagate this tiny, single-record message tree to an output terminal, connected to a FileOutput node that operates in 'append' mode.

The first step is optional, but it will make it easier to debug the flow and confirm that it is working as designed. You could use a flow debugger to confirm that OutputRoot.DFDL is not growing with each iteration of the loop.
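If a debugger isn't convenient, a quick sanity check (just a sketch; it assumes the Inventory/record element names from your code) is to trace the output record count on each iteration. Since PROPAGATE's default DELETE DEFAULT behaviour clears OutputRoot after each propagate, the count should stay at 1:

Code:


      -- Declared once, before the loop:
      DECLARE recCount INTEGER 0;

      -- Inside the WHILE loop, just before the PROPAGATE:
      SET recCount = CARDINALITY(OutputRoot.DFDL.Inventory.record[]);
      LOG USER TRACE VALUES('output record count: ', recCount);
      -- If this number climbs, the output tree is accumulating records
      -- instead of being cleared between propagates.


You can read the counts back with mqsireadlog / mqsiformatlog after enabling user trace on the flow.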
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
mattynorm
PostPosted: Tue Apr 08, 2014 4:34 am    Post subject: Reply with quote

Acolyte

Joined: 06 Jun 2003
Posts: 52

Thanks again

Quote:

You are moving the reference to the next instance with the same name and type?

If there are a lot of siblings with other names, you should make sure that you get rid of them, too. Otherwise the parsed input tree may still grow quite big.


I am doing that, but out of force of habit rather than because I need to, so I will remove the REPEAT TYPE NAME bit.

Input File is of the format

Code:


Header
Record
Record
Record etc etc



Output File looks like

Code:

Header1
Header2
Record
Record
Record etc etc




Quote:

From a brief inspection of the code, it looks as if you are creating a fully-populated OutputRoot.DFDL, and then writing the entire message in one operation. That would explain the ( very ) high memory usage.


There is a PROPAGATE within the WHILE loop, which spits out the output file line by line (apart from the very first time, when it spits out the two headers plus the first record).

I had a look at it in debug, and this does seem to be what it's doing.

Very confused.

As a further question, will the Message Broker (or Integration Node) be able to easily reclaim this memory?

I suspect this will run a bit better when it's no longer on a VM and is on a real server, but I'd still like to get the memory usage (and processing time) down if possible.
Esa
PostPosted: Tue Apr 08, 2014 5:58 am    Post subject: Reply with quote

Grand Master

Joined: 22 May 2008
Posts: 1387
Location: Finland

mattynorm wrote:


Quote:

You are moving the reference to the next instance with the same name and type?

If there are a lot of siblings with other names, you should make sure that you get rid of them, too. Otherwise the parsed input tree may still grow quite big.


I am doing that, but out of a force of habit rather than because I need to. So I will remove the REPEAT TYPE NAME bit.

That's a good habit. You don't need to remove anything.
I was asking just to rule out a possible source of memory consumption.
mattynorm wrote:
Very confused.

Don't be. I guess kimbert briefly inspected your first version of the code, not the one you had modified to propagate each record separately.
mattynorm wrote:
As a further question, will the Message Broker (or Integration Node) be able to easily reclaim this memory.

No, it won't be able to reclaim any memory without administrative intervention.
mattynorm wrote:
I suspect this will run a bit better when it's no longer running on a VM and is on a server, but I'd still like to get the mem usage (and processing time) down if possible.


When using FileInput and FileOutput nodes and large-message processing techniques correctly, as you seem to be doing now, you should be able to get the memory consumption down to hundreds or even tens of megabytes, or less.

And there are no other nodes between the FileInput node and the Compute node?

Have you corrected the way you were creating unnecessary DFDL parsers in the middle of the message tree?
mqjeff
PostPosted: Tue Apr 08, 2014 6:31 am    Post subject: Reply with quote

Grand Master

Joined: 25 Jun 2008
Posts: 17447

Esa wrote:
mattynorm wrote:
As a further question, will the Message Broker (or Integration Node) be able to easily reclaim this memory.

No, it won't be able to reclaim any memory without administrative intervention.


This is misleading.

The broker won't release memory back to the operating system without administrative action.

It will, however, happily reclaim that memory and use it for other things.
kimbert
PostPosted: Tue Apr 08, 2014 6:43 am    Post subject: Reply with quote

Jedi Council

Joined: 29 Jul 2003
Posts: 5542
Location: Southampton

Esa said:
Quote:
When using FileInput and FileOutput nodes and large-message processing techniques correctly, as you seem to be doing now, you should be able to get the memory consumption down to hundreds or even tens of megabytes, or less.
This is the key point. I just want to make clear that this *is* achievable, although it can be difficult in practice.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
mattynorm
PostPosted: Tue Apr 08, 2014 6:57 am    Post subject: Reply with quote

Acolyte

Joined: 06 Jun 2003
Posts: 52

Thanks for the replies

Yes, it is just a FileInput -> Compute (ESQL) -> FileOutput flow; nothing complicated about it at all (there is a subflow hanging off the catch terminal and a queue off the fail terminals, but they are not being invoked).

I set flow statistics in the Explorer, and the Total Elapsed Time was as follows:

Quote:

FileInput - 2244
Compute - 1248156
FileOutput - 8622221


which I guess means the FileOutput node is doing the lion's share of the work. The only properties I have changed from the defaults are the file mode (staging in the mqsitransit directory, timestamp-archiving and replacing an existing file) and, under Records and Elements, setting 'Record is Delimited Data' (I thought I would have to do this if propagating the output line by line), which I think sets the delimiter to 'Broker System Line End' by default.

Any of those likely to have a significant performance impact?

Quote:

Have you corrected the way you create unnecessary DFDL parsers in the middle of the message tree?


I have; it didn't seem to make a significant difference to the memory or processing time.

Quote:

It will happily reclaim that memory and use it for other things.


So if a single EG has 5GB of memory assigned, will other EGs be able to grab this if required (assuming it's not being used by the EG in question)?
Copyright © MQSeries.net. All rights reserved.