Handling large message

sumit
Posted: Wed Apr 02, 2014 4:58 am    Post subject: Handling large message
Hi All,

Configuration
OS - Linux and Windows
WMB v8.0.0.3

We are creating a flow to process a large file (100MB+, and it can grow to 500MB+) through WMB. I completely understand that it is not a good design to make the broker process such a large file, but this is the kind of setup we are creating to see whether the broker can (even though it shouldn't) process a large file.

(I read somewhere that the maximum file size the broker can process is 2GB.)

We have gone through the InfoCenter, IBM documents and this forum to understand the setup.

We have an input XML file with repeating tags that has to be transformed into an output flat file. At a high level, the flow has a FileInput node, a Compute node (for transformation) and a FileOutput node. We are following IBM's recommendation of reading the data as a BLOB, generating a tree without parsing, using LASTMOVE and NEXTSIBLING to traverse the tree, and deleting each parsed node.

We also understand that we have to set the MQSI_FILENODES_MAXIMUM_RECORD_LENGTH environment variable to overcome the default 100MB limit of the FileInput node, and we may need to allocate enough memory to the JVM.
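
For illustration, our compute node follows roughly this pattern (Inventory and Record are stand-in element names, not our real tags):

Code:
-- Parse the BLOB into a tree we own, then walk it one record at a time,
-- deleting each record after it has been processed.
DECLARE inTree REFERENCE TO Environment.Variables;
CREATE LASTCHILD OF Environment.Variables AS inTree DOMAIN('XMLNSC')
    PARSE(InputRoot.BLOB.BLOB ENCODING InputRoot.Properties.Encoding
          CCSID InputRoot.Properties.CodedCharSetId);

DECLARE rec  REFERENCE TO inTree.Inventory.Record;
DECLARE done REFERENCE TO rec;
WHILE LASTMOVE(rec) DO
    -- ... transform rec into one flat-file output record here ...
    MOVE done TO rec;                       -- remember the record just processed
    MOVE rec NEXTSIBLING REPEAT TYPE NAME;  -- step to the next Record
    DELETE FIELD done;                      -- free its memory
END WHILE;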

My query is:
I have many other flows deployed in the broker that use the FileInput node to process small files. If I set MQSI_FILENODES_MAXIMUM_RECORD_LENGTH to a high value (say 600MB, to handle a 500MB file at most), will it allocate 600MB of space for each flow, or will the broker assign space on an as-needed basis?

ghoshly
Posted: Wed Apr 02, 2014 7:18 am    Post subject: Each record is exceeding 100MB?

Is each record in your scenario more than 100MB?

We are not talking about the complete file; the complete file can contain many records.

Vitor
Posted: Wed Apr 02, 2014 7:41 am    Post subject: Re: Handling large message

sumit wrote:
(I read somewhere that the maximum file size the broker can process is 2GB.)


I think you'll find that's a limit imposed by many Unix file systems, not WMB.

sumit wrote:
If I set MQSI_FILENODES_MAXIMUM_RECORD_LENGTH to a high value (say 600MB, to handle a 500MB file at most), will it allocate 600MB of space for each flow, or will the broker assign space on an as-needed basis?


WMB allocates space on an as-needed basis. So if the other flows are not processing their smaller files at the same time as this monster, their resource consumption is irrelevant.

Was a 500MB XML document really the best design choice you could make for this requirement? "Yes" is an acceptable answer here of course, but really?

sumit
Posted: Wed Apr 02, 2014 10:32 am

ghoshly wrote:
Is each record in your scenario more than 100MB?

We are not talking about the complete file; the complete file can contain many records.


The whole file is 100MB. And yes, we are reading the complete file in one go, putting a lot of load on system memory.

sumit
Posted: Wed Apr 02, 2014 10:47 am    Post subject: Re: Handling large message

Vitor wrote:
sumit wrote:
(I read somewhere that the maximum file size the broker can process is 2GB.)


I think you'll find that's a limit imposed by many Unix file systems, not WMB.

Hmm... and maybe it varies from OS to OS.



Vitor wrote:

Was a 500MB XML document really the best design choice you could make for this requirement? "Yes" is an acceptable answer here of course, but really?

The source system will probably generate one 500MB file; the way the requirement has been described to us makes that highly likely.
It's the overall end-of-day inventory detail flowing from a Distribution Center (DC) to an Inventory Management system, and it carries a lot of data.
The limitation is really with the Inventory Management system, which requires all data from one DC to be in a single file. If we or the DC split the file, the Inventory Management system will overwrite all the previous records with the records present in the last file.

The other plan is to let the DC generate multiple files, and have MB open a file for output in append mode and keep writing data to it.

kimbert
Posted: Wed Apr 02, 2014 2:10 pm

Quote:
We are following IBM's recommendation of reading the data as a BLOB, generating a tree without parsing, using LASTMOVE and NEXTSIBLING to traverse the tree, and deleting each parsed node.
So you should not have any problems with memory. As far as memory consumption is concerned, the flow should perform just like the smaller flows. Do make sure that you write the output flat file by appending to it. If you try to assemble the entire output bit stream in the flow and then write it at the end, you will simply move the memory problem to the output side of the message flow.
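
For example, something along these lines inside the per-record loop (a sketch only; flatRecord and the terminal name are illustrative, and the FileOutput node must be configured to append each record to the open file):

Code:
-- Sketch: emit one output record per loop iteration instead of building
-- the whole output bit stream in memory.
SET OutputRoot.Properties = InputRoot.Properties;
SET OutputRoot.BLOB.BLOB  = CAST(flatRecord AS BLOB CCSID 1208);
PROPAGATE TO TERMINAL 'out' DELETE NONE;  -- DELETE NONE keeps our trees alive
-- ... and after the loop, RETURN FALSE, because everything was already propagated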

Quote:
My query is:
I have many other flows deployed in the broker that use the FileInput node to process small files. If I set MQSI_FILENODES_MAXIMUM_RECORD_LENGTH to a high value (say 600MB, to handle a 500MB file at most), will it allocate 600MB of space for each flow, or will the broker assign space on an as-needed basis?
No, it will not. But I don't think you need to set that variable anyway. If you are using Parsed Record Sequence to read the input data, the FileInput node will only sip small chunks of the input file (one record at a time).

Esa
Posted: Wed Apr 02, 2014 10:53 pm

kimbert wrote:
If you are using Parsed Record Sequence to read the input data, the FileInput node will only sip small chunks of the input file (one record at a time).

The OP's input file is an XML file with repeating elements. That makes using Parsed Record Sequence a bit more complicated, but not impossible.

kimbert
Posted: Thu Apr 03, 2014 12:41 am

That's true - Parsed Record Sequence is probably not the right option here. Everything else I said remains true: if Parse Timing is set to 'On demand', the parser only reads as much data as it needs to satisfy the current parse operation.

sumit
Posted: Thu Apr 03, 2014 2:59 am

kimbert wrote:
Quote:
We are following IBM's recommendation of reading the data as a BLOB, generating a tree without parsing, using LASTMOVE and NEXTSIBLING to traverse the tree, and deleting each parsed node.
So you should not have any problems with memory. As far as memory consumption is concerned, the flow should perform just like the smaller flows. Do make sure that you write the output flat file by appending to it.

Yes Kimbert, this is how we are doing it. However, the flow/EG will still occupy at least as much memory as the input file, if not more.

I am still testing the flow with files of various sizes: 100MB, 200MB, 300MB, 400MB and 500MB. With a 300MB file, heap dumps are being produced. I didn't increase the JVM size and was testing on the default setup. Looks like I have to tweak it now.
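
I assume the setting to tweak is the execution group's JVM max heap, along these lines (broker and EG names are placeholders, value illustrative):

Code:
# Raise the max JVM heap of one execution group to 512MB, then restart
# the execution group so the new value takes effect.
mqsichangeproperties MYBROKER -e MYEG -o ComIbmJVMManager -n jvmMaxHeapSize -v 536870912
mqsireload MYBROKER -e MYEG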

kimbert
Posted: Thu Apr 03, 2014 3:56 am

Ah! I have just realized what your problem is.
Quote:
We are following IBM's recommendation of reading the data as a BLOB, generating a tree without parsing, using LASTMOVE and NEXTSIBLING to traverse the tree, and deleting each parsed node.
The BLOB parser is not a streaming parser. It will read the entire input message into InputRoot.BLOB.BLOB. That explains why you are seeing high memory usage.

You need to
- set the domain to XMLNSC (which *is* a streaming parser)
- make sure that Parse Timing is 'On Demand'
- construct your message flow so that it parses exactly one record at a time, appends the result to the output file and then discards the message tree. You probably already have that logic in your current solution.

In order to allow the message tree to be discarded, your flow should copy InputRoot.XMLNSC to Environment.Variables.inputMessage (and make sure that you create Environment.Variables.inputMessage using a CREATE statement with DOMAIN 'XMLNSC'; otherwise the assignment will copy into a domain-less tree and no parsing of the input will be possible). That assignment will simply copy a reference to the input data; it will not copy the entire tree or the entire input bitstream.
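
A minimal sketch of that copy, assuming element names Inventory and Record (yours will differ):

Code:
-- Create a field owned by the XMLNSC parser, then assign the input body
-- to it. With Parse Timing 'On Demand', this copies only a reference to
-- the bit stream, not the whole tree.
CREATE LASTCHILD OF Environment.Variables DOMAIN('XMLNSC') NAME 'inputMessage';
SET Environment.Variables.inputMessage = InputRoot.XMLNSC;

-- Walk the copy one record at a time, deleting as we go. InputRoot itself
-- is read-only, but the Environment copy can be pruned.
DECLARE rec  REFERENCE TO Environment.Variables.inputMessage.Inventory.Record;
DECLARE done REFERENCE TO rec;
WHILE LASTMOVE(rec) DO
    -- ... build and propagate one output record here ...
    MOVE done TO rec;
    MOVE rec NEXTSIBLING REPEAT TYPE NAME;
    DELETE FIELD done;
END WHILE;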

sumit
Posted: Thu Apr 03, 2014 4:12 am

Hmm... thanks for the advice, Kimbert. I'll try it and share the outcome.

kimbert
Posted: Thu Apr 03, 2014 4:23 am

One more point. If you had looked in the user trace you would have seen a warning. Whenever a parser copies its entire data content, the user trace will contain a BIP6065 warning telling you that you should have been using a streaming parser.

Esa
Posted: Thu Apr 03, 2014 10:55 pm

That's a fantastic solution. I hope you manage to get it working, sumit.

The only drawback is that you will very likely have to adjust MQSI_FILENODES_MAXIMUM_RECORD_LENGTH anyway; I doubt that the FileInput node has any means of detecting that you are using the streaming parser in a smart way downstream. But at least you don't need to adjust the JVM memory.

An alternative solution:

Huge files like this are often produced by a program or stored procedure that reads records from a database. Typically the program
1. writes the XML root element start tag,
2. queries the database and appends an XML element for each entry in the result set, and
3. appends the XML root element end tag.

You can ask the developers to drop steps 1 and 3. Then you won't get one huge XML document with repeating elements, but a file containing an array of XML documents.

The FileInput node's record detection option 'Parsed Record Sequence' is designed to work with XML input exactly like that.
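
For illustration (Record is a made-up element name), the file layout changes like this:

Code:
<!-- Today: one huge document with a single root element -->
<Inventory>
  <Record>...</Record>
  <Record>...</Record>
</Inventory>

<!-- Without steps 1 and 3: a stream of small self-contained documents,
     which the FileInput node can read one record at a time -->
<Record>...</Record>
<Record>...</Record>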

fjb_saper
Posted: Fri Apr 04, 2014 4:56 am

@esa There is a lab about large files that works exactly as you described (1 root tag, array of line item tags) with up to 1005000 line item tags.

Use the large message sample and add a FileInput node. Set it to XMLNSC.
That's all you need to do. It works pretty slick: it ran through a 1GB file in less than 10 minutes. So look at the sample and adapt!

Have fun

Esa
Posted: Fri Apr 04, 2014 5:25 am

fjb_saper wrote:
@esa There is a lab about large files that works exactly as you described (1 root tag, array of line item tags) with up to 1005000 line item tags.


This suggests that the FileInput node would actually ignore the MQSI_FILENODES_MAXIMUM_RECORD_LENGTH setting if you are using a streaming parser with Parse Timing set to 'On Demand'?

Sounds reasonable. I need to test it the way you suggested, when I have the time.