sumit
Posted: Wed Apr 02, 2014 4:58 am Post subject: Handling large message
Hi All,
Configuration
OS - Linux and Windows
WMB v8.0.0.3
We are creating a flow to process a large file (100MB+, which could grow to 500MB+) through WMB. I completely understand that it is not good design to make the broker process such a large file, but this is a setup we are creating to see whether the broker can (even though it shouldn't) process a large file.
(I read somewhere that the maximum file size the broker can process is 2GB.)
We have gone through the Info Center, IBM documents and this forum to understand the setup.
We have an input XML file with repeating tags that has to be transformed into an output flat file. At a high level, the flow has a FileInput node, a Compute node (for the transformation) and a FileOutput node. We are following IBM's recommendation of reading the data as a BLOB, generating a tree without fully parsing it, using LASTMOVE and NEXTSIBLING to traverse the tree, and deleting each parsed node.
We also understand that we have to set the MQSI_FILENODES_MAXIMUM_RECORD_LENGTH environment variable to overcome the default 100MB limit of the FileInput node, and maybe we need to allocate enough memory to the JVM.
My query is:
I have many other flows deployed in the broker that use the FileInput node to process small files. If I set MQSI_FILENODES_MAXIMUM_RECORD_LENGTH to a high value (let's say 600MB, for handling a maximum 500MB file), will it allocate 600MB of space for each flow, or will the broker assign the space on a need basis?
_________________
Regards
Sumit
ghoshly
Posted: Wed Apr 02, 2014 7:18 am Post subject: Each record is exceeding 100MB?
Is each record in your scenario more than 100MB? We are not talking about the complete file; the complete file can have many records.
Vitor
Posted: Wed Apr 02, 2014 7:41 am Post subject: Re: Handling large message |
sumit wrote:
(I read somewhere that the maximum file size the broker can process is 2GB.)
I think you'll find that's a limit imposed by many Unix file systems, not WMB.
sumit wrote:
If I set MQSI_FILENODES_MAXIMUM_RECORD_LENGTH to a high value (let's say 600MB, for handling a maximum 500MB file), will it allocate 600MB of space for each flow, or will the broker assign the space on a need basis?
WMB allocates space on an as-needed basis. So if the other flows are not processing the other, smaller files at the same time as this monster, their resource consumption is irrelevant.
Was a 500MB XML document really the best design choice you could make for this requirement? "Yes" is an acceptable answer here of course, but really?
_________________
Honesty is the best policy.
Insanity is the best defence.
sumit
Posted: Wed Apr 02, 2014 10:32 am
ghoshly wrote:
Is each record in your scenario more than 100MB? We are not talking about the complete file; the complete file can have many records.
The whole file size is 100MB. And yes, we are reading the complete file in one go and putting a lot of load on system memory.
_________________
Regards
Sumit
sumit
Posted: Wed Apr 02, 2014 10:47 am Post subject: Re: Handling large message |
Vitor wrote:
sumit wrote:
(I read somewhere that the maximum file size the broker can process is 2GB.)
I think you'll find that's a limit imposed by many Unix file systems, not WMB.
Hmm, and maybe it varies from OS to OS.
Vitor wrote:
Was a 500MB XML document really the best design choice you could make for this requirement? "Yes" is an acceptable answer here of course, but really?
The system will probably generate one file of 500MB; the way it has been described to us makes that highly likely.
It's the overall inventory detail flowing at the end of the day from a Distribution Center (DC) to the Inventory Management system, and it contains a lot of data.
The limitation is more with the Inventory Management system, which requires the data from one DC to be in a single file. If we or the DC split the file, the Inventory Management system will overwrite all the previous records with the records present in the last file.
The other plan is to let the DC generate multiple files and have the broker open a file for output in append mode and keep writing data to it.
_________________
Regards
Sumit
kimbert
Posted: Wed Apr 02, 2014 2:10 pm
Quote:
We are following IBM's recommendation of reading the data as a BLOB, generating a tree without fully parsing it, using LASTMOVE and NEXTSIBLING to traverse the tree, and deleting each parsed node.
So you should not have any problems with memory. As far as memory consumption is concerned, the flow should perform just like the smaller flows. Do make sure that you write the output flat file by appending to the output file. If you try to assemble the entire output bit stream in the flow and then write it at the end, you will simply move the memory problem to the output side of the message flow.
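For illustration only, a minimal ESQL sketch of that "append, don't assemble" idea: each record is propagated on its own so the FileOutput node appends it, and a final control message finishes the file. The terminal wiring ('out' to the FileOutput node's In terminal, 'out1' to its Finish File terminal) and the BLOB output body are assumptions, not details from this thread.
Code:
-- Inside Main(), once per output record: propagate one record at a time so the
-- FileOutput node appends it, instead of building the whole output bit stream.
DECLARE oneRecord CHARACTER 'example,record,data';   -- illustrative record content
SET OutputRoot.Properties = InputRoot.Properties;
SET OutputRoot.BLOB.BLOB = CAST(oneRecord AS BLOB CCSID 1208);
PROPAGATE TO TERMINAL 'out';

-- After the last record, send a control message to the terminal wired to the
-- FileOutput node's Finish File terminal so the output file is completed.
SET OutputRoot.Properties = InputRoot.Properties;
PROPAGATE TO TERMINAL 'out1';
RETURN FALSE;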
Quote:
My query is:
I have many other flows deployed in the broker that use the FileInput node to process small files. If I set MQSI_FILENODES_MAXIMUM_RECORD_LENGTH to a high value (let's say 600MB, for handling a maximum 500MB file), will it allocate 600MB of space for each flow, or will the broker assign the space on a need basis?
No, it will not. But I don't think you need to set that variable anyway. If you are using Parsed Record Sequence to read the input data then the FileInput node will only sip small chunks of the input file (one record at a time).
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
Esa
Posted: Wed Apr 02, 2014 10:53 pm
kimbert wrote:
If you are using Parsed Record Sequence to read the input data then the FileInput node will only sip small chunks of the input file (one record at a time).
The OP's input file is an XML file with repeating elements. That makes using Parsed Record Sequence a bit more complicated, but not impossible.
kimbert
Posted: Thu Apr 03, 2014 12:41 am
That's true - Parsed Record Sequence is probably not the correct option here. Everything else that I said remains true. If Parse Timing is set to 'On demand' then the parser only reads as much data as it needs in order to satisfy the current parse operation.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
sumit
Posted: Thu Apr 03, 2014 2:59 am
kimbert wrote:
Quote:
We are following IBM's recommendation of reading the data as a BLOB, generating a tree without fully parsing it, using LASTMOVE and NEXTSIBLING to traverse the tree, and deleting each parsed node.
So you should not have any problems with memory. As far as memory consumption is concerned, the flow should perform just like the smaller flows. Do make sure that you write the output flat file by appending to the output file.
Yes Kimbert, this is how we are doing it. However, the flow/EG will still occupy memory of the same size as the input file, if not more.
I am still testing the flow with files of various sizes: 100MB, 200MB, 300MB, 400MB and 500MB. With a 300MB file, heap dumps are being produced. I didn't increase the JVM size and was testing on the default setup. Looks like I have to tweak it now.
_________________
Regards
Sumit
kimbert
Posted: Thu Apr 03, 2014 3:56 am
Ah! I have just realized what your problem is.
Quote:
We are following IBM's recommendation of reading the data as a BLOB, generating a tree without fully parsing it, using LASTMOVE and NEXTSIBLING to traverse the tree, and deleting each parsed node.
The BLOB parser is not a streaming parser. It will read the entire input message into InputRoot.BLOB.BLOB. That explains why you are seeing high memory usage.
You need to:
- set the domain to XMLNSC (which *is* a streaming parser)
- make sure that Parse Timing is 'On Demand'
- construct your message flow so that it parses exactly one record at a time, appends the result to the output file and then discards the message tree. You probably have that logic in your current solution.
In order to allow the message tree to be discarded, your flow should copy InputRoot.XMLNSC to Environment.Variables.inputMessage (and make sure that you create Environment.Variables.inputMessage using a CREATE statement with DOMAIN 'XMLNSC'; otherwise it will be copied to a domain-less tree and no parsing of the input will be possible). This simply copies a reference to the input data; it does not copy the entire tree or the entire input bitstream.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
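Pulling those steps together, here is a minimal ESQL sketch of the pattern being described, assuming a repeating Inventory/Item structure and a comma-separated output record; the element names, field names and terminal wiring are invented for the example, not taken from this thread.
Code:
CREATE COMPUTE MODULE ProcessLargeFile_Compute
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Attach the (still mostly unparsed) XMLNSC input to Environment so that
    -- records can be deleted as they are consumed. With matching domains this
    -- copies a reference to the input data, not the whole tree.
    CREATE LASTCHILD OF Environment.Variables DOMAIN('XMLNSC') NAME 'inputMessage';
    SET Environment.Variables.inputMessage = InputRoot.XMLNSC;

    -- 'Inventory' and 'Item' are illustrative element names.
    DECLARE itemRef REFERENCE TO Environment.Variables.inputMessage.Inventory.Item;
    WHILE LASTMOVE(itemRef) DO
      -- With Parse Timing 'On Demand', only this record's part of the
      -- bitstream is parsed when its fields are referenced here.
      SET OutputRoot.Properties = InputRoot.Properties;
      SET OutputRoot.BLOB.BLOB =
        CAST(itemRef.Code || ',' || itemRef.Quantity AS BLOB CCSID 1208);
      PROPAGATE TO TERMINAL 'out';   -- the FileOutput node appends this record

      -- Step to the next repeating element, then delete the one just
      -- processed so the message tree does not grow with the file size.
      DECLARE doneRef REFERENCE TO itemRef;
      MOVE itemRef NEXTSIBLING REPEAT TYPE NAME;
      DELETE FIELD doneRef;
    END WHILE;
    -- (a final PROPAGATE to a Finish File terminal would go here, as in the earlier sketch)
    RETURN FALSE;
  END;
END MODULE;
The FileOutput node can then supply the record delimiter itself (for example with Record definition set to 'Record is Delimited data'), so the flow never holds more than one record of output at a time.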
sumit
Posted: Thu Apr 03, 2014 4:12 am
Hmm, thanks for the advice, Kimbert. I'll try it and share the outcome.
_________________
Regards
Sumit
kimbert
Posted: Thu Apr 03, 2014 4:23 am
One more point. If you had looked in the user trace you would have seen a warning. Whenever a parser copies its entire data content, the user trace will contain a BIP6065 warning telling you that you should have been using a streaming parser.
_________________
Before you criticize someone, walk a mile in their shoes. That way you're a mile away, and you have their shoes too.
Esa
Posted: Thu Apr 03, 2014 10:55 pm
That's a fantastic solution. I hope you manage to get it working, sumit.
The only drawback is that you will very likely have to adjust MQSI_FILENODES_MAXIMUM_RECORD_LENGTH. I doubt that the FileInput node has any means of detecting that you are using the streaming parser in a smart way downstream. But at least you don't need to adjust the JVM memory.
An alternative solution:
This kind of huge file is often produced by a program or stored procedure that reads records from a database. Typically the program:
1. Writes the XML root element start tag
2. Queries the database and appends an XML element for each entry in the result set
3. Appends the XML root element end tag
You can ask the developers to drop steps 1 and 3. Then you won't get one huge XML document with repeating elements, but a file containing an array of XML documents, like the example below.
The FileInput node's record detection option 'Parsed Record Sequence' is designed to work with XML input exactly like that.
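To make that shape concrete (the element names are invented for the example): instead of one enormous document such as
Code:
<Inventory>
  <Item>...</Item>
  <Item>...</Item>
  <!-- hundreds of thousands more -->
</Inventory>
the extract program would write a file that is simply a sequence of small, self-contained documents:
Code:
<Item>...</Item>
<Item>...</Item>
<Item>...</Item>
With Record detection set to 'Parsed Record Sequence' and the message domain set to XMLNSC on the FileInput node, each <Item> document then arrives in the flow as a separate record.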
fjb_saper
Posted: Fri Apr 04, 2014 4:56 am
@Esa: There is a lab about large files that works exactly as you described (one root tag, an array of line item tags) with up to 1005000 line item tags.
Use the large message sample and add a FileInput node. Set it to XMLNSC.
That's all you need to do. Works pretty slick. It ran through the file in less than 10 minutes (1GB size). So look at the sample and adapt!
Have fun.
_________________
MQ & Broker admin
Esa
Posted: Fri Apr 04, 2014 5:25 am
fjb_saper wrote:
@Esa: There is a lab about large files that works exactly as you described (one root tag, an array of line item tags) with up to 1005000 line item tags.
This suggests that the FileInput node would actually ignore the MQSI_FILENODES_MAXIMUM_RECORD_LENGTH setting if you are using a streaming parser with Parse Timing set to 'On demand'?
Sounds reasonable. I need to test it the way you suggested, when I have the time.