jrsdm
Posted: Wed Oct 31, 2012 3:44 am Post subject:
Apprentice
Joined: 24 Oct 2012 Posts: 27
The files are in GBs, but we are only processing a record of a few KB in the flow each time, so is the memory consumption still high?
1) Do I need to increase the execution group memory?
2) Also, my flow design is as follows, as I need to create two files of different message sets from the records handed off by the main flow:
Main flow: 3 GB file --> FileInput node --> one record at a time --> MQOutput1 or MQOutput2
Depending on the message content, the corresponding flow is triggered (a rough sketch of the routing logic is below).
2nd flow: MQInput1 --> Compute --> FileOutput node (file created and sent via SFTP)
3rd flow: MQInput2 --> Compute --> FileOutput node (file created and sent via SFTP)
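For the routing step between the FileInput node and the two MQOutput nodes, this is the rough shape I have in mind for a Compute node (only a sketch; the element name Record and the field RecordType are made up, and the 'out'/'out1' terminals would be wired to MQOutput1/MQOutput2):
CREATE COMPUTE MODULE MainFlow_RouteRecord
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Copy the current record and its headers to the output message
    SET OutputRoot = InputRoot;
    -- RecordType is a placeholder; route on whatever really identifies the record
    IF InputRoot.XMLNSC.Record.RecordType = 'TYPE1' THEN
      PROPAGATE TO TERMINAL 'out';   -- wired to MQOutput1
    ELSE
      PROPAGATE TO TERMINAL 'out1';  -- wired to MQOutput2
    END IF;
    RETURN FALSE;
  END;
END MODULE;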
Any suggestions on this?
Thanks

Vitor
Posted: Wed Oct 31, 2012 5:26 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
jrsdm wrote:
The files are in GBs, but we are only processing a record of a few KB in the flow each time, so is the memory consumption still high?
Have you even looked at that sample?
_________________
Honesty is the best policy.
Insanity is the best defence.

jrsdm
Posted: Wed Oct 31, 2012 12:09 pm Post subject:
Apprentice
Joined: 24 Oct 2012 Posts: 27
I have looked at that sample. In that sample the file is read as a whole file, and after the Compute node the records are passed on one by one.
But I don't want to do that; I want to read one record at a time from the file by using the Parsed Record Sequence property of the FileInput node.
So, following that approach, are the message flows in my previous message fine? I just want suggestions.
Thanks

Vitor
Posted: Wed Oct 31, 2012 1:00 pm Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
jrsdm wrote:
I have looked at that sample. In that sample the file is read as a whole file, and after the Compute node the records are passed on one by one.
Does it? Does it really read the whole file? Or does it read enough of the file to generate the propagated record?
jrsdm wrote:
But I don't want to do that; I want to read one record at a time from the file by using the Parsed Record Sequence property of the FileInput node.
Parsed Record Sequence is a very, very resource-intensive way of parsing a file and is the method of last resort when absolutely everything else has failed. Reading a 3 GB file with that method is going to take a while and cost a fortune in CPU and memory.
jrsdm wrote:
So, following that approach, are the message flows in my previous message fine?
You clearly know best, so go in peace with your solution.
_________________
Honesty is the best policy.
Insanity is the best defence.

kimbert
Posted: Wed Oct 31, 2012 1:07 pm Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
Quote:
I have looked at that sample. In that sample the file is read as a whole file, and after the Compute node the records are passed on one by one.
Question for jrsdm: if the setting is 'Whole File', does that mean that the entire multi-GB file will be read into a single multi-GB memory buffer?

jrsdm
Posted: Wed Oct 31, 2012 3:57 pm Post subject:
Apprentice
Joined: 24 Oct 2012 Posts: 27
Jedi -> I have checked the Info Center, but I could not find anything that says whether selecting 'Whole File' means the file is read into the buffer in one shot.
What do you say about this?

kimbert
Posted: Wed Oct 31, 2012 4:08 pm Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
I looked in the Info Center, but I couldn't find any positive statement about this. But I do know the answer....
WMB is actually pretty smart about memory usage when reading large files (or any other large input stream). It does not read the entire input message into memory. Instead, it streams the input into the flow in smaller chunks as and when it is required. This keeps peak memory usage down when reading large inputs.
So if you use 'Whole File' with the technique demonstrated in the sample then you should see pretty good memory usage characteristics.
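To make that concrete, the record-splitting Compute node in that technique is roughly this shape (just a sketch with assumed element names Catalog and Record, not the sample's actual code):
CREATE COMPUTE MODULE SplitLargeFile
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Walk each repeating record with a reference and propagate it as its own
    -- message. With 'Parse timing' set to 'On demand', the input bitstream is
    -- only parsed as far as the record currently being processed.
    DECLARE rec REFERENCE TO InputRoot.XMLNSC.Catalog.Record[1];
    WHILE LASTMOVE(rec) DO
      SET OutputRoot.Properties = InputRoot.Properties;
      SET OutputRoot.XMLNSC.Record = rec;
      PROPAGATE;
      MOVE rec NEXTSIBLING REPEAT TYPE NAME;
    END WHILE;
    RETURN FALSE;
  END;
END MODULE;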
btw, Vitor is correct about Parsed Record Sequence. It's useful if you need it, but I don't think you need it.

jrsdm
Posted: Thu Nov 01, 2012 2:36 am Post subject:
Apprentice
Joined: 24 Oct 2012 Posts: 27

mqjeff
Posted: Thu Nov 01, 2012 3:14 am Post subject:
Grand Master
Joined: 25 Jun 2008 Posts: 17447
I'd say you should prove it yourself.
And make sure you're at the most recent FP of at least broker v7, if not v8.
If you're at v6.0, you should accept that you've shot yourself in the foot already and you'd rather not shoot the other foot trying to handle files this large, and use this business requirement to justify the upgrade.

wbi_telecom
Posted: Thu Nov 01, 2012 3:57 am Post subject:
Disciple
Joined: 15 Feb 2006 Posts: 188 Location: Harrisburg, PA
I am using version 7.0, and the largest XML file that you can process using a FileInput node with the 'Whole File' setting is around 2.2 GB. The limit comes from a variable in the FileInput node's code that stores the file size. We opened a PMR with IBM about this a year ago and then changed the design of the flow from whole file to record-by-record.
We have changed our flow to read records and it works fine. We read the XML as a BLOB and use, as the delimiter value, the hex value of the closing tag of the block that we want to process. It works like a charm and manages memory and CPU very well.
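To give a flavour of it (a sketch only, not our production code): if each block ended with a closing tag of </Record>, the custom hex delimiter would simply be the ASCII/ISO-8859-1 bytes of that tag, i.e. 3C2F5265636F72643E, and a Compute node re-parses each BLOB record as XML before transforming it:
CREATE COMPUTE MODULE ReparseRecord
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- FileInput node (assumed settings): Message domain = BLOB,
    -- Record detection = Delimited, Custom delimiter (hex) = 3C2F5265636F72643E.
    -- The node strips the delimiter from each record, so the closing tag is
    -- put back before re-parsing; adjust this if your delimiter handling differs.
    DECLARE recBlob BLOB InputRoot.BLOB.BLOB || X'3C2F5265636F72643E';
    SET OutputRoot.Properties = InputRoot.Properties;
    CREATE LASTCHILD OF OutputRoot DOMAIN('XMLNSC')
      PARSE(recBlob CCSID InputRoot.Properties.CodedCharSetId);
    RETURN TRUE;
  END;
END MODULE;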
Cheers,

jrsdm
Posted: Thu Nov 01, 2012 4:26 am Post subject:
Apprentice
Joined: 24 Oct 2012 Posts: 27
wbi_telecom:
Oh, that's good. I am trying your approach for the XML by setting the message domain to BLOB and setting the 'Records and elements' properties as follows:
Record detection: Delimited
Delimiter: Custom delimiter (hexadecimal)
Custom delimiter: ? (this is the value I need to work out)
Suppose the XML is:
<?xml version="1.0" encoding="ISO-8859-1"?>
<!-- Edited by XMLSpy® -->
<CATALOG>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
</CATALOG>
<CATALOG2>
<CD>
<TITLE>Empire Burlesque</TITLE>
<ARTIST>Bob Dylan</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>Columbia</COMPANY>
<PRICE>10.90</PRICE>
<YEAR>1985</YEAR>
</CD>
<CD>
<TITLE>Red</TITLE>
<ARTIST>The Communards</ARTIST>
<COUNTRY>UK</COUNTRY>
<COMPANY>London</COMPANY>
<PRICE>7.80</PRICE>
<YEAR>1987</YEAR>
</CD>
<CD>
<TITLE>Unchain my heart</TITLE>
<ARTIST>Joe Cocker</ARTIST>
<COUNTRY>USA</COUNTRY>
<COMPANY>EMI</COMPANY>
<PRICE>8.20</PRICE>
<YEAR>1987</YEAR>
</CD>
</CATALOG2>
For this XML, what would the custom delimiter be?
Is there any documentation on the internet about this? If so, please share it.
mqjeff:
I am working on version 7. I tried a 2 GB file with the approach mentioned in the sample, but it did not work; it was taking too much memory.

Vitor
Posted: Thu Nov 01, 2012 5:45 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
wbi_telecom wrote:
I am using version 7.0, and the largest XML file that you can process using a FileInput node with the 'Whole File' setting is around 2.2 GB.
This is typically the OS file size limit of most Unix systems without affirmative action to raise it.
wbi_telecom wrote:
We have changed our flow to read records and it works fine. We read the XML as a BLOB and use, as the delimiter value, the hex value of the closing tag of the block that we want to process.
Did your previous code use the techniques of the large file sample? What was wrong with it that led IBM to advise this rather interesting method?
I've read very large XML documents with WMB v6.1 and WMB v7 using the XMLNSC parser, so I'm interested to know why you were advised to use BLOB out of a PMR.
_________________
Honesty is the best policy.
Insanity is the best defence.

kimbert
Posted: Thu Nov 01, 2012 6:09 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
I've done some investigation on this...
- The FileInput node does stream the input and reduce memory usage
- ...but only when the domain is 'XMLNSC' or 'JSON' or 'DFDL' or 'MRM'
Your best bet is:
- Set the domain to 'XMLNSC'
- Set 'Parse timing' on the FileInput node to 'On demand' (this should be the default setting, but you might have changed it)
- Set 'Record detection' to 'Parsed Record Sequence'
You absolutely *must* use XMLNSC and Parsed Record Sequence to get the behaviour that you require. Why? Because your large file is not a valid XML document. It is a sequence of separate XML documents with some white space between them. The XMLNSC parser contains a specially-designed feature that will automatically skip the white space between the documents (and any XML declarations and processing instructions that might occur before the next document root tag). This special parsing mode will only be used if you specify XMLNSC and Parsed Record Sequence.
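With those settings each propagation from the FileInput node carries one complete document, so a downstream Compute node can tell the two document types in your file apart and route them accordingly (a sketch only, reusing the CATALOG/CATALOG2 names from your sample):
CREATE COMPUTE MODULE RouteParsedRecord
  CREATE FUNCTION Main() RETURNS BOOLEAN
  BEGIN
    -- Each input message is one whole document from the file
    SET OutputRoot = InputRoot;
    IF EXISTS(InputRoot.XMLNSC.CATALOG[]) THEN
      PROPAGATE TO TERMINAL 'out';   -- e.g. towards MQOutput1
    ELSE
      PROPAGATE TO TERMINAL 'out1';  -- e.g. towards MQOutput2
    END IF;
    RETURN FALSE;
  END;
END MODULE;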
Give it a try, and let us know how you get on. In theory, your memory usage should be really small - not much larger than it would be if you were parsing individual CATALOG documents.
Hope that helps. Apologies if I misled you with my earlier remarks.

Vitor
Posted: Thu Nov 01, 2012 6:32 am Post subject:
Grand High Poobah
Joined: 11 Nov 2005 Posts: 26093 Location: Texas, USA
kimbert wrote:
Because your large file is not a valid XML document. It is a sequence of separate XML documents with some white space between them.
Ooo.... well spotted!
I hate it when I miss the obvious......
_________________
Honesty is the best policy.
Insanity is the best defence.

kimbert
Posted: Thu Nov 01, 2012 6:35 am Post subject:
Jedi Council
Joined: 29 Jul 2003 Posts: 5542 Location: Southampton
@Vitor: I expect you were lulled into a false sense of security by the reassuring-looking XML declaration at the start of the file. I was.