integration (Apprentice; Joined: 26 Jun 2007, Posts: 32)
Posted: Thu Oct 14, 2010 4:38 am    Post subject: Handling large file using FileInput Node
Hello All,
We have encountered an issue when using the FileInput node.
We are reading a file which may have 40,000 records. In the FileInput node we are using the "Whole File" option in the 'Record detection' property.
The input is a CSV file, and we have created a message set with a complex type that repeats from 1 to -1 (unbounded).
The file looks like this:
Element1|Element2|Element3|Element4<CR><LF>
Element1|Element2|Element3|Element4<CR><LF>
Element1|Element2|Element3|Element4<CR><LF>
Element1|Element2|Element3|Element4<CR><LF>
Since the message flow is taking a lot of time and also consuming a lot of execution group memory, we have been advised to take 500 records at a time.
So we are planning to change the complex type in the message set from 1 to 500 occurrences.
Could anyone please suggest how the FileInput node can read 500 records at a time?
Thanks in advance.
Your help is very much appreciated.

Vitor (Grand High Poobah; Joined: 11 Nov 2005, Posts: 26093, Location: Texas, USA)
Posted: Thu Oct 14, 2010 4:48 am    Post subject: Re: Handling large file using FileInput Node
integration wrote:
    In the FileInput node we are using the "Whole File" option in the 'Record detection' property.

Why? The layout you've supplied looks, at first flush, like a set of records delimited with <CR><LF>. See the sketch below.
_________________
Honesty is the best policy.
Insanity is the best defence.
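
For reference, a sketch of the FileInput node settings this implies (property names as they appear on the node's 'Records and elements' tab; the layout is an illustration, not taken from the thread):

Code:
    Record detection : Delimited
    Delimiter        : DOS or UNIX Line End   -- matches the <CR><LF> ending each record
    Delimiter type   : Postfix                -- each record is terminated by the delimiter

With these settings the node propagates one record per flow invocation, so the 40,000-record file is never parsed in one go.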

mqjeff (Grand Master; Joined: 25 Jun 2008, Posts: 17447)
Posted: Thu Oct 14, 2010 5:14 am    Post subject:
As well, if the records are actually fixed length, you can just tell the FileInput node to read out 500 records' worth of bytes, as sketched below.
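
A sketch of that variant (the record length is purely illustrative - it would have to be the real fixed record size):

Code:
    Record detection : Fixed Length
    Length           : 50000   -- e.g. 500 records x 100 bytes per record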

integration (Apprentice; Joined: 26 Jun 2007, Posts: 32)
Posted: Fri Oct 15, 2010 6:48 am    Post subject:
Thanks for your reply.
The input file is not of fixed length.
I read in some article that we can use 'Parsed Record Sequence' in the Record detection property. Will that be helpful in this scenario?

Vitor (Grand High Poobah; Joined: 11 Nov 2005, Posts: 26093, Location: Texas, USA)
Posted: Fri Oct 15, 2010 7:49 am    Post subject:
integration wrote:
    The input file is not of fixed length.

We're not talking about the file (or I'm not); I'm talking about the records. If they're not fixed length, why can't you use the <CR><LF> delimiter as I suggested?

integration wrote:
    Will that be helpful in this scenario?

It might. Post the link to your "some article". Then explain why you want to use a complex solution when a simple one will apparently suffice.
Or just ignore that, as you ignored my previous question, and do what you feel is best.
_________________
Honesty is the best policy.
Insanity is the best defence.

integration (Apprentice; Joined: 26 Jun 2007, Posts: 32)
Posted: Sun Oct 17, 2010 10:30 pm    Post subject:
Vitor -
Our requirement is that we may get 40,000 records (not fixed length) in a single file.
Below are two sample records from such a 40,000-record file:
0222|1600000016|000004|P1015||DA|20100210|00000000||USD|2400.00-|USD|2400.00-|USD|2400.00-||0.00 |S0030UVFA6|SHANGHAI 13TH METALLURGYC|20100210|20100210|OTHER|D|K4|2010002||IBMC||||000000000020100210||N|D|US|00000000|STG
0222|1600000017|000001|P9999||DA|20100210|00000000||USD|300.00-|USD|300.00-|USD|300.00-||0.00 |S0030UVFA6|SHANGHAI 13TH METALLURGYC|20100210|20100210||D|K4|2010002||IBMC|055|||000000000020100210||N|D|US|00000000|ZZZ
We are using <CR><LF> as the record delimiter in the CSV message set.
Now we want to read 500 records at a time and send them to the target. So if we read the first 500 records, then once those are processed we need to start reading from the 501st record for the next 500 records, and this process should continue until we finish reading the entire file.
Can you please suggest how this could be achieved?
Note:
Currently we are using the FileInput node with the "Records and elements" property set to 'Whole File'.
Because of this it is taking a lot of CPU time, as this message flow involves calls to DB lookups/external services.

mqjeff (Grand Master; Joined: 25 Jun 2008, Posts: 17447)
Posted: Mon Oct 18, 2010 1:41 am    Post subject:
Look, I'm going to say this once.
You've been told it at least twice now, and have failed to actually read and understand it.
Set the "Records and elements" property to something *other* than "Whole File".
Think about the name of this property for a moment - what is it about?
It's about how many RECORDS OR ELEMENTS of the file are READ at a time.
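
For reference, the options behind that property name on the FileInput node (descriptions paraphrased from the node's documentation):

Code:
    Whole File             - the entire file is propagated as a single record
    Fixed Length           - each record is exactly 'Length' bytes
    Delimited              - records are separated or terminated by a delimiter, e.g. <CR><LF>
    Parsed Record Sequence - the configured parser decides where each record ends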

Vitor (Grand High Poobah; Joined: 11 Nov 2005, Posts: 26093, Location: Texas, USA)
Posted: Mon Oct 18, 2010 4:11 am    Post subject:
mqjeff wrote:
    You've been told it at least twice now, and have failed to actually read and understand it.

integration wrote:
    Currently we are using the FileInput node with the "Records and elements" property set to 'Whole File'.

You've failed to a) explain why you decided to do this and b) explain why you're ignoring advice to change it. Or c) explain why 500 records will not be a problem for you but 501 will. Or 600. Or 1000.

integration wrote:
    Because of this it is taking a lot of CPU time, as this message flow involves calls to DB lookups/external services.

You've also failed to mention this before, or to explain what exactly it has to do with reading a file.
_________________
Honesty is the best policy.
Insanity is the best defence.

Mut1ey (Acolyte; Joined: 07 Oct 2005, Posts: 74, Location: England)
Posted: Mon Oct 18, 2010 4:32 pm    Post subject: File Node
Your problem is that you are trying to read the whole file in at once, which tries to parse the whole file into the input tree. I found that option, and Parsed Record Sequence, unsuitable for large files of records - in our case, 350k records per file.
Better to use Delimited (by <CR><LF>) - see the sketch below.
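
To illustrate what the flow sees once Delimited detection is on: each invocation carries exactly one record, so the downstream logic stays small. A minimal ESQL sketch, assuming the MRM/CSV model from the start of the thread (Element1 comes from the sample layout; the module name and the use of LocalEnvironment are illustrative):

Code:
    CREATE COMPUTE MODULE ProcessOneRecord_Compute
        CREATE FUNCTION Main() RETURNS BOOLEAN
        BEGIN
            SET OutputRoot = InputRoot;   -- work on a copy of the single record
            -- the per-record DB lookups / service calls mentioned earlier would
            -- key off fields read here, for example:
            SET OutputLocalEnvironment.Variables.LookupKey = InputRoot.MRM.Element1;
            RETURN TRUE;
        END;
    END MODULE;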

integration (Apprentice; Joined: 26 Jun 2007, Posts: 32)
Posted: Tue Oct 19, 2010 2:45 am    Post subject:
Thanks a lot.. At last we could make progress with the Delimited (by <CR><LF>) option.

iceage (Acolyte; Joined: 12 Apr 2006, Posts: 68)
Posted: Tue May 17, 2011 10:20 am    Post subject:
Same requirement as the original - the rationale for breaking the file into batches of 500 records is to improve performance. I have a TDS message with ComplexType = All Elements Delimited, Delimiter = <CR><LF>, MaxOccurs = 500, and a complex type underneath it for all the elements delimited by "|".
Currently the FileInput node reads 500 records and then returns control to the EOF terminal, in spite of the file having more records. The FileInput node's Records and elements property is set to "Parsed Record Sequence". What is wrong with this?

Quote:
    Here we are reading a file which may have 40,000 records. In the FileInput node we are using the "Whole File" option in the 'Record detection' property.
    The input is a CSV file, and we have created a message set with a complex type that repeats from 1 to -1 (unbounded).
    The file looks like this:
    Element1|Element2|Element3|Element4<CR><LF>
    Element1|Element2|Element3|Element4<CR><LF>
    Element1|Element2|Element3|Element4<CR><LF>
    Element1|Element2|Element3|Element4<CR><LF>
    Since the message flow is taking a lot of time and also consuming a lot of execution group memory, we have been advised to take 500 records at a time.
    So we are planning to change the complex type in the message set from 1 to 500 occurrences.

mqjeff (Grand Master; Joined: 25 Jun 2008, Posts: 17447)
Posted: Tue May 17, 2011 10:28 am    Post subject:
You have configured your message model to consist of a single structure of no more than 500 records.
You should configure your message model to consist of an unlimited repeating structure of no more than 500 records each, as in the outline below.
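
In outline, the model being described looks something like this (element names are illustrative; the nesting is the point):

Code:
    Message: FileMsg
        Batch    complex element, minOccurs = 1, maxOccurs = -1     -- repeats without limit
            Record   complex element, minOccurs = 1, maxOccurs = 500,
                     all elements delimited, delimiter = <CR><LF>
                Field1 .. FieldN   delimited by '|'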

iceage (Acolyte; Joined: 12 Apr 2006, Posts: 68)
Posted: Tue May 17, 2011 11:17 am    Post subject:
Hmm... As per the link below, it suggests using a defined boundary - so I didn't set a maxOccurs of -1, but rather a fixed value. Regardless, I am unsuccessful; the parser reads the whole file, not stopping after 500 records!

Quote:
    If you select an MRM parser, ensure that the message model has a defined message boundary and does not rely on the parse being stopped when it reaches the end of the bitstream. If the final element has a maxOccurs value of -1, the parser continues to read bytes until the end of the bitstream or until it encounters bytes that cause a parsing exception.

http://publib.boulder.ibm.com/infocenter/wmbhelp/v7r0m0/topic/com.ibm.etools.mft.doc/ac25680_.htm

mqjeff (Grand Master; Joined: 25 Jun 2008, Posts: 17447)
Posted: Tue May 17, 2011 11:32 am    Post subject:
Let me make sure I understand.
You want to read the entire contents of the file, and invoke the message flow for each set of at most 500 records in the file, yes?
In that case, you need to create a message definition that matches the entire contents of the file and contains a repeating record, where each record consists of a set of at most 500 delimited records.
You then configure the FileInput node to propagate a single instance of the outer record, not the inner record.
Then you get a logical message tree that has up to five hundred instances of your record in it.
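
If the model-based route proves awkward, a similar effect is possible downstream: a Compute node can walk the parsed records and PROPAGATE a fresh message for every 500 of them. A minimal ESQL sketch, assuming an MRM tree with a repeating 'Record' element and at least one record present (all names are illustrative, and note that the default PROPAGATE behaviour clears the output tree after each send):

Code:
    CREATE COMPUTE MODULE BatchPropagate_Compute
        CREATE FUNCTION Main() RETURNS BOOLEAN
        BEGIN
            DECLARE rec REFERENCE TO InputRoot.MRM.Record[1];
            DECLARE j INTEGER 0;
            WHILE LASTMOVE(rec) DO
                IF j = 0 THEN
                    CALL CopyMessageHeaders();      -- rebuild headers for each new batch
                END IF;
                SET j = j + 1;
                SET OutputRoot.MRM.Record[j] = rec; -- copies the whole record subtree
                IF j = 500 THEN
                    PROPAGATE TO TERMINAL 'out';    -- send a full batch of 500
                    SET j = 0;
                END IF;
                MOVE rec NEXTSIBLING REPEAT TYPE NAME;
            END WHILE;
            IF j > 0 THEN
                PROPAGATE TO TERMINAL 'out';        -- send the final partial batch
            END IF;
            RETURN FALSE;                           -- everything already propagated
        END;

        CREATE PROCEDURE CopyMessageHeaders() BEGIN
            -- copy Properties and any headers, but not the old message body
            DECLARE I INTEGER 1;
            DECLARE J INTEGER CARDINALITY(InputRoot.*[]);
            WHILE I < J DO
                SET OutputRoot.*[I] = InputRoot.*[I];
                SET I = I + 1;
            END WHILE;
        END;
    END MODULE;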

iceage (Acolyte; Joined: 12 Apr 2006, Posts: 68)
Posted: Tue May 17, 2011 12:14 pm    Post subject:
Nope, it's the other way around... I'm looking to read a subset of records from the file - in batches of 500 records per iteration, until EOF is reached.
As of now, I could get it working reading 1 record at a time (with Records and elements set to Delimited by <CR><LF>). Now I'm trying to read a batch of records, with the message set modelled to match the "batch" and Records and elements set to "Parsed Record Sequence".